Hume AI launches custom synthetic voices with Voice Control
Hume AI, the startup specializing in emotionally intelligent voice interfaces, has launched Voice Control, an experimental feature that empowers developers and users to create custom AI voices through precise modulation of vocal characteristics — no coding, AI prompt engineering, or sound design skills required.
This release builds on the foundation laid by the company’s earlier Empathic Voice Interface 2 (EVI 2), which introduced advanced capabilities in naturalness, emotional responsiveness, and customization.
Both EVI 2 and Voice Control avoid the risks of voice cloning, a practice that Hume co-founder Alan Cowen has stated carries ethical and practical challenges.
Instead, Hume focuses on providing tools for creating unique, expressive voices that align with user needs, such as customer service chatbots, digital assistants, tutors, guides, or accessibility features.
Moving beyond preset AI voices toward custom bespoke solutions
Voice Control offers developers the ability to adjust voices along 10 distinct dimensions, including:
“Masculine/Feminine: The vocalization of gender, ranging between more masculine and more feminine.
Assertiveness: The firmness of the voice, ranging between timid and bold.
Buoyancy: The density of the voice, ranging between deflated and buoyant.
Confidence: The assuredness of the voice, ranging between shy and confident.
Enthusiasm: The excitement within the voice, ranging between calm and enthusiastic.
Nasality: The openness of the voice, ranging between clear and nasal.
Relaxedness: The stress within the voice, ranging between tense and relaxed.
Smoothness: The texture of the voice, ranging between smooth and staccato.
Tepidity: The liveliness behind the voice, ranging between tepid and vigorous.
Tightness: The containment of the voice, ranging between tight and breathy.”
This no-code tool allows users to fine-tune voice attributes in real time through virtual onscreen sliders. It’s currently available in Hume’s virtual playground, which requires a free user sign-up to access.
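For readers who think in code, the ten sliders can be pictured as a small settings object. The sketch below is purely illustrative and is not Hume's API: the dimension names come from the list above, while the `VoiceSliders` class, the `-1.0` to `1.0` range, and the clamping behavior are assumptions made for the example.

```python
from dataclasses import dataclass, field

# The ten dimension names come from Hume's list; everything else here
# (class name, value range, clamping) is a hypothetical illustration.
DIMENSIONS = (
    "masculine_feminine", "assertiveness", "buoyancy", "confidence",
    "enthusiasm", "nasality", "relaxedness", "smoothness",
    "tepidity", "tightness",
)

# Assumed slider range; 0.0 is taken to mean "base voice unchanged".
SLIDER_MIN, SLIDER_MAX = -1.0, 1.0


@dataclass
class VoiceSliders:
    """One value per dimension, all starting at the neutral midpoint."""
    values: dict = field(default_factory=lambda: {d: 0.0 for d in DIMENSIONS})

    def set(self, dimension: str, value: float) -> None:
        """Move one slider, clamping the value into the assumed range."""
        if dimension not in self.values:
            raise KeyError(f"unknown dimension: {dimension}")
        self.values[dimension] = max(SLIDER_MIN, min(SLIDER_MAX, value))


sliders = VoiceSliders()
sliders.set("assertiveness", 0.6)   # bolder than the base voice
sliders.set("nasality", -2.0)       # out of range, so it is clamped to -1.0
print(sliders.values)
```

Clamping mirrors what a physical slider does: it cannot be dragged past either end of its track.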
The release addresses a key pain point in the AI industry: developers have had to choose between preset voices, which often fail to meet the specific needs of brands or applications, and voice cloning, which carries risks of its own.
This focus on customization aligns with Hume’s broader goal of developing emotionally nuanced voice AI.
The company’s efforts to advance voice AI were highlighted in September 2024 with the launch of EVI 2, which the company described as a significant upgrade to its predecessor.
EVI 2 improved latency by 40%, reduced costs by 30%, and expanded voice modulation features, offering developers a safer alternative to voice cloning.
Sliders > text prompts
Hume’s research-driven approach plays a central role in its product development. The company, co-founded by former Google DeepMinder Alan Cowen, utilizes a proprietary model based on cross-cultural voice recordings paired with emotional survey data.
This methodology, rooted in emotion science, forms the backbone of both EVI 2 and the newly launched Voice Control.
Voice Control extends these principles by addressing the granular, often ineffable ways humans perceive voices.
The tool’s slider-based interface reflects common perceptual qualities of voice, such as buoyancy or assertiveness, without attempting to oversimplify these attributes through text-based prompts.
Voice Control is immediately available in beta and integrates with Hume’s Empathic Voice Interface (EVI), making it accessible for a wide range of applications.
Developers can select a base voice, adjust its characteristics, and preview the results in real time. This process ensures reproducibility and stability across sessions, key features for real-time applications like customer service bots or virtual assistants.
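One way an application could keep a customized voice stable across sessions is to store the base voice and slider adjustments in a canonical form and reload that exact record later. The sketch below is an assumption about how a developer might do this on their own side. It does not describe how Hume implements reproducibility, and the function names and the `base_voice_1` identifier are hypothetical.

```python
import hashlib
import json


def voice_config(base_voice: str, adjustments: dict) -> str:
    """Serialize a base voice plus slider adjustments into canonical JSON."""
    payload = {"base_voice": base_voice, "adjustments": adjustments}
    # sort_keys and fixed separators make the output independent of the
    # order in which the sliders were set.
    return json.dumps(payload, sort_keys=True, separators=(",", ":"))


def config_fingerprint(config_json: str) -> str:
    """Short, stable identifier for a saved voice configuration."""
    return hashlib.sha256(config_json.encode("utf-8")).hexdigest()[:12]


# Same settings entered in a different order yield the same stored record.
first = voice_config("base_voice_1", {"confidence": 0.4, "buoyancy": -0.2})
second = voice_config("base_voice_1", {"buoyancy": -0.2, "confidence": 0.4})
print(first == second)
print(config_fingerprint(first) == config_fingerprint(second))
```

Because the stored record is identical whenever the settings are identical, a customer service bot could load the same voice on every call instead of re-describing it each time.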
EVI 2’s influence is evident in Voice Control’s capabilities. The earlier model introduced features like in-conversation prompts and multilingual capabilities, which have broadened the scope of voice AI applications.
For example, EVI 2 supports sub-second response times, enabling natural and immediate conversations. It also allows dynamic adjustments to speaking style during interactions, making it a versatile tool for businesses.
Differentiating in a competitive market
Hume’s focus on voice customization and emotional intelligence positions it as a strong competitor in the voice AI space, even against well-funded rivals such as OpenAI, with its Advanced Voice Mode, and ElevenLabs, both of which offer libraries of preset voices.
Hume continues to build on its innovative approach to voice AI. Plans for expanding Voice Control include introducing additional modifiable dimensions, refining voice quality under extreme adjustments, and increasing the range of base voices available.
With the launch of Voice Control, Hume reinforces its position as a leader in voice AI innovation, offering tools that prioritize customization, emotional intelligence, and real-time adaptability. Developers can access Voice Control today via Hume’s platform, marking another step forward in the evolution of AI-driven voice solutions.