Voice Mode has quickly become one of the most beloved features in ChatGPT, allowing users to interact with the AI in a more natural and engaging way. But how did OpenAI select the perfect voices for this feature? The journey to find the voices for ChatGPT was an extensive and meticulous process, involving industry-leading professionals, hundreds of submissions, and months of careful consideration. In this blog post, we'll take you behind the scenes to explore how the voices for ChatGPT were chosen and the steps taken to ensure they met the high standards set by OpenAI.

The Vision for Voice Mode

In September 2023, OpenAI introduced voice capabilities to ChatGPT, aiming to provide users with a more interactive and human-like experience. The goal was to create voices that were not only pleasant to listen to but also capable of conveying a range of emotions and tones. This required voices that were warm, engaging, and confidence-inspiring, with a rich tone that could appeal to a global audience.

To achieve this, OpenAI partnered with award-winning casting directors and producers to establish a set of criteria for the voices. Some of the key characteristics they looked for included:

  • Diverse Backgrounds: Actors from various backgrounds or those who could speak multiple languages.
  • Timelessness: A voice that feels timeless and can remain relevant for years to come.
  • Approachability: An approachable voice that inspires trust and comfort.
  • Engagement: A warm, engaging, and charismatic voice with a rich tone.
  • Naturalness: A natural and easy-to-listen-to voice that doesn't feel artificial.

The Casting Process

The casting process began in early 2023, with OpenAI working closely with independent, well-known casting directors and producers. On May 10, 2023, the casting agency issued a call for talent, and within a week, they received over 400 submissions from voice and screen actors. Each actor was given a script of ChatGPT responses to record, covering a range of topics from mindfulness to travel planning and daily conversations.

The casting team independently reviewed the submissions and hand-selected an initial list of 14 actors. The team then narrowed this list further before presenting the top voices to OpenAI. The final selection process involved discussions with each actor about the vision for human-AI voice interactions, the technology's capabilities and limitations, and the safeguards implemented to ensure ethical use.

Selecting the Final Voices

After careful consideration, five actors were selected to voice Breeze, Cove, Ember, Juniper, and Sky. Each actor flew to San Francisco for recording sessions and in-person meetings with the OpenAI product and research teams. These sessions took place over June and July 2023, and the voices were officially launched into ChatGPT on September 25, 2023.

The actors behind these voices were compensated above top-of-market rates, reflecting OpenAI's commitment to supporting the creative community. This compensation will continue for as long as their voices are used in ChatGPT products.

Addressing Concerns and Ensuring Ethical Use

One of the key principles guiding the selection of voices was the ethical use of AI. OpenAI believes that AI voices should not deliberately mimic a celebrity's distinctive voice. This principle came into focus with the voice of Sky. On May 20, 2024, OpenAI's CEO, Sam Altman, issued a statement clarifying that Sky's voice was not an imitation of Scarlett Johansson's voice but belonged to a different professional actress using her natural speaking voice. Out of respect for Ms. Johansson, OpenAI paused the use of Sky's voice in its products.

Looking Ahead: New Voices and Enhanced Capabilities

The journey to find the perfect voices for ChatGPT doesn't end here. OpenAI is continuously working to enhance Voice Mode and introduce new voices to better match the diverse interests and preferences of users. On May 13, 2024, OpenAI introduced GPT-4o and announced a new Voice Mode, to be made available first to ChatGPT Plus users. This new mode is designed to offer more natural interactions, handle interruptions smoothly, manage group conversations effectively, filter out background noise, and adapt to tone.

OpenAI is also in ongoing conversations with Ms. Johansson's team to address her concerns and explore the possibility of her joining as a future voice actor for ChatGPT. This commitment to collaboration and ethical use underscores OpenAI's dedication to creating a positive and respectful environment for both users and voice actors.

Conclusion

The journey to find ChatGPT's perfect voices was a complex and thoughtful process, involving extensive collaboration with industry professionals, careful consideration of ethical principles, and a commitment to supporting the creative community. The result is a set of voices that enhance the user experience, making interactions with ChatGPT more engaging and human-like.

As OpenAI continues to innovate and introduce new features, users can look forward to even more options and enhanced capabilities in Voice Mode. Whether you're using ChatGPT for personal or professional purposes, the carefully selected voices of Breeze, Cove, Ember, and Juniper (with Sky currently paused) are designed to provide a seamless and enjoyable experience.

Stay tuned for more updates as OpenAI continues to push the boundaries of what's possible with AI voice technology.