While enjoying the amazing weather last week in Rotterdam, I listened to a podcast in which journalists discussed the launch of Eleven Labs v2. Eleven Labs is a voice AI research and deployment company with a mission to make content universally accessible in any language and voice. With the new update, the AI-generated voice can now ‘speak’ 28 languages. My interest was piqued when they mentioned that the AI can generate text-to-speech in any voice from only 1 minute of recorded input.
So I purchased a starter account, and of course the first thing I did was record my own voice to see how accurate the result was. I could definitely hear aspects of my own voice, but the tempo and tone felt off to me, as if it were half me, half robot. Still, with only 1 minute of audio provided, I was very impressed.
Of course, one of the first things that comes up when discussing this technology is the threat of someone’s voice being replicated for criminal purposes. In 2021, people in the United States reported losing about $770 million to scams that started on social media. Imagine how much more will be lost if these scammers are actually able to artificially replicate the voice of the person they are trying to imitate.
Putting that terrifying thought aside for a moment, there are also plenty of ways in which this technology can be used positively. For example, narration is very common in the media and entertainment industry. Animated movies, games, commercials and audiobooks rely heavily on the quality of their voice-overs. Generative AI voices could replace the actors needed for this task, cutting a large share of the costs of salaries, audio equipment and audio editing. Besides being cheaper, AI voicing is also highly customizable. Because the voice-over affects how effective a commercial is, companies can A/B test several generated voices and keep the one that performs best.
These exciting possibilities offer value to many more industries and even to individuals like you. Imagine customizing Alexa or Siri to use a less annoying voice! Or having Morgan Freeman or David Attenborough narrate the next article on digital disruption, making it more captivating! (I’ll provide the Attenborough option for this blog post below.)
Of course I too worry about criminals being able to replicate our voices, but the technology is already here. The best we can do is teach our elderly relatives how to deal with this danger while embracing the technology’s potential in our businesses.
Sources:
Federal Trade Commission. (2022, August 18). Social media a gold mine for scammers in 2021. https://www.ftc.gov/news-events/data-visualizations/data-spotlight/2022/01/social-media-gold-mine-scammers-2021
NRC. (2023, September 6). Waarom door AI zelfs je eigen stem niet meer te vertrouwen is [Why AI means you can no longer trust even your own voice]. NRC. https://www.nrc.nl/nieuws/2023/09/06/waarom-door-ai-zelfs-je-eigen-stem-niet-meer-te-vertrouwen-is-a4173621
Pennock-Speck, B., & del Saz Rubio, M. M. (2009). Voice-overs in Standardized English and Spanish Television Commercials / La voz en off en anuncios televisivos estandarizados en ingles y español. Atlantis, 31(1), 111–127. https://www.jstor.org/stable/41055349
WWF International. (2019, September 24). Sir David Attenborough | A message to world leaders [Video]. YouTube. https://www.youtube.com/watch?v=fyYpExl8AJU
CNN. (2014, June 3). “Courage is the key to life itself” [Video]. YouTube. https://www.youtube.com/watch?v=r72a19Lbz7k
Such an interesting topic, very relevant, and great that you actually tried it yourself! I like that you added the Attenborough voice-over, super creative. It reminds me of the “covers” on TikTok where famous figures ranging from Mark Rutte to Eric Cartman (from South Park) “sing” songs such as Bohemian Rhapsody. It is crazy that the tool needs only one minute of input and the result still sounds very real to me. However, the risk of scams is indeed a downside that should be taken into account.
Interesting article, and a good point about generative AI voices in the entertainment industry, especially in light of the recent strikes in Hollywood. Most (voice) actors and writers are paid residuals (long-term payments) based on show runs, reruns and sales of DVDs or tapes. Now that most films and series are streamed on platforms like Netflix, their residuals have greatly diminished. How would this work for AI-generated voice-overs? An actor still has to provide their voice for the AI to work. How would they be paid if there is no way to control how many shows and films their voice is used in? I believe that the rules and regulations surrounding generative AI voices will decide how much they will actually be used in the entertainment industry.
Thank you for sharing your experience of experimenting with this tool! It is certainly impressive how far generative AI has come in terms of its capabilities and how easily accessible it now is to everyone. I appreciate that you also pointed out some potential risks and downsides of the widespread accessibility of such tools. It is indeed scary to think that someone could so easily imitate your voice and potentially use it to commit fraud. This does not only apply to text-to-voice generative AI tools, of course, but also to other AI models that can create realistic images or videos of actual people or situations, which may be used as “deepfakes”. In the future, it may become more and more difficult to tell what is real from what is fake.
By Jesse Osinga, student number 642011
What an informative and cool piece! It’s clear that advancements in AI voice technology can have a huge impact on different industries. The fact that AI can already generate text-to-speech in 28 languages is impressive on its own. With the introduction of this technology, an important societal question arises about the possibilities for misuse. As you mentioned, imitating someone’s voice opens the door to a lot of misuse, especially as the technology continues to advance. In my opinion, these are crucial societal questions that need to be addressed. It’s good to hear that you also see this technology as an opportunity to save costs and improve processes across various industries. And the idea of customizing voice assistants like Alexa and Siri with less annoying voices is a fun one! Thanks for sharing this blog!