In today’s artistic landscape, Artificial Intelligence (AI) is changing the way we create and perceive images. With deep fakes becoming ever more realistic, it is increasingly difficult to distinguish between content created by humans and content produced by AI.
One notable aspect of AI’s capabilities is its ability to replicate voices, exemplified by David Guetta using Eminem’s voice during a live performance (David Guetta, 2023), as well as a YouTuber using AI to make a song that sounds like it was made by YOASOBI (ハナ hana, 2023).
While my previous blog primarily focussed on AI image creation, this blog focusses on AI’s text-to-speech capabilities.
The emergence of deep fakes
Contrary to some beliefs, deep fakes also have positive aspects. One example is Harrison Ford using deep fake technology to make himself look younger so that he could play in the latest Indiana Jones movie, Indiana Jones and the Dial of Destiny (Singh, 2023). With the power of AI, Harrison Ford has more flexibility as an actor. Additionally, deep fakes help people protect their privacy by masking their voices and faces (Applications of Deepfake Technology: Its Benefits and Threats, 2023).
Despite these benefits, some might argue that the costs outweigh them, since deep fakes are also used to mislead, deceive and manipulate people.
My experiments with AI
Inspired by online videos of people using AI to create new songs with the voices of their favorite artists, I set out to do the same. After searching the internet far and wide, I realized that I had two options: either running a demanding AI model (like Retrieval-based Voice Conversion) on my own hardware or finding an online alternative that would run it for me. Unfortunately, I do not possess hardware that is powerful enough to run such AI models locally, and nearly all text-to-speech services are behind a paywall, especially those featuring famous voices.
This meant that I had to settle for some lesser-known artists and was limited in the number of words that I could convert to speech. Fortunately, Uberduck did provide a tool that made it easier to align the voice to a rap verse.
I used this tool to create a short rap from the input “studying”.
In the rap, it is clearly audible that the voice does not belong to a human being, and the tool was unfortunately limited to one verse.
The future of AI text to speech
In my assessment, AI-powered text-to-speech technology should be approached with caution due to its potential to dramatically alter our lives. For now, however, this technology is not available to everyone, as one needs to purchase the required hardware or pay for access to platforms that run these AI models. The tools that are currently publicly available do generate speech from text, but still fall short of convincingly replicating human voices.
If advanced text-to-speech technology becomes more easily accessible to a broader audience, it could pose a threat. The spread of fake news on social media and the internet might rise, creating a major challenge. To avoid falling victim to deceit, it is critical to develop techniques for properly addressing the implications of deep fakes.
References
Applications of Deepfake Technology: Its Benefits and Threats. (2023, July 20). https://www.knowledgenile.com/blogs/applications-of-deepfake-technology-positives-and-dangers#:~:text=Deepfake%20can%20also%20be%20used,using%20a%20personal%20digital%20avatar.
David Guetta. (2023, February 8). Eminem but with AI (i’m not releasing it commercially obviously) [Video]. YouTube. https://www.youtube.com/watch?v=98WTwSnkoas
Singh, P. (2023, July 19). Hollywood going the AI way: How the new Indiana Jones movie de-aged actor Harrison Ford. Business Today. https://www.businesstoday.in/technology/news/story/hollywood-going-the-ai-way-how-the-new-indiana-jones-movie-de-aged-actor-harrison-ford-390481-2023-07-19#:~:text=With%20a%20team%20of%20100,and%20the%20Dial%20of%20Destiny.
ハナ hana. (2023, July 28). How to Make YOASOBI song [Video]. YouTube. https://www.youtube.com/watch?v=ZHAVw4exM9U