In today’s artistic landscape, Artificial Intelligence (AI) is changing the way we create and perceive images. With deep fakes growing ever more realistic, it is increasingly difficult to distinguish between content generated by humans and that produced by AI.

One notable aspect of AI’s capabilities is its ability to replicate voices, exemplified by David Guetta using an AI-generated version of Eminem’s voice during his live performance (David Guetta, 2023), as well as a YouTuber using AI to make a song that sounds like it was made by YOASOBI (ハナ hana, 2023).

While my previous blog primarily focussed on AI image creation, this blog will be focussing on AI’s text to speech capabilities.

The Emergence of Deep fakes

Contrary to some beliefs, deep fakes also have positive aspects. One example is Harrison Ford being de-aged with deep fake technology so that he could play a younger version of himself in the latest Indiana Jones movie, Indiana Jones and the Dial of Destiny (Singh, 2023). With the power of AI, Harrison Ford has more flexibility as an actor. Additionally, deep fakes can help people protect their privacy by masking their voices and faces (Applications of Deepfake Technology: Its Benefits and Threats, 2023).

Despite these benefits, some might argue that the costs outweigh them, since deep fakes are also used to mislead, deceive and manipulate people.

My experiments with AI

Inspired by online videos of people using AI to create new songs with the voices of their favorite artists, I set out to do the same. After searching the internet far and wide, I realized that I faced two options: either running a demanding AI model (like Retrieval-based Voice Conversion) on my own hardware or finding an online alternative that would run it for me. Unfortunately, I do not possess hardware powerful enough to run such AI models locally, and nearly all online text to speech services are behind a paywall, especially those featuring famous voices.

This meant that I had to settle for some lesser-known artists and was limited in the number of words that I could convert to speech. Fortunately, Uberduck did provide a tool that made it easier to align the voice for a rap verse.

I used this tool to create a short rap with the input being “studying”.

In the rap, it is clearly audible that this is not being performed by a human being. Unfortunately, the tool was also limited to one verse.

The future of AI text to speech

In my assessment, AI-powered text-to-speech technology should be approached with caution due to its potential to dramatically alter our lives. For now, however, this technology is not available to everyone, as one needs to purchase the required hardware or pay for access to platforms that run these AI models. The tools that are currently freely available do generate text to speech, but still fall short in convincingly replicating human voices.

If advanced text to speech becomes easily accessible to a broader audience, it could pose a real threat. The spread of fake news on social media and the wider internet might increase, creating a major challenge. To avoid falling victim to deceit, it is critical to develop techniques for properly addressing the implications of deep fakes.

References

Applications of Deepfake Technology: Its Benefits and Threats. (2023, July 20). https://www.knowledgenile.com/blogs/applications-of-deepfake-technology-positives-and-dangers#:~:text=Deepfake%20can%20also%20be%20used,using%20a%20personal%20digital%20avatar.

David Guetta. (2023, February 8). Eminem but with AI (i’m not releasing it commercially obviously) [Video]. YouTube. https://www.youtube.com/watch?v=98WTwSnkoas

Singh, P. (2023, July 19). Hollywood going the AI way: How the new Indiana Jones movie de-aged actor Harrison Ford. Business Today. https://www.businesstoday.in/technology/news/story/hollywood-going-the-ai-way-how-the-new-indiana-jones-movie-de-aged-actor-harrison-ford-390481-2023-07-19#:~:text=With%20a%20team%20of%20100,and%20the%20Dial%20of%20Destiny.

ハナ hana. (2023, July 28). How to Make YOASOBI song [Video]. YouTube. https://www.youtube.com/watch?v=ZHAVw4exM9U

2 thoughts on “The Impact of text to speech AI”

Hi Alexander,

Thank you for your informative post! I have heard of similar examples and applications of deepfakes in both video and audio but have not had the chance to try them out myself. It’s interesting to see how you struggled to find easily accessible tools online, but I guess it is just a matter of time until that is no longer the case. I think it is important to keep the ethical aspects of video/audio deepfakes in mind, both in terms of the copyright of the owner of the voice and face and in terms of the ethical guidelines that are currently being discussed. You say that one can hear that the voice in the video sounds artificial, but I would argue that under certain circumstances it does sound natural. As a result, AI transparency could be undermined, which opens the door to another ethical problem: malicious use. I have read posts and news articles online about how bad actors have already used such voice imitation tools to run scams on people by gaining access to private information. This of course moves away from the musical part of your post, but it instantly came to mind when reading it. To come back to the music, I also wanted to point out a popular story online right now, where an individual going by “Ghostwriter” used generative AI to make a song with the voices of the artists Drake and the Weeknd. While looking the information up just now, I saw that the song was actually entered as a submission to the Grammys, which is an interesting development in the gen-AI field.

Thank you for your post! I found it very interesting and hilarious that AI can create a rap out of an input that is rarely associated with rap, and even manages to rhyme. I think it makes total sense that deep fake tools are not accessible to everyone, because if all of us could easily create deep fake content, that could change the media landscape and cause total chaos. Guidelines and regulations need to be created as soon as possible to prevent the destructive use of generative AI.
Deep fakes can pose a threat not only to the general public but also to actors and singers. I have heard that movie extras are afraid that their jobs might be replaced by AI. As our generation will inevitably be involved with AI, I feel an urgent need to learn how to use AI to enhance our lives rather than be exploited by it. Anyway, great job!