The AI text-to-speech generator Eleven Labs (https://llelevenlabs.com/) offers a total of 28 languages, including German and English. I used the default voice settings, Eleven Multilingual v1, and I tried four different voices (see picture below) in the audiobook category because my input text is taken from a book that is published in both English and German (Yogananda, 2005; Yogananda, 2006).
The first voice “Mathilda” is described as warm and American for the English speech. In my opinion, the generated speeches in both languages had the same good style, flow, and warmth. However, being extremely critical, they were not quite as playful and lively as I am used to from traditional audiobooks.
The next voice “Grace” is supposed to be calm and southern-American for the English speech. In German, the pauses and emphases appear slightly inauthentic, which makes the speech sound a little bit robotic. In English, by contrast, the speech is calm as promised because pauses and emphases appear at the right time.
Next, I tried two male voices. “Matthew” is labelled warm and British for the English speech. In both languages, “he” was my favourite: calm, taking pauses, yet lively. It sounds like a real recorded audiobook. I didn’t notice a difference in quality between German and English.
Lastly, “Michael” is orotund and American for the English speech. To me, the speech did sound orotund in both languages, but the German voice sounded slightly robotic again.
The two images below show the Eleven Labs interface with my text (picture 1 in English and picture 2 in German) and my settings. After generating, the audio is available at the bottom.
To conclude, I am overall fascinated by how authentic the audio generated by the AI text-to-speech generator Eleven Labs sounds. I found two of the four voices more authentic in English than in German, so there seems to be more room for improvement in the German speech to make it sound less robotic. As AI text-to-speech generators will probably only improve further in the future, it seems that AI is a disruptive technology for the audiobook recording industry. And the technology may take root in similar industries: Will we soon have the news on the radio read out by an AI?
References
Yogananda, P. (2005). Autobiography of a Yogi: The Original 1946 Edition plus Bonus Material (p.13). Crystal Clarity Publishers.
Yogananda, P. (2006). Autobiographie: Übersetzung der Originalausgabe von “Autobiography of a Yogi” aus dem Jahre 1946 (p.13). Hans-Nietsch-Verlag.
Your post nicely sums up my experience with these types of tools. While trying them out, I was also surprised to find the voices realistic and pleasant. It would be interesting to test the other categories available on ElevenLabs, like Videos or Games, to see how the tool caters to different outlets. I am also wondering what the future holds for text-to-speech generators, for example whether multiple voices could be combined to mimic dialogue.
I also wonder what the effects will be of the new feature ElevenLabs just launched, which includes AI dubbing and voice translation. This feature provides automated voice translation and replaces the original speech in videos while maintaining the key characteristics of the original voice. This could, for example, easily allow content creators to make their videos available to a much wider audience. At the same time, it would also help deliver news straight from the source. However, it would be necessary to test it and see how accurate the translations are.
I really enjoyed reading your post, thanks! It is impressive that ElevenLabs offers a text-to-speech feature in 28 languages. It is also great that you can choose between different voices according to your preferences.
Interestingly, the generated speeches in both languages had a good style, flow, and warmth. But while AI has come a long way, it is still obvious that there is work to be done in terms of mimicking the nuances and expressiveness of human speech.
The biggest opportunities for further improvement are probably in the less common languages. But I’m quite confident that with the development of AI, these problems will be solved, and the quality of generated speech will increase.
The idea of using AI to read news on the radio is also intriguing. Yet ethical aspects and trust issues should be considered when using AI to perform such important tasks. Ensuring the accuracy and objectivity of news reporting is a primary task in journalism; therefore, the integration of AI technologies into this area should be carried out with great care and control.
In my opinion, AI definitely has the potential to change the audiobook industry, and this will probably happen in the next few years. But to ensure the best user experience, it should be approached thoughtfully and in compliance with ethical standards.