The AI text-to-speech generator Eleven Labs (https://llelevenlabs.com/) offers a total of 28 languages, including German and English. I used the default voice settings with the Eleven Multilingual v1 model and tried four different voices (see picture below) from the audiobook category, because my input text is taken from a book that is published in both English and German (Yogananda, 2005; Yogananda, 2006).
The first voice “Mathilda” is described as warm and American for the English speech. In my opinion, the generated speech had the same good style, flow, and warmth in both languages. However, to be extremely critical, it was not quite as playful and lively as I am used to from traditional audiobooks.
The next voice “Grace” is supposed to be calm and Southern American for the English speech. In German, the pauses and emphases seem slightly inauthentic, which makes the speech sound a little robotic. In English, by contrast, the speech is calm as promised because the pauses and emphases come at the right time.
Next, I tried two male voices. “Matthew” is labelled warm and British for the English speech. In both languages, “he” was my favourite: calm, taking pauses, yet lively. It sounds like a genuinely recorded audiobook. I did not notice a difference in quality between German and English.
Lastly, “Michael” is described as orotund and American for the English speech. To me, the speech did sound orotund in both languages, but the German version again sounded slightly robotic.
The two images below show the Eleven Labs interface with my text (picture 1 in English and picture 2 in German) and my settings. After generation, the audio is available at the bottom.
To conclude, I am fascinated overall by how authentic the audio generated by Eleven Labs sounds. I found two of the four voices more authentic in English than in German, so there seems to be more room for improvement in the German speech to make it sound less robotic. As AI text-to-speech generators will probably keep improving, AI looks like a disruptive technology for the audiobook recording industry. The technology may also take root in similar industries: will we soon have the news on the radio read out by an AI?
References
Yogananda, P. (2005). Autobiography of a Yogi: The Original 1946 Edition plus Bonus Material (p. 13). Crystal Clarity Publishers.
Yogananda, P. (2006). Autobiographie: Übersetzung der Originalausgabe von “Autobiography of a Yogi” aus dem Jahre 1946 (p. 13). Hans-Nietsch-Verlag.