Synthesia: in my opinion, a game changer when it comes to video generation. I love using videos as an information source to quickly find what I need to complete a task, so I was looking into GenAI tools that let users generate video content in just a few steps. I have also noticed that video materials are becoming increasingly important in business communication. A quick web search led me to Synthesia (https://www.synthesia.io/de), which I picked as the tool to explore further.
In simple terms, Synthesia is an all-in-one GenAI platform that turns text into realistic videos featuring avatars. I started by generating a simple two-minute video from a text I had written with ChatGPT. Two things about the process surprised me: the interface was very intuitive, and the videos looked super realistic. Synthesia offers a range of avatars that look and sound like real human beings. That made me curious, and I found that there are more than 50 different avatars to choose from. A new feature even lets you create a digital version of yourself, that is, an avatar that looks and sounds just like you. Back to my first use case: I then tried to customize the video to fit my needs exactly. Synthesia lets you edit every single word spoken in the video, as well as the bullet points on the slides shown alongside it. This makes video generation easy: you can produce a first working version (an MVP) within a few minutes and then adapt it until it fits your needs.
I believe Synthesia can have a real impact on video generation because it is a massive time saver and its results are of high quality, with videos that look almost perfectly realistic. These advances in AI video generation could also affect a large share of businesses in areas such as marketing, training and development, onboarding, and customer support.
For those interested in the underlying technology: video generation is based on GenAI models and deep learning, with models trained on huge amounts of audio and video data. To convert text to speech, Synthesia uses advanced AI models and neural voices that sound increasingly realistic. The avatars look realistic because they are based on recordings of real actors, with motion capture and Generative Adversarial Networks (GANs) used to generate realistic animations (Zhang et al., 2021). I think it is safe to say that these technologies will keep improving, which points to a promising future for AI video generation.
As a critical reflection, I would like to mention the risk of misuse and the ethical limitations. This technology raises the question of whether people can still trust what they see, and it increases the risk of deepfakes and misinformation. People need to be aware of these technological advances and sharpen their critical thinking accordingly.
All in all, I was positively surprised by how good and realistic AI-generated video content has become. I can only encourage everyone to experiment with tools like Synthesia. At the same time, it is important to keep these technological advances in mind when consuming video content on the web.
References:
Zhang, Z., Zhu, Z., Zheng, T., & Zhao, H. (2021). FACIAL: Synthesizing dynamic talking face with implicit attribute learning. arXiv. https://doi.org/10.48550/arXiv.2108.07938
An interesting and thorough write-up. It is fascinating how new technologies such as Synthesia have changed their industry, turning the usually long and often expensive task of making the right video into a much cheaper and more straightforward one. It is also interesting to see how realistic these videos have become; they are often already indistinguishable from real footage. However, as you mentioned in your write-up, there are concerns as well, which I share, especially the blurring of the line between the authentic and the fabricated.