Personal Trainer with AI Coach in Presentation Skills

29 September 2023


Over the past few years, advances in speech processing have drastically transformed how humans engage with digital devices (Yu, 2016). These developments have paved the way for rapid progress in voice recognition technology, which, in turn, has opened the door to integrating AI into speech training. In particular, AI-driven speech training has proven effective in enhancing students’ presentation abilities (Junaidi, 2020). Additionally, several studies have shown that fear of public speaking can cause individuals such significant anxiety when delivering oral presentations that it may affect their mental health and overall well-being (Grieve et al., 2021).

During an internship in previous years at Lepaya, an Amsterdam-based soft skills training start-up, the author had the opportunity to try out the so-called “AI Coach.” AI Coach is a virtual platform within the mobile app that helps learners acquire communication skills effectively through “Machine Based eLearning (MABEL)” (Hoelzer, 2022).

This AI-driven method of Learning and Development (L&D) applies machine learning algorithms to various data types, such as video, audio, and text, to help users improve their conversational abilities in practical scenarios (AI Skills of the Future: Understand AI and Make It Work for You, n.d.). The process involves collecting practice videos, analysing them with AI systems to extract key speech and conversation indicators such as gestures, facial expressions, and voice, and then providing feedback to users for improvement (Hoelzer, 2022).
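To make the "extract indicators, then give feedback" idea concrete, here is a minimal Python sketch that counts filler words in a transcript and turns the count into a short piece of feedback. It is only an illustration: the filler list, the function names `count_fillers` and `filler_feedback`, and the 5% threshold are assumptions made for this example, and MABEL itself is described as working on audio and video with trained models rather than on simple text matching.

```python
import re
from collections import Counter

# Hypothetical filler list; the source does not say which words MABEL tracks.
FILLER_WORDS = ("um", "uh", "like", "you know", "basically")


def count_fillers(transcript, fillers=FILLER_WORDS):
    """Count whole-word occurrences of each filler in a transcript.

    Plain text matching cannot tell "like" the filler from "like" the verb,
    so this over-counts compared with a trained model.
    """
    text = transcript.lower()
    counts = Counter()
    for filler in fillers:
        pattern = r"\b" + re.escape(filler) + r"\b"
        hits = len(re.findall(pattern, text))
        if hits:
            counts[filler] = hits
    return counts


def filler_feedback(transcript, max_rate=0.05):
    """Return one line of feedback based on filler hits per spoken word.

    max_rate is an assumed threshold chosen only for illustration.
    """
    words = re.findall(r"[a-z']+", transcript.lower())
    if not words:
        return "No speech detected."
    counts = count_fillers(transcript)
    total = sum(counts.values())
    if total == 0:
        return "No filler words detected."
    rate = total / len(words)
    top_word, top_hits = counts.most_common(1)[0]
    if rate <= max_rate:
        return f"Filler use is low ({total} in {len(words)} words)."
    return (
        f"Fillers make up about {rate:.0%} of your words; "
        f"the most frequent is '{top_word}' ({top_hits} times)."
    )


sample = "Um, so I think, like, this is, um, basically done."
print(filler_feedback(sample))
```

A real system would first need a speech-to-text step to obtain the transcript, which this sketch takes as given.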

The pipeline comprises several steps, beginning with videos being collected internally or through the app, which is developed in Flutter (Hoelzer, 2022). Next, the videos are processed by the MABEL API, which analyses video, audio, and text using Python and Docker within SageMaker on AWS, together with machine learning libraries like TensorFlow, PyTorch, and scikit-learn (Hoelzer, 2022).

Afterwards, the collected data is transformed into datasets, with Luigi used to track transformations and ensure reproducibility. Then, the datasets are annotated using Label Studio, since annotated data is crucial for training machine learning models; the annotations cover aspects like filler words, gestures, facial expressions, and overall presentation ratings. Next, machine learning models are developed based on the annotated datasets, including audio models (e.g., filler word detection), video models (e.g., human keypoint detection and emotion classification), and regular models (to provide a presentation rating). Tools like MLflow are used to manage experiments. Lastly, after quality assurance checks, the updated MABEL pipeline with the new models is deployed (Hoelzer, 2022).
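The idea of tracking transformations for reproducibility can be sketched in a few lines of plain Python. The example below is not Luigi and does not use its API; it is a simplified stand-in that records a content fingerprint before and after each step, so that two runs on the same input can be compared. The step names, the record fields, and the helpers `fingerprint` and `run_pipeline` are all hypothetical.

```python
import hashlib
import json


def fingerprint(data):
    """Return a short, stable hash of JSON-serialisable data."""
    payload = json.dumps(data, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


def run_pipeline(data, steps):
    """Apply each step in order and log input/output fingerprints."""
    log = []
    for name, func in steps:
        before = fingerprint(data)
        data = func(data)
        log.append({"step": name, "input": before, "output": fingerprint(data)})
    return data, log


# Hypothetical toy transformations standing in for real dataset building.
def drop_empty(records):
    """Remove records whose transcript is blank."""
    return [r for r in records if r["transcript"].strip()]


def add_word_count(records):
    """Attach a simple word count to every record."""
    return [{**r, "words": len(r["transcript"].split())} for r in records]


STEPS = [("drop_empty", drop_empty), ("add_word_count", add_word_count)]

raw = [
    {"id": 1, "transcript": "Um, hello everyone"},
    {"id": 2, "transcript": "   "},
]

dataset, log = run_pipeline(raw, STEPS)
for entry in log:
    print(entry["step"], entry["input"], "->", entry["output"])
```

If the same raw data and the same steps produce the same fingerprints on a second run, the transformation is reproducible in the sense used here. A workflow tool such as Luigi adds dependency handling and persisted outputs on top of this basic idea.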

In conclusion, this comprehensive approach, which combines AI-driven feedback with effective training methods, represents a major leap forward in communication skills development.

References:

AI Skills of the Future: Understand AI and Make it Work for You. (n.d.). https://www.lepaya.com/blog/ai-skills-of-the-future

Grieve, R., Woodley, J., Hunt, S. E., & McKay, A. (2021). Student fears of oral presentations and public speaking in higher education: a qualitative survey. Journal of Further and Higher Education, 45(9), 1281-1293.

Hoelzer, T. (2022, November 7). MABEL — How we build AI at Lepaya Tech – Lepaya Tech – Medium. Medium. https://medium.com/lepaya-tech/mabel-how-we-build-ai-at-lepaya-tech-2ed6c806a23c

Junaidi, J. (2020). Artificial intelligence in EFL context: rising students’ speaking performance with Lyra virtual assistance. International Journal of Advanced Science and Technology Rehabilitation, 29(5), 6735-6741.
