AI generated voices: creepy and dangerous or impressive and practical? (using PlayHT)

30 September 2023

When I ask people around me what first comes to mind when they think of generative AI, most mention chatbots (e.g. ChatGPT) and occasionally image generators (e.g. DALL-E). These types of generative AI have very practical implications, for both individuals and businesses. For instance, generative AI could enhance creativity through platforms like ChatGPT and Stable Diffusion (Eapen et al., 2023). However, generative AI goes much further than those examples, and one such application is AI-generated voice. I used the PlayHT platform, which is free, to test my own AI-generated voice.

I came across PlayHT through a YouTube video about AI-generated voices, and the results were actually quite good, so I had to test it with my own voice. I created a free account on the PlayHT platform, after which I could upload a file with at least 30 seconds of my own voice. I simply read a random Wikipedia page aloud and recorded it on my phone. After uploading, I only had to wait 30 seconds before my voice was ‘cloned’ (the term PlayHT uses for my own AI-generated voice). After that, I entered a couple of sentences, and the results were indeed quite good! Although you can definitely hear that it’s not my real voice, there are some similarities. The fact that a voice recording of roughly 30 seconds gave me those results was very impressive, but also a little bit creepy. For you as a reader, I copy-pasted this whole text into PlayHT with my own cloned voice:

This made me think: does AI-generated voice have real practical applications, or can it be dangerous, so that we have to be careful with this type of technology? One useful application is voice-overs. Narration plays a significant role in the media and entertainment sector, where the voice is one of the most important elements, for example in advertisements. Generative AI voice could replace voice actors and could also increase the number of voice-overs produced.

On the other hand, AI-generated voices could have negative effects when used maliciously. A while ago, deepfakes were all over the news. Sometimes we can’t tell the difference between original videos and deepfakes. Deepfakes sometimes use real voice actors, but as generative AI spreads and improves, deepfakes could be used even more maliciously. Fortunately, there is quite extensive research on how to detect them: according to Rana et al. (2022), deep learning techniques are effective in detecting deepfakes. It is questionable, however, whether these kinds of detection systems can keep up with current generative AI developments.

I think that AI-generated voice is still fairly unknown, so there are probably many more practical applications that are not yet in use today.

What’s your view on AI-generated voices? Let me know!

Eapen, T. T., Finkenstadt, D. J., Folk, J., & Venkataswamy, L. (2023). How Generative AI Can Augment Human Creativity. (cover story). *Harvard Business Review*, *101*(4), 56–64.

Rana, M. S., Nobi, M. N., Murali, B., & Sung, A. H. (2022). Deepfake detection: a systematic literature review. *IEEE Access*, *10*. https://doi.org/10.1109/ACCESS.2022.3154404


Quickly generate the perfect playlist for each scenario with AI playlist creation

30 September 2023

If you are like me, you spend a lot of time creating playlists for different occasions. Studying, going to the gym, throwing a house party: they all require different playlists with different moods. Playlistable offers a quicker way to do this using generative AI. You give a short description and a genre, and Playlistable generates a playlist for you. You can even connect it to your Spotify account, so that it can draw on your listening history and you can instantly add the playlist to your account. I tried to generate two different types of playlists.

First, I used the prompt “studying for a long time for an important test”, with “electronic” as the genre. I think the AI got a little bit confused, because the start of the playlist consisted of songs I had added recently but that, in my opinion, do not really fit my description. After 20 or so songs, however, the playlist consists of songs that match the description I gave very well. I did not know a lot of the songs, but after listening for a while, I actually like that part of the playlist a lot.

My second prompt was “revisiting old classics” with “Rock” as the genre. This prompt worked out better than the first one. The playlist appears to consist entirely of classic rock songs. Some of these I have listened to a little or a lot, others I had never heard before, so I think it mixed my listening history with other similar songs. I will share the link to this playlist here: https://open.spotify.com/playlist/3UH1B7kE21aElDQkDsVVbV?si=750c06a4b5014748

I think this is a very useful tool. You can quickly generate playlists with just a short description and start using them immediately. It is a shame you only get a few trial playlists before you must start paying.


ChatGPT taught me how to make Molotov cocktails! – A lesson of it’s not what you say, it’s HOW you say it.

30 September 2023

Disclaimer: I’ll start off by saying that I don’t plan to make a Molotov cocktail. My interest in how to frame prompts, however, is real. My curiosity was first sparked by the post below.

Interaction 1

Interaction 2

This is a malicious example of prompting. But how can we use prompts to our advantage? What can be done to enhance ChatGPT’s performance so that we get the best output?

There are a few reusable solutions to common problems when working with LLMs, referred to as prompt patterns (White et al., 2023).

  • Meta Language Creation. In this technique, users define new words or symbols to express concepts or ideas, such as a mathematical symbol or a shorthand abbreviation. This approach works best for discussing complex or abstract situations, such as math problems.
  • Flipped Interaction. This pattern flips the typical interaction flow: the LLM queries the user to gather the information it needs to produce the requested content. For example, I can ask an LLM to question me until it can compile a list of success criteria for software.
  • Persona. Users give the LLM a particular role, which affects the nuance of the output it produces. The Molotov cocktail example is an illustration of the persona pattern.
  • Question Refinement. The user asks the LLM to provide improved or more specific versions of their questions. This helps users arrive at the appropriate question to use as the final prompt.
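To make these four patterns a bit more concrete, here is a minimal Python sketch that stores one prompt template per pattern and fills in the blanks. The template wording, the `PROMPT_PATTERNS` dictionary and the `build_prompt` helper are my own assumptions for illustration; they are not taken from White et al. (2023) or from any library.

```python
# Illustrative prompt templates, one per pattern described above.
# The exact wording of each template is an assumption made for this sketch.
PROMPT_PATTERNS = {
    "meta_language_creation": (
        'From now on, when I write "{shorthand}", I mean: {meaning}.'
    ),
    "flipped_interaction": (
        "I would like you to ask me questions until you have enough "
        "information to {goal}. Ask me one question at a time."
    ),
    "persona": "From now on, act as {persona}. {task}",
    "question_refinement": (
        "Whenever I ask a question about {topic}, suggest a better version "
        "of the question and ask me whether I want to use it instead."
    ),
}


def build_prompt(pattern: str, **fields: str) -> str:
    """Fill in the template for the given pattern (hypothetical helper)."""
    return PROMPT_PATTERNS[pattern].format(**fields)


if __name__ == "__main__":
    # Example: the flipped interaction pattern applied to software success criteria.
    print(build_prompt(
        "flipped_interaction",
        goal="compile a list of success criteria for my software project",
    ))
```

The resulting string can be pasted into ChatGPT as the first message of a conversation; keeping the templates in one place makes it easier to refine them over several attempts.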

More patterns can be found in the article from White et al. (2023).

When interacting with an LLM, prompt patterns are useful methods to enhance response quality. They help in producing accurate and relevant responses. Prompting is an iterative process that requires constant refinement (Liu et al., 2022). Prompts can also manipulate an LLM into producing malicious output despite the enforced policies. OpenAI has made efforts to prevent such policy violations. It reports that these efforts include continuous model improvement to make it less likely to generate inappropriate or harmful content, moderation mechanisms to find and stop prompt misuse, and collaboration with experts in ethics, AI safety, and policy to gain perspectives on preventing misuse (Our Approach to AI Safety, n.d.). I am optimistic that in the near future, getting tutorials on making Molotov cocktails from ChatGPT will be history.

References

Liu, P., Yuan, W., Fu, J., Jiang, Z., Hayashi, H., & Neubig, G. (2022). Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing. ACM Computing Surveys, 55(9). https://doi.org/10.1145/3560815

Our approach to AI safety. (n.d.). Openai.com. https://openai.com/blog/our-approach-to-ai-safety#OpenAI

White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., & Schmidt, D. C. (2023). A Prompt Pattern Catalog to Enhance Prompt Engineering with ChatGPT. ArXiv Preprint ArXiv:2302.11382. https://doi.org/10.48550/arxiv.2302.11382


Does AI make better presentation slides than people?

30 September 2023

A couple of weeks ago I saw a reel on Instagram in which the creator showed an AI tool that could quickly build PowerPoint slides with relevant information added. Back then I wasn’t making slides very often, but I found the tool really interesting and wanted to try it out. So I took the opportunity to try Plus AI’s tool for making slides in Google Slides. Plus AI offers a browser extension that can be accessed straight from the Google Slides platform. The tool can be used three times for free, and Plus AI also offers subscriptions for enterprises. It can create slides from scratch, but it can also be used in a “co-pilot” mode in which it gives recommendations based on the current slides. Plus AI claims to offer solutions for QBR and sales, webinar and training, and strategy and report use cases (Plus AI, 2023). I decided to add the extension to my Google profile and try to build a presentation from scratch:

After adding the extension, I was able to access the GUI of Plus AI and check out its functions: you can make slides from scratch, provide an entire text which the AI analyzes for important aspects, or specifically decide what content should be displayed on each slide. I went with “start from scratch” and prompted the AI to build a simple 8-slide presentation. It was supposed to showcase the benefits of using the Plus AI tool to students in an educational setting, such as BIM students. After a short loading time, I received eight slides that I could move around as I wanted, and I could pick a theme or create one myself.

After clicking on next, I was shown eight simple slides with yellow recommendation boxes and short sentences. To anyone who has already played around with generative AI, the format and writing style might be familiar, with hollow sentences and no big messages. In addition, the AI inserts stock imagery that aims to make the slides more visually appealing. What is really interesting is the recommendation system at play here. I found it useful at times, and its suggestions sounded like feedback my tutor could give me on one of my presentations.

All in all, I took away a couple of positive and mediocre experiences from this endeavour. “Door to door”, meaning from installing the software to having slides, took me around 10 minutes in total, after which I had a structured, pre-designed and pre-thought storyline that could be used in an educational setting. I put in the minimal effort of writing three sentences as a prompt and received a skeleton that could be filled with ideas, examples and theoretical frameworks. I find this helpful for getting started on a presentation. Clearly, the result is not presentation-ready, and as it stands it would probably not deliver any valuable insights to my audience, but starting off is often the hardest part. Thus, I would recommend using Plus AI as a brainstorming and ideation tool.

To connect this to our BIM class, I was reminded of a reading on AI usage in business settings. In “From Prediction to Transformation”, Ajay Agrawal, Joshua Gans, and Avi Goldfarb (2022) state that AI tools should not be used to fully replace human tasks, but rather to augment them. While AI can deliver “predictions of possible outcomes”, humans need to “Judge the feasibilities” (Agrawal et al., 2022). In this case, I was given a possible outcome for my slide wishes, after which I needed to judge the feasibility of the output. I think the findings of Agrawal et al. apply to this situation and to many others that managers will experience in the future, as AI becomes more integrated into software for daily use.
What do you think? Would you find this helpful in the future or did you try a different tool?
If you want to take a look at the slides, you can find them here.

References

Plus AI. (2023, September 30). Supercharge your slides with Plus AI. Retrieved from Plus AI: https://www.plusdocs.com/

Agrawal, A., Gans, J., & Goldfarb, A. (2022). From Prediction to Transformation. Harvard Business Review.


Is ChatGPT the modern medicine to remedy procrastination?

30 September 2023

The majority of students have faced procrastination at least once during their academic careers. Researcher Adela Belin gives ten different psychological reasons why you might procrastinate in her article “Why Do I Write Essays Last Minute: Understanding the Psychology Behind It” (Belin, 2023). One of them is uncertainty about how to approach and start writing a university assignment exactly according to the assignment description (Belin, 2023). This can cause an overwhelming feeling, especially when the topic is considered difficult.

When I have faced procrastination, it was most often caused by this lack of certainty. I prefer to start off well immediately and not have to rewrite parts of the text, since I see rewriting as an inefficient way of completing an assignment. However, your knowledge of the subject increases with every hour you spend writing, which means that what you have already written down can be written more accurately with the new knowledge. So what if you could skip to this part of essay writing and spend more time on improving an essay instead of writing a draft version?

During the Digital Business course in the third year of the Business Administration Bachelor’s degree, we were obliged to use ChatGPT, so I gave ChatGPT the complete assignment description as input. The output was very structured and immediately gave me a picture of how the assignment could be written to make it all-encompassing. Instead of having to start with a blank sheet, I could now skip to the part where the assignment is qualitatively improved! By re-reading the complete assignment description and comparing it to ChatGPT’s output, it became clear which parts needed improvement or were missing completely. After asking ChatGPT for more theoretical and practical examples as evidence, I could choose from the resulting list which ones I wanted to use in the assignment. With a follow-up question, ChatGPT also incorporated the evidence I chose into the text.

ChatGPT had now provided me with a great draft version of the assignment, and it thus works as a very effective tool for providing inspiration and visualising the assignment. It is, however, not just a tool: ChatGPT is advancing so rapidly that it is evolving into a platform with its alpha test of plugins (Goldman, 2023). This development brings third parties onto the platform, which extends the capabilities it offers (Watts, n.d.). A characteristic of a digital platform is that it creates value for its community (Watts, n.d.). The incorporation of plugins gives the ChatGPT user community more functionality beyond ChatGPT itself, which increases the value offered (Goldman, 2023; Watts, n.d.). In fact, these functionalities would have helped me a lot when using ChatGPT to create my assignment.

Now that I had output I considered a good assignment in its own right, I proceeded to use ChatGPT as an assistant and not as a replacement. ChatGPT as a tool is not capable of adding references to the written text (Welborn, 2023). As Welborn (2023) explains, ChatGPT is made to recognize language patterns and is incapable of effectively analysing all the data it has access to. Asking ChatGPT for references is in my opinion useless, since the majority of the references I received were completely made up. What I found to be a useful strategy is using a plagiarism check to find real sources that already overlap with the content of the text, as well as to get a clear picture of which parts need explicit modification. I used the sources identified by the plagiarism check to rewrite the relevant parts of the text and to add the in-text citations. I researched the remaining sources needed to substantiate the assignment myself. In conclusion, the development of ChatGPT into a platform could create the opportunity to offer this plagiarism functionality as a plugin, which in my opinion would add a lot of value.

What are your thoughts on ChatGPT evolving into a platform and incorporating plugins such as a plagiarism check?

Reference list:

Belin, A. (2023). Why do I write essays last minute: Understanding the psychology behind it. writersperhour. https://writersperhour.com/blog/psychology-of-last-minute-essay-writing

Goldman, S. (2023, March 23). OpenAI turns ChatGPT into a platform overnight with addition of plugins. VentureBeat. https://venturebeat.com/ai/openai-turns-chatgpt-into-a-platform-overnight-with-addition-of-plugins/

Watts, S. (n.d.). Digital Platforms: A Brief Introduction. BMC Blogs. https://www.bmc.com/blogs/digital-platforms/

Welborn, A. (2023, March 14). ChatGPT and fake citations – Duke University Libraries blogs. Duke University Libraries Blogs. https://blogs.library.duke.edu/blog/2023/03/09/chatgpt-and-fake-citations/#:~:text=ChatGPT%20is%20based%20on%20a,or%20write%20your%20literature%20review.


AI is just fooling you

30 September 2023

The guilty pleasure of most students nowadays is using AI for studying purposes: not only to write essays and papers, but also to answer questions or even get help with coding problems. With one input question, the system writes a complete answer. I have used ChatGPT in the past for a Python course to help me find creative ways to solve my coding problems. The use of AI can be very beneficial for students. AI assists in structuring essays by recommending an appropriate introduction, body, and conclusion. AI can also suggest relevant ideas and themes, facilitating research and idea generation for essays and research papers. Additionally, AI provides real-time feedback on grammar, spelling, and style, helping students improve their writing skills and produce error-free compositions (AIwhisperer, 2022).

However, this might be too good to be true. In a study titled “How to cheat on your final paper”, students reported that using AI did not make writing a paper easier (Fyfe, 2022). They discovered that it was not easy at all: around 87% reported that it became far more complicated than just writing the paper themselves. They had to keep giving ChatGPT “a shove in the right direction”. Besides that, the AI presented false statements and even quoted non-existent experts in the essay (Fyfe, 2022).

Another negative aspect of using AI is that students might become overly dependent on AI tools, potentially hindering the development of their critical thinking and writing skills (AIwhisperer, 2022). The whole process of learning involves learning to think for yourself and making mistakes to learn from. Even though AI is super useful and easy in many ways, isn’t it limiting our own abilities in the long run? An important question to ask is whether using AI while studying is actually valuable, or whether you are just fooling yourself. Older generations had to put far more effort into their learning process: going to the library, searching for the right information, and reading books. They were very much forced to learn and grow. I would like to know your opinion on this subject.

Let me know in the comments. 

Fyfe, P. (2022). How to cheat on your final paper: Assigning AI for student writing. AI & SOCIETY, 1-11.

AIwhisperer, T. (2022). How students are using artificial intelligence to write essays. medium. https://medium.com/@JimTheAIWhisperer/how-students-are-using-ai-to-write-essays-2d5ee187385c


The AI wardrobe: Coperni’s six months of fashion styles

30 September 2023

Dive into the wild ride of fashion weeks, where showing clothing alternatives and placing orders have disappeared and where AI has now firmly established its presence in the world of fashion.

Throughout history, the aim of fashion weeks has been to promote local talent (Zhang et al., 2022) and to showcase designers’ interpretations of seasonal trends and styles (Pulikottil-Jacob, 2022). However, thanks to the digital revolution, fashion weeks have evolved beyond their traditional role. They have become a week of international sensation, enabling brands to connect with their audience through new technologies (Bringé, n.d.).

Coperni, a high-fashion brand, has seamlessly integrated AI into its runway shows across various fashion weeks. During Paris Fashion Week, anyone interested could watch online as a dress was spray-painted onto Bella Hadid. Recently, Coperni unveiled a polymorphic story, inspired by the fable ‘The Wolf and the Lamb’, to illustrate the relationship between humans and technology.

This polymorphic video will run live for six months, unveiling a fresh story every 40 seconds. Each story is based on multiple choices of sets, scenarios, or looks from the collection. After the six months, 320,000 different stories will have been generated (Coperni, 2023). The music and voice-overs in the video are created from three main tracks that have been decomposed into numerous versions (Coperni, 2023). But how is Coperni bringing this futuristic story to life? They are leveraging AI to create these ongoing variations, and the voice-over texts are produced by ChatGPT. This tool is tasked with generating different text versions of the original fable ‘The Wolf and the Lamb’, which are then read by an AI-generated voice.

Technology is becoming a part of many different industries, even changing how fashion weeks happen compared to years ago. Now, as fashion and technology walk hand in hand, Coperni’s AI video is just the start. What new combination of fashion and tech will we see next?

References

Bringé, A. (n.d.). The Evolution of Fashion Week: where creativity meets digital transformation. www.linkedin.com. https://www.linkedin.com/pulse/evolution-fashion-week-where-creativity-meets-digital-alison/

Coperni. (2023, September 20). The Wolf and the Lamb – Fall/Winter 23 campaign [Video]. YouTube. https://www.youtube.com/watch?v=rxRNp1Ro2pw

Pulikottil-Jacob, R. (2022). Fashion weeks and customer experiences in emerging markets. In Springer eBooks (pp. 47–75). https://doi.org/10.1007/978-3-031-07326-7_3


Zhang, X., Zhang, Y., Chen, T., & Qi, W. (2022). Decentralizing the Power of Fashion? Exploring the geographies and inter-place connections of fashion cities through Fashion Weeks. Urban Geography, 1–20. https://doi.org/10.1080/02723638.2022.2147742


The Deepfake Dilemma: When Technology Threatens Trust

29 September 2023

Imagine you’re receiving a video call from a family member and everything looks and sounds normal. But then he asks you for money, because otherwise he will get in trouble. Something similar happened to a relative of mine. His brother was video calling him on WeChat and asking for money. Everything looked and sounded normal; after all, it was his brother that he saw on the screen. However, he was reluctant: why would his brother suddenly ask for money? He didn’t transfer the money and ended the call. Afterwards, he called his brother on the phone, who told him that he had not video called him at all. It turned out that the scammer had used an AI deepfake to impersonate the relative’s brother, with the same voice and image. Luckily, my relative didn’t fall into the scammer’s trap because he was aware of this scam. Others, however, were unfortunate enough to get tricked. For example, a man in China transferred around $570,000 to a scammer using a deepfake, thinking that he was helping a friend in a bidding project (Zhao, 2023).

AI deepfakes have been on the rise as AI technology becomes more accessible. Because AI is developing rapidly, it is becoming increasingly challenging to spot a deepfake video, and scammers use this to their advantage. But how does it actually work? Deepfakes take examples of audio or footage of someone and learn how to recreate their movements and voice accurately. All it takes is a few photos of the target’s face, which can be taken from social media, or a short video clip of less than 15 seconds to recreate a person’s voice (Chua, 2023). This raises ethical and privacy concerns, as it can violate individuals’ privacy by creating fake videos and images that they have not given permission for. Besides, what will the scammer do with your videos and images? What if they use them for other malicious purposes?

What’s interesting is that the owner of WeChat actually thinks that deepfakes could be good, describing them as a highly creative and groundbreaking technology. The owner gave a few examples of how deepfakes can be applied now and in the future. For instance, deepfakes can be used to let deceased actors appear in new movies or to generate voice-overs in different languages. Furthermore, deepfakes can help patients affected by chronic illness: for example, they allow people who have lost their voice to communicate through this technology (Hao, 2020). AI deepfakes have the potential for positive applications; however, misuse of this technology for malicious purposes is a significant concern. Although the owner of WeChat sees deepfakes as technology that can be good, the question is how they will protect users from harm.

As technology continues to advance at a rapid pace, there is also a dilemma: it challenges our ability to separate truth from fiction while raising ethical and privacy concerns. In a world where a familiar face on a video call can no longer be taken at face value, you need to think twice and ask yourself if what you are seeing is real or fiction.

What are your opinions on AI deepfake? 

Sources:

Chua, N. (2023). Scammers use deepfakes to create voice recordings and videos to trick victims’ family, friends. https://www.straitstimes.com/singapore/scammers-use-deepfakes-to-create-voice-recordings-and-videos-of-victims-family-friends-to-trick-them

Entrepreneur. (2023). ‘We were sucked in’: How to protect yourself from deepfake phone scams. https://www.entrepreneur.com/science-technology/5-ways-to-spot-and-avoid-deepfake-phone-scams/453561

Hao, K. (2020). The owner of WeChat thinks deepfakes could actually be good. https://www.technologyreview.com/2020/07/28/1005692/china-tencent-wechat-ai-plan-says-deepfakes-good/#:~:text=The%20news%3A%20In%20a%20new,a%20highly%20creative%20and%20groundbreaking

Zhao, H. (2023). AI deepfakes are on the rise in China. https://radii.co/article/deepfake-china-ai-scammers


Tired of watching lectures? Why not write a Python script for automated summarisation using GPT-4

29 September 2023

So, here I am, sitting in my student room, watching the Information Strategy lectures on Canvas. I am doing my very best to soak up every single bit of knowledge Professor Li bestows upon me, but I am unable to focus. The inevitable seems to have finally happened: short-form content on TikTok and YouTube has reduced my attention span to that of a goldfish.

With the last two brain cells I could muster, I thought to myself: “I bet you could write a script in Python to summarize videos in some way or form… If only I knew how to do that…”.

As I don’t know much about coding, I decided to do research, give up, and ask ChatGPT instead. And? Surprise, surprise: together with ChatGPT I succeeded in writing a program that takes an MP4 as input and writes a summary as output. Here is how I did it.

First I needed to write a function that takes an MP4 and extracts the audio. This was really easy (for ChatGPT). Within 10 seconds I had a working code snippet. The next steps required me to actually think for myself. I know! Unimaginable!

Next, I asked ChatGPT how to make an API request to the OpenAI Whisper model. But with its information cut-off of 2021, this large language model doesn’t even know how to access its own API. So I turned to the API documentation, copied the example code and changed the variables to fit my code. ChatGPT helped me troubleshoot the code when it was not working and helped me wrap the API call in a Python function I could use later in the script.

The next task is summarisation. This is done with the OpenAI API as well. Here, too, I copied the example code and changed the parameters and variables. The code needs to be adjusted to use the text transcribed by the Whisper model. According to ChatGPT, we can do this by inserting the transcribed text into the messages parameter with a formatted string. Here, ‘content’ is the string contained in the output text file.

Now that these three functions are defined, we can use them together at the end of the script. This is where the functions you defined actually get executed. This all results in the following Python code:

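import subprocess
import openai  # written against the pre-1.0 interface of the `openai` package
import os

openai.api_key = 'Your_api_key'  # replace with your own OpenAI API key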

def extract_audio(input_file, output_file):
    # Extract the audio track with ffmpeg: drop the video stream (-vn)
    # and encode it as mono (-ac 1), 16 kHz (-ar 16000) MP3
    try:
        subprocess.run(['ffmpeg', '-i', input_file, '-vn', '-acodec', 'libmp3lame', '-ar', '16000', '-ac',
        '1', output_file], check=True)
    except subprocess.CalledProcessError as e:
        print("Error converting video to audio:", e)
        raise SystemExit(1)


def transcribe_audio(audio_file):
    # Send the audio file to the OpenAI Whisper model and return the transcription response
    with open(audio_file, 'rb') as audio:
        return openai.Audio.transcribe(
            model='whisper-1',
            file=audio
        )

def summarize_text_from_file(filename):
    # Read the content of the file (same encoding as used when writing it)
    with open(filename, 'r', encoding='utf-8') as file:
        content = file.read()

    # Use GPT-3.5 to summarize the content
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "You are a summarisation expert."},
            {"role": "user", "content": f"Summarize the following text, focusing on the academic principle that might be relevant for Business Information Management students:\n\n{content}\n"},
        ],
        max_tokens=1000  # Upper limit on the length of the summary; adjust as needed
    )

    # Extract the summary from the response
    summary = response['choices'][0]['message']['content']
    return summary


if __name__ == "__main__":
   
    input_file = input("Enter the filepath to your video: ")  # Ask where to find the MP4
    audio_file = 'output.MP3'  # Temporary audio file
    output_text_file = input_file + '.txt'  # Output text file

    extract_audio(input_file, audio_file)
    transcribed_text = transcribe_audio(audio_file)

    with open(output_text_file, 'w', encoding='utf-8') as text_file:
        text_file.write(transcribed_text['text'])

    # Clean up temporary audio file
    os.remove(audio_file)

    print("Transcription complete. Text saved to", output_text_file)

    filename = output_text_file
    summary = summarize_text_from_file(filename)
    print("\nSummary:")
    print(summary)

When you run the code, it asks you for the file path to your MP4 video. In my code editor (VSCode) you can copy and paste the relative path of documents in the directory, which makes this pretty easy. Otherwise, move the MP4 file to the same folder as the script and just copy-paste the name of the file. Then press enter and watch some TikToks, Shorts, and Reels while the script does all the work.
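One caveat that is my own assumption rather than something I ran into with this video: the script sends the whole transcript in a single request, and a model only accepts a limited amount of input per request, so a very long lecture might not fit. A possible workaround is to split the transcript into smaller pieces first. The sketch below is a hypothetical helper (`split_into_chunks` is not part of the script above) that uses the word count as a rough stand-in for the real token count; the default of 1500 words is an arbitrary placeholder.

```python
from typing import List


def split_into_chunks(text: str, max_words: int = 1500) -> List[str]:
    """Split text into consecutive chunks of at most max_words words.

    Word count is only a rough proxy for tokens (an assumption of this
    sketch); pick max_words conservatively for the model you use.
    """
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    words = text.split()
    return [
        " ".join(words[start:start + max_words])
        for start in range(0, len(words), max_words)
    ]


if __name__ == "__main__":
    # Small demonstration with a tiny chunk size.
    for chunk in split_into_chunks("one two three four five", max_words=2):
        print(chunk)
```

Each chunk could then be summarised separately and the partial summaries combined, although the summarisation function in the script would need a small change to accept text instead of a filename.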

So, the big reveal! Here is a summary of the WeChat Video from session 2.5 generated by this script:

“Summary:
The text focuses on the academic principle of the emergence and influence of Chinese apps, particularly WeChat, on the development of the Internet and its potential implications for Western tech companies and individuals. It highlights the unique characteristics of the Chinese Internet, which operates as more of an intranet due to the Great Firewall that blocks foreign sites. As a result, Chinese copycat apps have filled this void and evolved to become successful companies. WeChat, in particular, is described as a super app that offers a wide range of services within one platform. The convenience and transformative nature of this technology are emphasized. However, concerns are raised regarding data privacy and governmental control due to the concentration of personal data in the hands of these companies and the Chinese government’s history of human rights violations. It is suggested that Western tech companies are now attempting to replicate the success of super apps like WeChat, which could have both powerful and problematic implications.”

So yeah, writing this code took way longer than just watching the lectures, but that’s not the point! There is a wise lesson to be learned here: never hold yourself accountable for any problems that arise from your excessive use of social media, and great things will come from it! Therefore, just keep scrolling.

P.S.: If you have any other ideas for automation that increase the amount of time I can spend on my phone, post them in the comments. Would love to try your ideas next.


Personal Trainer with AI Coach in Presentation Skills

29

September

2023


Over the past few years, technological advancements in speech processing have drastically transformed how humans engage with digital devices (Yu, 2016). These developments have paved the way for rapid progress in voice recognition technology, which, in turn, has opened doors for integrating AI into speech training. In particular, AI-driven speech training has proven effective in enhancing students’ presentation abilities (Junaidi, 2020). Additionally, studies have shown that fear of public speaking can cause such significant anxiety during oral presentations that it may affect individuals’ mental health and overall well-being (Grieve et al., 2021).

During the author’s internship in previous years at Lepaya, an Amsterdam-based start-up for soft skills training, there was an opportunity to try out the so-called “AI Coach.” AI Coach is a virtual platform in the mobile app that helps learners acquire communication skills effectively with “Machine Based eLearning (MABEL)” (Hoelzer, 2022).

This AI-driven method of Learning and Development (L&D) applies machine learning algorithms to various data types, such as video, audio, and text, to help users improve their conversational abilities in practical scenarios (AI Skills of the Future: Understand AI and Make It Work for You, n.d.). The process involves collecting practice videos, analysing them with AI systems to extract key speech and conversation indicators such as gestures, facial expressions, and voice, and then providing feedback to users for improvement (Hoelzer, 2022).

The pipeline comprises several steps, beginning with the collection of videos, either internally or through the app, which is developed in Flutter (Hoelzer, 2022). Next, the videos are processed by the MABEL API, which analyses video, sound, and text using Python and Docker within SageMaker on AWS, together with machine learning libraries like TensorFlow, PyTorch, and scikit-learn (Hoelzer, 2022).

Afterwards, the collected data is transformed into datasets, with Luigi used to track transformations and ensure reproducibility. The datasets are then annotated in Label Studio, because annotated data is crucial for training machine learning models; the annotations cover aspects like filler words, gestures, facial expressions, and overall presentation ratings. Next, machine learning models are developed based on the annotated datasets, including audio models (e.g., filler word detection), video models (e.g., human keypoint detection and emotion classification), and regular models (to provide a presentation rating). Tools like MLflow are used to manage experiments. Lastly, after quality assurance checks, the updated MABEL pipeline with the new models is deployed (Hoelzer, 2022).
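
To make the idea of a filler-word indicator more concrete, here is a toy sketch in Python. It is an assumption for illustration only and not how MABEL works: it counts filler words in a text transcript with regular expressions, whereas the pipeline described by Hoelzer (2022) relies on trained audio models. The word list and the function name `filler_word_stats` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical filler list for illustration; a real system would learn these from annotated data
FILLER_WORDS = ("um", "uh", "like", "basically", "you know")


def filler_word_stats(transcript):
    """Count filler words in a transcript and report fillers per 100 words."""
    text = transcript.lower()
    total_words = len(re.findall(r"[a-z']+", text))
    counts = Counter()
    for filler in FILLER_WORDS:
        # \b keeps "um" from matching inside words such as "umbrella"
        pattern = r"\b" + re.escape(filler) + r"\b"
        hits = len(re.findall(pattern, text))
        if hits:
            counts[filler] = hits
    rate = 100 * sum(counts.values()) / total_words if total_words else 0.0
    return counts, rate


if __name__ == "__main__":
    counts, rate = filler_word_stats("Um, so basically I like, you know, want to um present.")
    print(dict(counts), round(rate, 1))
```

A plain word count like this also flags legitimate uses of words such as “like”, which is one reason a production system would probably need annotated data and trained models instead.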

In conclusion, this comprehensive approach, which combines generative AI and effective training methods, represents a major leap forward in communication skills development.

References:

AI Skills of the Future: Understand AI and Make it Work for You. (n.d.). https://www.lepaya.com/blog/ai-skills-of-the-future

Grieve, R., Woodley, J., Hunt, S. E., & McKay, A. (2021). Student fears of oral presentations and public speaking in higher education: a qualitative survey. Journal of Further and Higher Education, 45(9), 1281-1293.

Hoelzer, T. (2022, November 7). MABEL — How we build AI at Lepaya Tech – Lepaya Tech – Medium. Medium. https://medium.com/lepaya-tech/mabel-how-we-build-ai-at-lepaya-tech-2ed6c806a23c

Junaidi, J. (2020). Artificial intelligence in EFL context: rising students’ speaking performance with Lyra virtual assistance. International Journal of Advanced Science and Technology Rehabilitation, 29(5), 6735-6741.
