So, here I am, sitting in my student room, watching the Information Strategy lectures on Canvas. I am doing my very best to soak up every single bit of knowledge Professor Li bestows upon me, but I am unable to focus. The inevitable seems to have finally happened: short-form content on TikTok and YouTube has reduced my attention span to that of a goldfish.
With the last two brain cells I could muster, I thought to myself: “I bet you could write a script in Python to summarize videos in some way, shape, or form… If only I knew how to do that…”
As I don’t know much about coding, I decided to do research, give up, and ask ChatGPT instead. And? Surprise, surprise: together with ChatGPT I succeeded in writing a program that takes an MP4 as input and writes a summary as output. Here is how I did it.
First, I needed to write a function that takes an MP4 and extracts the audio. This was really easy (for ChatGPT). Within 10 seconds I had a working code snippet. The next steps required me to actually think for myself. I know! Unimaginable!
Next, I asked ChatGPT how to make an API request to the OpenAI Whisper model. But with its knowledge cutoff of 2021, this large language model doesn’t even know how to access its own API. So I went to the API documentation, copied the example code, and changed the variables to fit my code. ChatGPT helped me troubleshoot the code when it was not working and helped me wrap the API call in a Python function I could use later in the script.
The next task is summarisation. This is done with the OpenAI API as well. Again, I copied the example code and changed the parameters and variables. The code needs to be adjusted to use the text transcribed by the Whisper model. According to ChatGPT, we can do this by inserting the transcribed text into the messages parameter with a formatted string. Here, ‘content’ is the string read from the output text file.
Now that these three functions are defined, we can use them together at the end of the script. This is where the functions actually get executed. This all results in the following Python code:
import subprocess
import openai
import os

openai.api_key = 'Your_api_key'


def extract_audio(input_file, output_file):
    # Strip the video stream and save the audio as a mono, 16 kHz MP3
    try:
        subprocess.run(['ffmpeg', '-i', input_file, '-vn', '-acodec', 'libmp3lame', '-ar', '16000', '-ac',
                        '1', output_file], check=True)
    except subprocess.CalledProcessError as e:
        print("Error converting video to audio:", e)
        raise SystemExit(1)


def transcribe_audio(audio_file):
    # Send the audio file to the Whisper model and return the API response
    with open(audio_file, 'rb') as audio:
        return openai.Audio.transcribe(
            model='whisper-1',
            file=audio
        )


def summarize_text_from_file(filename):
    # Read the content of the file (same encoding as it was written with)
    with open(filename, 'r', encoding='utf-8') as file:
        content = file.read()
    # Use gpt-3.5-turbo to summarize the content
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "You are a summarisation expert."},
            {"role": "user", "content": f"Summarize the following text focussing on the academic principle that might be relevant for Business Information Management students:\n\n{content}\n"},
        ],
        max_tokens=1000  # Adjust as needed
    )
    # Extract the summary from the response
    summary = response['choices'][0]['message']['content']
    return summary


if __name__ == "__main__":
    input_file = input("Enter the filepath to your video: ")  # Ask where to find the MP4
    audio_file = 'output.MP3'  # Temporary audio file
    output_text_file = input_file + '.txt'  # Output text file
    extract_audio(input_file, audio_file)
    transcribed_text = transcribe_audio(audio_file)
    with open(output_text_file, 'w', encoding='utf-8') as text_file:
        text_file.write(transcribed_text['text'])
    # Clean up temporary audio file
    os.remove(audio_file)
    print("Transcription complete. Text saved to", output_text_file)
    filename = output_text_file
    summary = summarize_text_from_file(filename)
    print("\nSummary:")
    print(summary)
When you run the code, it asks you for the file path to your MP4 video. In my code editor (VS Code) you can copy and paste the relative path of any file in the directory, which makes this pretty easy. Otherwise, move the MP4 file to the same folder as the script and just copy-paste the name of the file. Then press Enter and watch some TikToks, Shorts, and Reels while the script does all the work.
So, the big reveal! Here is a summary of the WeChat Video from session 2.5 generated by this script:
“Summary:
The text focuses on the academic principle of the emergence and influence of Chinese apps, particularly WeChat, on the development of the Internet and its potential implications for Western tech companies and individuals. It highlights the unique characteristics of the Chinese Internet, which operates as more of an intranet due to the Great Firewall that blocks foreign sites. As a result, Chinese copycat apps have filled this void and evolved to become successful companies. WeChat, in particular, is described as a super app that offers a wide range of services within one platform. The convenience and transformative nature of this technology are emphasized. However, concerns are raised regarding data privacy and governmental control due to the concentration of personal data in the hands of these companies and the Chinese government’s history of human rights violations. It is suggested that Western tech companies are now attempting to replicate the success of super apps like WeChat, which could have both powerful and problematic implications.”
So yeah, writing this code took way longer than just watching the lectures, but that’s not the point! There is a wise lesson to be learned here: never hold yourself accountable for any problems that arise from your excessive use of social media, and great things will come from it! Therefore, just keep scrolling.
P.S.: If you have any other ideas for automation that increase the amount of time I can spend on my phone, post them in the comments. Would love to try your ideas next.
Hi Mees, what a really novel idea to write a Python script for automating summarizations of lectures!
As you’ve stated, with excessive use of short-form content, our attention span has decreased tremendously compared to that of previous generations. I experience the same issue when watching videos at home for my studies. Maybe I’ll use this as a test for another lecture in the future! I think your blog makes it very clear how endless the practical applications of ChatGPT are. You’ve shown this by using ChatGPT both for the initial Python script and for summarizing the transcribed text.
One question I do have: will these types of applications also change the future of lectures? Although most lectures are not recorded anymore, maybe there are some other variants or applications to use this for. Furthermore, I’m curious how good the summarization of the WeChat Video from session 2.5 is in your opinion. And what if you would like to have a more extensive summarization? Do you have to write another Python script or…?
Hi Thijs,
Awesome that you liked my idea! To answer your questions:
1. It is quite easy to adapt the application to use a sound recording that could be made during a live lecture. That would mean the application skips the step of converting the video to sound and instead uses the recording as input.
2. I was amazed by the quality of the transcription. It was almost without any mistakes, making it a good basis for a summary. The summary was good, but there is room for improvement. Maybe in the future I would create a way to input additional lecture notes or slides that are given a higher weight in the summary, providing the model with additional data. I would also add an option to tell the model how long the summary needs to be. Currently that is not possible, meaning I’d have to alter the prompt in the script.
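Both ideas could be sketched roughly as follows. This is only an illustration and not part of the script above: the helper names `needs_audio_extraction` and `build_summary_messages`, the list of audio extensions, and the `max_words` parameter are all assumptions. The model is also not guaranteed to stick exactly to a requested word count.

```python
import os

# Assumed set of audio extensions that could be sent straight to transcription.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a"}


def needs_audio_extraction(path):
    """Return True when the file does not look like audio and must be converted first."""
    extension = os.path.splitext(path)[1].lower()
    return extension not in AUDIO_EXTENSIONS


def build_summary_messages(content, max_words=None):
    """Build the chat messages, optionally asking for a maximum summary length."""
    instruction = (
        "Summarize the following text focusing on the academic principles "
        "that might be relevant for Business Information Management students"
    )
    if max_words is not None:
        # Hypothetical length control: the limit is simply stated in the prompt.
        instruction += f", using at most {max_words} words"
    return [
        {"role": "system", "content": "You are a summarisation expert."},
        {"role": "user", "content": f"{instruction}:\n\n{content}\n"},
    ]


if __name__ == "__main__":
    print(needs_audio_extraction("lecture.mp4"))
    print(needs_audio_extraction("recording.mp3"))
    print(build_summary_messages("Example transcript.", max_words=150)[1]["content"])
```

In the script, `extract_audio` would then only be called when `needs_audio_extraction` returns True, and the list returned by `build_summary_messages` would be passed as the `messages` argument of `openai.ChatCompletion.create`.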
Thank you for your interest!