The Anatomy of an AI-generated TikTok post

9 October 2024

Many of us like to indulge in scrolling our phones in our free time. It’s a quick, guaranteed way to kill some time while waiting around, or while gathering the strength to finally get out of bed. While our options were limited to classic picture-and-text posts until a few years ago, the meteoric rise of short-form video has come to dominate the doomscrolling niche. After repeatedly seeing a specific type of content (rehashed Reddit drama stories), I began to wonder: could you automate this with GenAI?

Down the rabbit hole

I first began by analyzing the video that sparked my idea, hoping to pin down the exact mental tasks I would need to hand off to GenAI to automate it. However, I got sidetracked. Listening to the video closely, I noticed that the voice and sound design were extremely high quality, as if someone who narrates commercials for a living had taken time out of their working hours and used their studio setup to record the voiceover. The text display and editing also seemed off for a typical TikTok post: there were a few errors that struck me as “no-brainers”, editing choices that made the story harder to follow. That’s when I realized I wasn’t looking at someone’s genuine effort. I was staring right at my own idea, implemented before I got to it.

Shifting gears

At that point, I was both disappointed and amazed. My original idea would probably not find the success I had hoped for, since the market for AI-generated short stories in video form was presumably already saturated. Still, I wanted to know more about the topic. I never found this type of content to be of amazing artistic quality, but it could be the springboard for future GenAI development in entertainment. I decided to carry the project through to a proof-of-concept stage anyway, to capitalize on the learning opportunity.

When I finally had the code up and running, churning out videos with little-to-no human input in a fraction of the time (and with a fraction of the artistic integrity) that a human would need, I felt like I had stumbled onto the crux of the emotional manipulation in contemporary social media. I felt like I had found something that every one of my friends should at least know about. Even more disturbingly, during the research for this project I found no complete documentation of what this content is and how it’s made. I only had my own experiences and thoughts to go on.

This blog post outlines my findings and opinions on how this content is made, in the hope that getting this knowledge out there will restore some fairness to social media: you deserve to know what you are consuming.

However, there is still one topic of discussion I could not settle, even at the end of the project. Is this truly a new form of entertainment? Am I too quick to condemn something new and exciting? Throughout this blog post I will therefore give as much practical information as possible (without encouraging or condemning the non-illegal parts) and hope that the question will sort itself out in due time, once more people have the ability to do this themselves.

How the sausage is made

After conducting some research (if you can call scrolling TikTok for a few hours research), I found several elements that are common across many of these videos:

Profile picture

Automating away the process of selecting a profile picture seems like an exercise in futility; after all, why spend any time automating something that will only happen once? Well, online speculation centers on the conjecture that most platforms purposefully de-amplify AI-generated content, so a channel hits a low ceiling on how far the algorithm will carry it. I therefore speculate that most organizations running channels like the one I was trying to create actually run multiple channels concurrently, cheating the “algorithmic ceiling” imposed on each of them, which suddenly makes automating channel setup, profile picture included, worth the effort.

Voice

The main way the underlying story is conveyed to the viewer is the AI-generated voiceover. It is extremely common, with only a small percentage of these videos opting for a musical background or no voice at all. Such voices can be created by training a neural network on a few thousand hours of one specific person’s publicly available speech along with its transcript. The main objective is to find a voice that is both soothing and fitting for the type of story. A voice can really only be done “wrong”: by selecting one that is irritating or hard to understand, you exclude potential viewers from consuming the content.

Most posters use a third-party service (most commonly ElevenLabs) to acquire this part of the post. Most AI voiceover services (ElevenLabs included) charge between 30 and 100 euros per month for API access (compared to a rumored income of $20-50 per million views), but offer a free tier, provided the user registers with their Google account. This creates a large incentive for unscrupulous (but strongly business-minded) posters to buy stolen Google accounts (which entails supporting the theft of Gmail accounts) to generate the voiceover. Free alternatives exist, but they require quite a strong PC to run and generally underperform slightly compared to the paid services.
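To make this concrete, here is a minimal sketch of what the voiceover step could look like, calling ElevenLabs’ v1 text-to-speech endpoint with the `requests` library. The API key, voice ID, model name and story text are placeholder assumptions, not values taken from any real channel:

```python
import requests

# Placeholder credentials and voice -- assumptions for illustration only.
API_KEY = "YOUR_ELEVENLABS_API_KEY"
VOICE_ID = "your-chosen-voice-id"

def generate_voiceover(text: str, out_path: str = "voiceover.mp3") -> None:
    """Send story text to the ElevenLabs v1 text-to-speech endpoint
    and write the returned audio to disk."""
    url = f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}"
    response = requests.post(
        url,
        headers={"xi-api-key": API_KEY, "Content-Type": "application/json"},
        json={"text": text, "model_id": "eleven_multilingual_v2"},
        timeout=60,
    )
    response.raise_for_status()
    with open(out_path, "wb") as f:
        f.write(response.content)

generate_voiceover("Today I found out my roommate had been...")
```

The same sketch would work against a locally hosted open-source TTS model by swapping the URL, which is essentially what the free alternatives mentioned above boil down to.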

Story

Finding an interesting story to put into video form is hard work. However, as most of these stories are selected for their strong language, uncontroversial interpretation and general relatability, there is a really good proxy to look for when selecting them. Reddit has a rating system, so these stories are easy to find: go to a subreddit (a sub-forum where related topics are discussed), sort by highest rated, set the filter to “highest rated of all time”, and just like that, you have a scoop… or at least that’s what I initially thought.
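In code, this scraping step is only a few lines with the PRAW library. The credentials, the subreddit and the limit below are placeholder assumptions:

```python
import praw

# Placeholder credentials -- register a "script" app on Reddit to get these.
reddit = praw.Reddit(
    client_id="YOUR_CLIENT_ID",
    client_secret="YOUR_CLIENT_SECRET",
    user_agent="story-scraper by u/your_username",
)

# Fetch the highest-rated posts of all time from one story subreddit.
for post in reddit.subreddit("tifu").top(time_filter="all", limit=25):
    print(post.score, post.title)
    # post.selftext holds the story body that would feed the voiceover step.
```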

The problem with this approach is that these stories are “internet famous”: there is a high likelihood that your viewer has heard the story before. They’ll listen for the first few seconds, conclude that they don’t need to hear it again, swipe away, and tank your video’s rating to the bottom of the algorithm.

Thankfully, if you are willing to abandon artistic integrity (or don’t view it as such), you can fine-tune a generative AI model to write the stories for you. By collecting the top stories from a given story niche (or better yet, collecting them from many niches and using an unsupervised machine learning algorithm to classify them into niches), you can make sure that your generative model will be able to create tall tales to entertain the world.
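As a sketch of that unsupervised classification step, TF-IDF features plus k-means clustering from scikit-learn can group scraped stories into niches. The sample stories and the number of niches are made up for illustration:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# In the real pipeline this list would hold thousands of scraped stories.
stories = [
    "My roommate ate my leftovers and blamed the cat...",
    "I missed my own wedding because of a time zone mixup...",
    "My landlord tried to charge me for a window I never broke...",
    "I accidentally sent my boss a meme meant for my group chat...",
]

# Turn each story into a TF-IDF vector, ignoring very common English words.
vectorizer = TfidfVectorizer(stop_words="english", max_features=5000)
X = vectorizer.fit_transform(stories)

# Group the stories into a guessed number of niches.
kmeans = KMeans(n_clusters=2, random_state=42)
labels = kmeans.fit_predict(X)

for label, story in zip(labels, stories):
    print(label, story[:50])
```

Each cluster then becomes a candidate niche whose stories can serve as fine-tuning data for the generator.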

Scenery

A defining feature of these short-form videos is the background video. Most posters strive to find something that occupies the part of the viewer’s brain that isn’t actively engaged in listening to the story. For this, bright colors and easy-to-interpret imagery are a must. The end goal is to totally engage the viewer on every level, keeping them focused on the content. Familiar footage (the games Subway Surfers, Minecraft and GTA 5 work exceptionally well) makes interpretation more fun for the viewer, evoking feelings of either nostalgia or active interest.

Generative AI does not play a role in this part of the content. While it is theoretically possible to train an AI model to play games to produce background video, it is currently easier to find a large chunk of existing footage that can be cut up into many small pieces.
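As an illustration, slicing one long gameplay recording into fixed-length background segments takes only a few lines with MoviePy’s 1.x API. The file name and chunk length are assumptions:

```python
from moviepy.editor import VideoFileClip

CHUNK_SECONDS = 60  # assumed length of each background segment

# Load one long gameplay recording and slice it into one-minute pieces.
source = VideoFileClip("gameplay.mp4")
for i, start in enumerate(range(0, int(source.duration), CHUNK_SECONDS)):
    end = min(start + CHUNK_SECONDS, source.duration)
    source.subclip(start, end).write_videofile(f"background_{i:03d}.mp4", audio=False)
source.close()
```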

Again, the most obvious solution turns out to be suboptimal. There is a finite amount of explicitly royalty-free content that fits this medium well, and if viewers recognize a specific background video from a competing channel, it engages them less. Therefore, some posters opt to either pay royalties (again, hopelessly eating away at the profit margin) or simply steal content that was never royalty-free.

As most of the footage originates from YouTube while the story content is posted to TikTok, this exploits a gap in content moderation policy. The two sites rarely coordinate on copyright issues, especially when the content is of low value (someone’s ages-old Minecraft gameplay, for example), so the method carries little risk.

Editing

The last component of a post of this nature is the editing. There are three main tasks to cover: cutting the background footage so it lasts exactly as long as the story, displaying subtitles, and animating said subtitles. Many Python libraries offer rudimentary video editing that fits this purpose, such as MoviePy.

A simple approach would be to define static rules for these tasks. The subtitles in this type of content usually use a “pop” effect, where the text enlarges and then slightly shrinks in quick succession. This captures the viewer’s attention: humans are hard-wired to pay attention to fast-moving objects, so their eyes are naturally drawn to the subtitles.
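Here is a minimal sketch of that static-rule approach using MoviePy’s 1.x API (TextClip additionally requires ImageMagick). It trims the background to the voiceover’s length, overlays a single subtitle, and approximates the “pop” by scaling the text up and back down over its first fifth of a second. File names, the font and the timing constants are all assumptions:

```python
from moviepy.editor import (AudioFileClip, CompositeVideoClip, TextClip,
                            VideoFileClip)

voiceover = AudioFileClip("voiceover.mp3")
# Cut the background so it lasts exactly as long as the story does.
background = VideoFileClip("background_000.mp4").subclip(0, voiceover.duration)

# One subtitle chunk; a real pipeline would generate many, timed to the words.
subtitle = (
    TextClip("AITA for this?", fontsize=70, color="white", font="Arial-Bold")
    .set_position("center")
    .set_start(0.0)
    .set_duration(2.0)
    # "Pop" effect: grow to 120% size at 0.1 s, settle back to 100% by 0.2 s.
    .resize(lambda t: 1 + 0.2 * max(0.0, 1 - abs(t - 0.1) / 0.1))
)

final = CompositeVideoClip([background, subtitle]).set_audio(voiceover)
final.write_videofile("final_post.mp4", fps=30)
```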

This leaves only two tasks: cutting the video to fit the story (keeping in mind to show only attention-grabbing sequences) and timing the subtitles (keeping in mind to display related words together). For these, GenAI outperforms static rules; most general-purpose large language models that support video RAG (retrieval-augmented generation) can be prompted to accomplish them. Better yet, they are able to write code that they then interact with, setting the appropriate parameters for each video.

Running these models locally might not make business sense, as they require a strong PC, which might prohibitively eat away at the profit margin. Third-party solutions do exist, and I trust that by now you have noticed a pattern: the token pricing is too high to be sustainable for the enterprise, so all that is necessary to acquire this capability below market price is to get hold of a stolen API key for any state-of-the-art model, offloading the computation to a server far away.

FIN

With this, you now know all the details I uncovered behind this type of content. I found that it is very troublesome to make. My overarching theory about the creation pipeline is that it is simply too expensive to run compared to the little income it brings in for any organization: TikTok reportedly pays around $20-50 per million views, which is simply not enough to support ethical creation of this type of content right now. However, I sincerely believe that this will change, at which point the internet will have to collectively decide the fate of short-form storytime content. We all play a part in that conversation, so I encourage you to leave your opinion down below in the comments section.

The Metaverse: When ideas outpace hardware

5 October 2024

Recently, the news that Meta, the company behind the Metaverse and many VR devices, is about to launch a new version of its flagship VR headset first leaked through FCC filings and was later announced at Meta Connect 2024. As this announcement comes before the first birthday of the previous flagship device, the Quest 3, it has left many puzzled about how it fits into the firm’s strategy.

Does the past predict the future?

“Study the past if you want to divine the future” – Confucius

When Mark Zuckerberg famously invented the predecessor to Facebook in 2003 out of his dorm room at Harvard, he came up with an idea that would only find the level of success it did after years of technological advancement. Initially, users interacted with the website through large, practically immovable desktops and thick, heavy laptops (an era-appropriate ThinkPad weighed 2.22 kg and was equipped with exactly one CPU core).

Even taking a photo of your face to make a post involved first buying a digital camera that took blurry, low-resolution photos, then navigating the process of uploading the photo to a capable machine. After all that, the user would have to find an opportune moment when nobody was taking a call on the landline phone, so that they could use the dial-up modem to connect to the internet and finally post the picture at a blazing 56 kbps (the chance of Windows XP not displaying the infamous Blue Screen of Death notwithstanding).

So why did Facebook become such a massive success? In part, because in the late 2000s smartphones and surrounding technologies, such as DSL internet connections and WiFi, became widespread. Posting no longer involved jumping through numerous hoops while silently hoping that nothing would break beyond the user’s ability to fix. It became a simple matter of opening the camera roll, being connected to the home WiFi network and pressing “post”.

Ahead of schedule

“In firing, at an object in motion, the instructor should explain that the best way is to aim in the usual way, and then, without dwelling an instant on the aim, move the rifle laterally in the direction and to the extent required […]” – Manual for Rifle Practice by General George Wingate, 1874

Facebook found success not just by being one of the most capable social media platforms on the early internet. A core factor in its success was that it rode a wave of technology that came after its inception. If you wanted to develop a competing website in 2010, when the enabling technologies were well established, you were going up against a giant of 1,700 employees and 500 million active users.

This is a common theme among internet companies. Google began as a research project in 1996, when only 18% of U.S. households had internet access and could even have the problem of not knowing which website to visit (U.S. Census Bureau, 2005). This figure jumped to 26% within the next two years, and by 2001 over half of the households surveyed had internet access within the comfort of their own homes.

What did these companies do? They observed fast-moving frontier developments in technology and built their firms around a service that enables that technology to do new and valuable things for the customer. By the time any competitors could arise, they were well established in customers’ minds, which enabled them to dominate the market for decades to come. They anticipated where a technology would be in a few years and built their products for that level of advancement, not for the current norm.

Betting the house on it

“The definition of insanity is doing the same thing over and over and expecting different results.” – Albert Einstein

When Facebook acquired Oculus in 2014, the company behind the trailblazing Oculus Rift line of VR headsets, Mark Zuckerberg must have had a sense of déjà vu: he saw a fresh technology that was clunky and burdensome to use, yet developing fast. Anticipating the same momentum he had seen with smartphones, he formed an ambitious vision: what if he could replicate the success of Facebook, connecting people not through screens and keyboards, but through the natural medium of speech, movement and body language?

After the acquisition, VR technology went through important transformations from a usability perspective. With the release of the Oculus Go in 2018, if you wanted to jump into VR you no longer needed to drill holes in your wall to set up base stations to track your controllers, there was no need to buy a gaming PC to process the frames sent to the display, and you wouldn’t entangle yourself in the display cable as you whipped around observing your digital surroundings.

The company itself went through a quick transformation too. Now rebranded to “Meta”, 1 Hacker Way became the physical home of the prospective Metaverse, a VR-accessible way of connecting with friends, colleagues and strangers on the internet.

Foreclosure

“3.6 roentgen, not great, not terrible.” – Chernobyl (HBO)

However, Mark Zuckerberg’s vision was not to follow the timeline he might have imagined. The transformation of Facebook into Meta was a financially brutal affair. The Reality Labs division (mostly made up of former Oculus employees) posted a whopping $13.7 billion loss in the year following the company’s rebranding (Meta, 2023).

In order to “pursue greater efficiency and to realign [Meta’s] business and strategic priorities”, the company underwent a major restructuring effort that came alongside ballooning R&D budgets and the layoff of around 20,000 employees (Kerr, 2023).

In the face of these increasing costs, there was little promise of income from the change. The news cycle quickly filled with stories about how empty the current Metaverse is: in 2022 it was reported that only 9% of worlds created by users were ever visited by at least 50 people (TND Newsdesk, 2022). Additionally, news kept cropping up about the perceived absurdity of investing in metaverse projects, such as the infamous EU-sponsored party that cost €387,000 and drew an attendance of five people (Fiedler, 2022).

Present day

“If At First You Don’t Succeed, Try, Try Again” – Zen Cho

However, Meta adamantly refuses to give up on its vision of the Metaverse. The company actively pursues a strategy of advancing the hardware customers can use to access the digital space. Even though the VR headset market advances very quickly, which traditionally makes cornering it through high market share less feasible, Meta currently serves 75% of the market (Armstrong, 2023). This suggests the firm is pouring more money into the research and development of this technology than would make sense if it were only in the market for short-term monetary gain.

The Quest 3S, announced on September 25th, seems to be the firm’s latest bid to get more users online. From a hardware standpoint, the Quest 3S makes no business sense: it is roughly on par with the recently released Quest 3, sells for three quarters of the price of that device, and comes with what seems to be a full (~€30) game thrown in with every purchase.

Out of context, this would be a textbook case of competing with your own product. In context, however, I view it as a perfect step toward seeing through the vision of the Metaverse by lowering the barrier to entry for prospective users.

References:

Armstrong, M. (2023, February 28). Meta leads the way in VR headsets. Statista Daily Data. https://www.statista.com/chart/29398/vr-headset-kpis/

Fiedler, T. (2022, November 30). EU throws party in €387K metaverse — and hardly anyone turns up. POLITICO. https://www.politico.eu/article/eu-threw-e387k-meta-gala-nobody-came-big-tech/

Kerr, C. (2023, October 8). Meta plans for another 10,000 layoffs just months after cutting 11,000 jobs. Game Developer. https://www.gamedeveloper.com/business/meta-plans-for-another-10-000-layoffs-just-months-after-cutting-11-000-jobs

Meta. (2023, February 1). Meta Reports Fourth Quarter and Full Year 2022 Results. https://investor.fb.com/investor-news/press-release-details/2023/Meta-Reports-Fourth-Quarter-and-Full-Year-2022-Results/default.aspx

TND Newsdesk. (2022, October 17). Metaverse faces low usage as users’ complaints mount. https://www.technewsday.com/2022/10/17/metaverse-faces-low-usage-as-users-complaints-mount/

U.S. Census Bureau. (2005). P23-208 Computer and Internet Use. In U.S. Census Bureau Library (No. P23-208). https://www.census.gov/content/dam/Census/library/publications/2005/demo/p23-208.pdf
