Videogen

System description

Videogen generates short-form social media videos for services like TikTok or YouTube Shorts.

One of the more common types of video on short-form video apps like TikTok is the AI story. TikTok already makes extensive use of AI technology such as text-to-speech (TTS) and auto-captioning, which has created a culture of acceptance around AI-generated content. One issue, however, is that many creators simply steal content from other sites, such as Reddit, and upload an AI-voiced version with the barest of attribution. A common form of these videos is a story told by an AI voice over gameplay footage from games such as Minecraft and Subway Surfers.

Videogen is an attempt to recreate these videos in a more ethical way. It takes writing prompts from the /r/WritingPrompts subreddit and uses generative AI to turn them into engaging narratives. The subreddit considers the prompts to be public domain but the individual written stories to be the intellectual property of their writers, so Videogen sidesteps IP issues by using only the prompts. The /r/WritingPrompts community does not allow AI-generated content to be posted to the subreddit, so Videogen does not upload its creations to Reddit.

A video created by Videogen based on a Reddit writing prompt by /u/Known231: You are a monster that had spent countless years hidding (sic) amongst humans. One day day, your human companion comes to you scared and that very day is the day you show the world how monsterious (sic) you truly are.

The system uses Llama 3.1 running in Ollama to create a short story of roughly 600 words. The LLM is asked to write the story from a first-person perspective and is told the gender of the speaker, to ensure a modicum of consistency between the narrator's voice and the narrative.
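The story request described above could be sketched roughly as follows. This is an illustration under stated assumptions, not Videogen's actual code: the prompt wording, the function names, and the `llama3.1` model tag are all hypothetical, and the sketch assumes Ollama is serving its REST API on the default local port.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_prompt(writing_prompt: str, narrator_gender: str, target_words: int = 600) -> str:
    """Assemble the instruction text (hypothetical wording, not Videogen's own)."""
    return (
        f"Write a short story of roughly {target_words} words based on the "
        "writing prompt below. Tell it in the first person. "
        f"The narrator is {narrator_gender}.\n\n"
        f"Writing prompt: {writing_prompt}"
    )


def build_request(writing_prompt: str, narrator_gender: str,
                  model: str = "llama3.1") -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's generate endpoint."""
    payload = {
        "model": model,  # assumed model tag
        "prompt": build_prompt(writing_prompt, narrator_gender),
        "stream": False,  # ask for one JSON object instead of a token stream
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )


def generate_story(writing_prompt: str, narrator_gender: str) -> str:
    """Send the request and return the generated text (needs a running Ollama)."""
    with urllib.request.urlopen(build_request(writing_prompt, narrator_gender)) as resp:
        return json.loads(resp.read())["response"]
```

Passing the narrator's gender in the prompt is what lets a later step pick a matching TTS voice without re-reading the story.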

The story is then divided into pieces of roughly 150-200 words, each corresponding to about a minute of content once voiced. Each piece is voiced with a voice of the appropriate gender using OpenAI’s text-to-speech (TTS) service. The clips are subtitled locally using Whisper and moviepy, then combined and overlaid on royalty-free Minecraft footage using moviepy. Note that no spell checking is applied to the writing prompt, so any typos in it reach the LLM as written, which can have some interesting side effects.
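One way to perform the splitting step is to pack whole sentences greedily into pieces with a word cap, so that no piece is cut mid-sentence. The sketch below is a minimal illustration, assuming a 200-word cap and a simple punctuation-based sentence split; the function name `split_story` is hypothetical and Videogen's actual splitting logic may differ.

```python
import re


def split_story(story: str, max_words: int = 200) -> list[str]:
    """Greedily pack whole sentences into pieces of at most max_words words.

    A single sentence longer than max_words becomes its own oversized piece,
    and the final piece may be shorter than the others.
    """
    # Naive sentence split: break on whitespace that follows ., ! or ?
    sentences = re.split(r"(?<=[.!?])\s+", story.strip())
    pieces, current, count = [], [], 0
    for sentence in sentences:
        n = len(sentence.split())
        if n == 0:
            continue  # skip empty fragments (e.g. from an empty story)
        # Close the current piece if this sentence would push it over the cap.
        if current and count + n > max_words:
            pieces.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        pieces.append(" ".join(current))
    return pieces
```

With a story of about 600 words, this yields three or four pieces, which matches the one-minute-per-clip target described above.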

The end result can be seen in the video above. The video has some imperfections, but its quality is generally on par with similar videos found on apps like TikTok and YouTube Shorts.