Meta’s Movie Gen AI Video Generator Is Capable of Making Actual Movies, Music Included
Meta’s AI journey would inevitably take it into the budding realm of AI video. Now, the Mark Zuckerberg-led company has Movie Gen, yet another video generator capable of making realistic-ish video from a short text prompt. Meta claims this is as useful for Hollywood as it is for the average Instagrammer, even though it’s not available to anyone outside Meta. Movie Gen can also generate audio, making it the most capable deepfake generator we’ve seen yet.
In a blog post, Meta showed off a few example videos, including a happy baby hippo swimming underwater, somehow floating just below the surface and apparently having no problems holding its breath. Other videos showcase penguins dressed in “Victorian” outfits, though the sleeves and skirts are too short to be representative of the period. There’s another video of a woman DJing next to a cheetah, seemingly too distracted by the beat to care about the predator beside her.
Everybody’s getting in on the AI-generated video space. Already this year, Microsoft’s VASA-1 and OpenAI’s Sora promised “realistic” videos generated from simple text prompts. Despite being teased back in February, Sora has yet to see the light of day. Meta’s Movie Gen offers a few more capabilities than the competition, including editing existing video with a text prompt, creating video based on an image, and adding AI-generated sound to the created video.
The video editing suite seems especially novel. It works on generated video as well as real-world captures. Meta claims its model “preserves the original content” while adding elements to the footage, whether they’re backdrops or outfits for the scene’s main characters. Meta showed how you can also take pictures of people and drop them into generated movies.
Meta already has music and sound generation models, but the social media giant displayed a few examples of the 13B parameter audio generator adding sound effects and soundtracks on top of videos. The text input could be as simple as “rustling leaves and snapping twigs” to add to the generated video of a snake winding along the forest floor. The audio generator is currently limited to 45 seconds, so it won’t be scoring entire movies, at least not yet.
And no, sorry, you can’t use it yet. Meta’s chief product officer, Chris Cox, wrote on Threads, “We aren’t ready to release this as a product anytime soon—it’s still expensive, and generation time is too long.”
In its whitepaper discussing Movie Gen, Meta said the whole software suite is made up of multiple foundation models. The largest video model the company has is a 30B parameter transformer model with a maximum context length of 73,000 video tokens. The audio generator is a 13B parameter foundation model that can do both video-to-audio and text-to-audio.
It’s hard to compare that to the biggest AI companies’ video generators, especially since OpenAI claims Sora uses “data called patches, each of which is akin to a token in GPT.” Meta is one of the few major companies that still publishes technical details alongside its new AI tools, a practice that has fallen by the wayside as AI has become excessively commercialized. Despite that, Meta’s whitepaper doesn’t offer much of an idea of where it got its training data for Movie Gen. In all likelihood, some part of the data set came from Facebook users’ videos. Meta also uses the photos you take with the Ray-Ban Meta smart glasses to train its AI models.
While Movie Gen remains locked away, other AI video generators like RunwayML’s Gen-3 offer a limited number of tokens to create small clips before you need to start paying. A report by 404 Media earlier this year indicated that Runway trained its AI on thousands of YouTube videos, and, like most AI startups, it never asked permission before scraping that content.
Meta said it worked closely with filmmakers and video producers when creating this model and will continue doing so as it refines Movie Gen. Reports from earlier this year indicate studios are already cozying up to AI companies. Independent darling A24 has recently worked with VC firms specializing in AI, some with ties to OpenAI. For its part, Meta is reportedly in talks with Hollywood stars like Judi Dench and Awkwafina about using their voices for future AI projects.