Introduction to Veo 3
On Tuesday, Google launched Veo 3, a new AI video synthesis model that can do something no major AI video generator has been able to do before: create a synchronized audio track. From 2022 to 2024, we saw early steps in AI video generation, but each video was silent and usually very short. Now you can hear voices, dialog, and sound effects in eight-second high-definition video clips.
The Spaghetti Benchmark
Shortly after the new launch, people began asking the most obvious benchmarking question: How good is Veo 3 at faking Oscar-winning actor Will Smith eating spaghetti? The spaghetti benchmark in AI video traces its origins back to March 2023, when we first covered an early example of horrific AI-generated video using an open source video synthesis model called ModelScope. The spaghetti example later became well-known enough that Smith parodied it almost a year later in February 2024.
The Original Viral Video
One thing people forget is that at the time, the model behind the Smith example wasn’t the best AI video generator out there. A video synthesis model called Gen-2 from Runway had already achieved superior results (though it was not yet publicly accessible). But the ModelScope result was funny and weird enough to stick in people’s memories as an early poor example of video synthesis, handy for future comparisons as AI models progressed.
Veo 3 Put to the Test
AI app developer Javi Lopez first came to the rescue for curious spaghetti fans earlier this week with Veo 3, performing the Smith test and posting the results on X. But as you’ll notice when you watch, the soundtrack has a curious quality: The faux Smith appears to be crunching on the spaghetti.
The Result
Javi Lopez ran the prompt "Will Smith eating spaghetti" through Google’s Veo 3 AI video generator and posted the result on X. The video shows Will Smith eating spaghetti, but the sound effects are off: a crunching sound accompanies each bite.
Understanding the Glitch
The crunching is a glitch in Veo 3’s experimental ability to apply sound effects to video, likely because the training data used to create Google’s AI models featured many examples of chewing mouths with crunching sound effects. Generative AI models are pattern-matching prediction machines, and they need to be shown enough examples of various types of media to generate convincing new outputs. If a concept is over-represented or under-represented in the training data, you’ll see unusual generation results.
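To make the over-representation point concrete, here is a deliberately tiny sketch. It is not how Veo 3 works; it is a toy frequency-based predictor, and the data, labels, and function names (`TRAINING_PAIRS`, `fit_sound_counts`, `predict_sound`) are all invented for illustration. It only shows how a pattern matcher that has mostly seen "chewing" paired with "crunch" will tend to reach for a crunch, even when the food on screen is soft.

```python
from collections import Counter

# Hypothetical toy training set of (visual action, sound label) pairs.
# The 90/10 split is an assumption chosen to mimic over-representation.
TRAINING_PAIRS = (
    [("chewing", "crunch")] * 90        # over-represented: chips, apples, cereal
    + [("chewing", "soft_slurp")] * 10  # under-represented: noodles, soft food
    + [("pouring", "splash")] * 50
)


def fit_sound_counts(pairs):
    """Count how often each sound label co-occurs with each visual action."""
    counts = {}
    for action, sound in pairs:
        counts.setdefault(action, Counter())[sound] += 1
    return counts


def predict_sound(counts, action):
    """Return the sound most frequently paired with the action in training."""
    return counts[action].most_common(1)[0][0]


counts = fit_sound_counts(TRAINING_PAIRS)

# The predictor never looks at what is being chewed, only at the action,
# so spaghetti gets the same majority answer as a bag of chips.
print(predict_sound(counts, "chewing"))  # prints "crunch"
```

A real video model learns far richer associations than a lookup table, but the same pressure applies: whatever pairing dominates the training data is the one the model is most likely to reproduce.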
Conclusion
The launch of Veo 3 marks a significant step forward in AI video generation, with the ability to create synchronized audio tracks. While the model is still experimental and has its glitches, it shows promise for the future of AI-generated video content. The spaghetti benchmark will likely continue to be a popular test for AI video generators, and it will be interesting to see how Veo 3 and other models improve over time.
FAQs
- What is Veo 3?
Veo 3 is a new AI video synthesis model launched by Google that can create synchronized audio tracks.
- What is the spaghetti benchmark?
The spaghetti benchmark is a test used to evaluate the quality of AI video generators, specifically their ability to generate a video of Will Smith eating spaghetti.
- Why does the Veo 3 video of Will Smith eating spaghetti have a crunching sound effect?
The crunching sound effect is a glitch in Veo 3’s ability to apply sound effects to video, likely due to the training data used to create the model.
- What is the future of AI video generation?
The future of AI video generation looks promising, with models like Veo 3 showing significant improvements in quality and ability. However, there are still glitches and limitations to be addressed.