Odyssey's AI Turns Videos Into Interactive Worlds

Introduction to Interactive Video

London-based AI lab Odyssey has launched a research preview of a model that transforms video into interactive worlds. The Odyssey team initially focused on world models for film and game production, and has since stumbled upon what may be an entirely new entertainment medium. The interactive video generated by Odyssey’s AI model responds to inputs in real time, letting users steer it with a keyboard, phone, controller, or eventually even voice commands.

How it Works

The underlying AI can generate realistic-looking video frames every 40 milliseconds, creating the illusion that users are actually influencing the digital world. The experience is described as feeling like exploring a "glitchy dream—raw, unstable, but undeniably new." The visuals are not yet polished, but the potential for this technology is vast.
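To put the 40-millisecond figure in more familiar terms, the short sketch below converts it into a frame rate. Only the 40 ms interval comes from Odyssey; the function names are illustrative.

```python
FRAME_INTERVAL_MS = 40  # from the article: one new frame every 40 milliseconds


def frames_per_second(interval_ms: float) -> float:
    """Convert a per-frame interval in milliseconds into frames per second."""
    return 1000.0 / interval_ms


def frames_per_hour(interval_ms: float) -> int:
    """Number of frames generated in one hour of continuous use."""
    return round(3600 * frames_per_second(interval_ms))


print(frames_per_second(FRAME_INTERVAL_MS))  # 25.0
print(frames_per_hour(FRAME_INTERVAL_MS))    # 90000
```

In other words, the model has to produce about 25 frames every second, which is in the same range as conventional film and television frame rates.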

Not Your Standard Video Tech

What makes this AI-generated interactive video tech different from traditional video games or CGI is the use of a "world model." Unlike traditional video models that generate entire clips in one go, world models work frame by frame, predicting what should come next based on the current state and any user inputs. This is similar to how large language models predict the next word in a sequence, but far more demanding, because each prediction is a high-resolution video frame rather than a single word.

The Power of World Models

A world model is, at its core, an action-conditioned dynamics model. Each time a user interacts, the model takes the current state, the user’s action, and the history of what’s happened, then generates the next video frame accordingly. The result is something that feels more organic and unpredictable than a traditional game. There’s no pre-programmed logic saying "if a player does X, then Y happens"—instead, the AI is making its best guess at what should happen next based on what it’s learned from watching countless videos.
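The loop described above can be sketched in a few lines of Python. This is a toy under loud assumptions, not Odyssey's implementation: the "frame" is a one-row grid of numbers, and `toy_world_model` is a hand-written stand-in for what would really be a large neural network. All names and sizes here are hypothetical.

```python
from collections import deque
from typing import Deque, List

Frame = List[List[int]]  # toy "frame": a tiny grid of pixel values

WIDTH = 8  # hypothetical toy resolution, nothing like real video
SHIFT_FOR_ACTION = {"left": -1, "right": 1, "none": 0}


def toy_world_model(current: Frame, action: str, history: Deque[Frame]) -> Frame:
    """Hand-written stand-in for a learned model.

    It scrolls every row sideways according to the action. `history` is
    accepted to mirror the interface described above, but this toy ignores it.
    """
    shift = SHIFT_FOR_ACTION[action]
    return [[row[(x - shift) % WIDTH] for x in range(WIDTH)] for row in current]


def run_session(first_frame: Frame, actions: List[str], history_len: int = 4) -> List[Frame]:
    """Generate one new frame per action, feeding each output back in."""
    history: Deque[Frame] = deque(maxlen=history_len)
    frame = first_frame
    frames = [frame]
    for action in actions:
        next_frame = toy_world_model(frame, action, history)
        history.append(frame)  # remember what was just shown
        frame = next_frame     # the output becomes the next input
        frames.append(frame)
    return frames


start: Frame = [[1, 0, 0, 0, 0, 0, 0, 0]]
frames = run_session(start, ["right", "right", "left"])
for f in frames:
    print(f[0])
```

The point of the sketch is the shape of the loop: each frame is produced from the current frame, the latest action, and recent history, and is then fed straight back in as the starting point for the next one. In a real world model the scrolling rule would be replaced by learned predictions, which is where the organic, unpredictable feel comes from.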

Overcoming Historic Challenges

Building something like this isn’t easy. One of the biggest hurdles with AI-generated interactive video is keeping it stable over time. When generating each frame based on previous ones, small errors can compound quickly, a phenomenon AI researchers call "drift." To tackle this, Odyssey has used a "narrow distribution model," pre-training their AI on general video footage, then fine-tuning it on a smaller set of environments. This trade-off means less variety but better stability.
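A toy calculation shows why drift matters. The sketch below assumes a deliberately simple setup that is not taken from Odyssey: the "world" is a single number that should increase by exactly 1 each frame, and the hypothetical model overshoots by a fixed 0.01 every time. Because the model is fed its own previous output, the mistake never gets corrected.

```python
def true_step(x: float) -> float:
    """What should happen: the value advances by exactly 1 per frame."""
    return x + 1.0


def model_step(x: float, per_step_error: float = 0.01) -> float:
    """Hypothetical imperfect model: right idea, slightly off every frame."""
    return x + 1.0 + per_step_error


def rollout_errors(steps: int, per_step_error: float = 0.01) -> list:
    """Run truth and model side by side; the model only ever sees its own output."""
    truth = 0.0
    predicted = 0.0
    errors = []
    for _ in range(steps):
        truth = true_step(truth)
        predicted = model_step(predicted, per_step_error)
        errors.append(abs(predicted - truth))
    return errors


errors = rollout_errors(100)  # 100 frames is about 4 seconds at 40 ms per frame
print(round(errors[0], 4))    # 0.01
print(round(errors[-1], 4))   # 1.0
```

Even in this mild case, an error of 1% per frame grows a hundredfold within about four seconds of footage. Real video models can drift in messier and faster ways than this additive toy, which is why narrowing the range of environments to gain stability is an attractive trade.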

The Future of Interactive Video

The company is already making "fast progress" on its next-gen model, which shows "a richer range of pixels, dynamics, and actions." Running all this fancy AI tech in real time isn’t cheap: it currently costs between £0.80 and £1.60 per user-hour. However, Odyssey expects these costs to fall further as models become more efficient.
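For a rough sense of scale, the per-user-hour range can be multiplied out. The sketch below uses only the £0.80 and £1.60 figures quoted above; the 1,000 user-hours in the example is an arbitrary illustration, not a usage number from Odyssey.

```python
LOW_PENCE_PER_USER_HOUR = 80    # £0.80, lower bound quoted in the article
HIGH_PENCE_PER_USER_HOUR = 160  # £1.60, upper bound quoted in the article


def cost_range_pounds(user_hours: int) -> tuple:
    """Return the (low, high) running cost in pounds for a number of user-hours."""
    low = LOW_PENCE_PER_USER_HOUR * user_hours / 100
    high = HIGH_PENCE_PER_USER_HOUR * user_hours / 100
    return low, high


print(cost_range_pounds(1000))  # (800.0, 1600.0)
```

So a hypothetical 1,000 hours of play would cost somewhere between £800 and £1,600 to serve at today's quoted rates.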

Interactive Video: The Next Storytelling Medium?

Throughout history, new technologies have given birth to new forms of storytelling. Odyssey believes AI-generated interactive video is the next step in this evolution. If they’re right, we might be looking at the prototype of something that will transform entertainment, education, advertising, and more. Imagine training videos where you can practice the skills being taught, or travel experiences where you can explore destinations from your sofa.

Conclusion

The research preview available now is just a small step towards this vision, more of a proof of concept than a finished product. However, it’s an intriguing glimpse at what might be possible when AI-generated worlds become interactive playgrounds rather than just passive experiences. As technology advances, we can expect to see more sophisticated and engaging interactive videos that change the way we interact with digital content.

FAQs

What is AI-generated interactive video? AI-generated interactive video is a technology that uses artificial intelligence to generate video frames in real time, so users can interact with the video through inputs such as a keyboard, phone, or controller.
How does it work? The technology uses a "world model" to predict what should happen next in the video based on the current state and user inputs.
What are the potential applications? The potential applications of AI-generated interactive video include entertainment, education, advertising, and more.
Is it available now? A research preview of the technology is available now, but it’s still in the early stages of development.
How much does it cost? Running the technology in real time currently costs between £0.80 and £1.60 per user-hour, but this is expected to decrease as the models become more efficient.