Introduction to Gemini Robotics
Imagine that you want a robot to sort a pile of laundry into whites and colors. Gemini Robotics-ER 1.5 (ER stands for embodied reasoning) would process the request along with images of the physical environment, in this case a pile of clothing. The model can also call tools like Google Search to gather more data. It then generates natural language instructions: specific steps that the robot should follow to complete the given task.
How Gemini Robotics Works
The two new models work together to "think" about how to complete a task.
Action Model
Gemini Robotics 1.5, the action model, is a vision-language-action (VLA) model. It takes these instructions from the ER model and generates robot actions while using visual input to guide its movements. But it also goes through its own thinking process to consider how to approach each step. "There are all these kinds of intuitive thoughts that help [a person] guide this task, but robots don’t have this intuition," said DeepMind’s Kanishka Rao. "One of the major advancements that we’ve made with 1.5 in the VLA is its ability to think before it acts."
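The division of labor described above can be sketched in a few lines of Python. This is only an illustration of the control flow (a planner produces natural language steps, then an actor thinks about each step before acting), not DeepMind's implementation or any real API. Every name here (Observation, plan_steps, ActionModel, run_task) is hypothetical, and simple string rules stand in for what would be model calls.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Observation:
    """Stand-in for a camera image: just a list of visible item labels."""
    items: List[str]


def plan_steps(request: str, obs: Observation) -> List[str]:
    """Hypothetical stand-in for the ER (planning) role.

    A real planner would be a model call that reads the request and images.
    This rule-based stub ignores the request text and only shows the shape
    of the output: one natural language instruction per step.
    """
    steps = []
    for item in obs.items:
        bin_name = "whites" if item.startswith("white") else "colors"
        steps.append(f"Pick up the {item} and place it in the {bin_name} bin")
    return steps


@dataclass
class ActionModel:
    """Hypothetical stand-in for the action (VLA) role."""
    log: List[str] = field(default_factory=list)

    def think(self, instruction: str) -> List[str]:
        # "Think before acting": split one instruction into smaller motions.
        # rsplit keeps item names such as "black and white sock" intact.
        pick, place = instruction.rsplit(" and ", 1)
        return [pick.strip(), place.strip()]

    def act(self, instruction: str, obs: Observation) -> None:
        for motion in self.think(instruction):
            # A real system would emit motor commands guided by vision here;
            # this sketch only records the motion it would attempt.
            self.log.append(motion)


def run_task(request: str, obs: Observation) -> List[str]:
    """Run the planner once, then hand each step to the action model."""
    actor = ActionModel()
    for step in plan_steps(request, obs):
        actor.act(step, obs)
    return actor.log


if __name__ == "__main__":
    pile = Observation(items=["white shirt", "red sock"])
    for motion in run_task("Sort the laundry into whites and colors", pile):
        print(motion)
```

Keeping the planner's output as plain text mirrors the article's description: the action model consumes natural language instructions, so the two roles can be developed and released separately.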
Advancements and Testing
Both of DeepMind’s new robotics models are built on the Gemini foundation models but have been fine-tuned with data that adapts them to operating in a physical space. This approach, the team says, gives robots the ability to undertake more complex multi-stage tasks, bringing agentic capabilities to robotics. The DeepMind team tests Gemini Robotics with a few different machines, like the two-armed Aloha 2 and the humanoid Apollo. In the past, AI researchers had to create customized models for each robot, but that’s no longer necessary. DeepMind says that Gemini Robotics 1.5 can learn across different embodiments, transferring skills learned from Aloha 2’s grippers to the more intricate hands on Apollo with no specialized tuning.
Availability and Future
All this talk of physical agents powered by AI is fun, but we’re still a long way from a robot you can order to do your laundry. Gemini Robotics 1.5, the model that actually controls robots, is still only available to trusted testers. However, the thinking ER model is now rolling out in Google AI Studio, allowing developers to generate robotic instructions for their own physically embodied robotic experiments.
Conclusion
Gemini Robotics 1.5 and Gemini Robotics-ER 1.5 split robot control into planning and acting: the ER model reasons about a task and writes step-by-step instructions, and the action model thinks through each step before it moves. DeepMind reports that skills now transfer across robot bodies without specialized tuning. A robot that does your laundry on command is still a long way off, but developers can already experiment with the ER model in Google AI Studio.
FAQs
- Q: What is Gemini Robotics?
 A: Gemini Robotics is a pair of AI models from DeepMind, built on the Gemini foundation models and fine-tuned with data that adapts them to operating in a physical space.
- Q: How does Gemini Robotics work?
 A: Gemini Robotics-ER 1.5 turns a request and images of the environment into natural language steps. Gemini Robotics 1.5, the action model, then turns each step into robot actions, using visual input to guide its movements.
- Q: What are the potential applications of Gemini Robotics?
 A: The models target complex multi-stage tasks, such as sorting a pile of laundry into whites and colors.
- Q: Is Gemini Robotics available to the public?
 A: Not fully. Gemini Robotics 1.5, the model that controls robots, is only available to trusted testers, but the ER model is rolling out in Google AI Studio for developers to generate robotic instructions.