Google's AI Mode Can Answer Questions About Images

Introduction to AI Mode

Google started incorporating AI features into its search engine in 2024. However, last month marked a significant escalation with the release of AI Mode. This new feature previews a future where searching the web won’t just return a list of links. According to Google, users are responding positively to AI Mode, prompting the company to expand it by adding multimodal functionality to its AI-generated results.

How AI Mode Works

AI Mode uses a custom version of the Gemini large language model (LLM) to generate search results. This model has been updated to support multimodal input, allowing users to include images in their search queries. The search bar in AI Mode now features a button that lets users snap a photo or upload an image directly.

Multimodal Input and Google Lens

The updated Gemini model can interpret the content of images, but it also receives assistance from Google Lens. Google Lens can identify specific objects within the uploaded images, providing context that AI Mode uses to issue multiple sub-queries, an approach Google calls the "fan-out technique." Combining Gemini with Google Lens in this way is intended to improve the accuracy and relevance of search results.
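The fan-out idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Google's implementation: `DetectedObject`, `build_subqueries`, `search_stub`, and `fan_out` are hypothetical names, the object labels stand in for whatever Google Lens would return, and the search call is a placeholder rather than a real API.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectedObject:
    """Hypothetical stand-in for one object recognized in the image."""
    label: str


def build_subqueries(question: str, objects: list[DetectedObject]) -> list[str]:
    """Build one sub-query per detected object, plus one covering all of them."""
    if not objects:
        # Nothing was recognized, so fall back to the plain question.
        return [question]
    per_object = [f"{question} [{obj.label}]" for obj in objects]
    combined = f"{question} [{', '.join(obj.label for obj in objects)}]"
    return per_object + [combined]


def search_stub(query: str) -> str:
    """Placeholder search backend (an assumption; no real service is called)."""
    return f"results for: {query}"


def fan_out(question: str, objects: list[DetectedObject], search=search_stub) -> dict[str, str]:
    """Run every sub-query concurrently and collect the results by query."""
    queries = build_subqueries(question, objects)
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(search, queries))
    return dict(zip(queries, results))
```

Calling `fan_out("similar books", [DetectedObject("Dune"), DetectedObject("Neuromancer")])` would issue three sub-queries: one per book and one for the pair. In a real system, a language model would then combine those results into a single response.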

Example Use Case

To illustrate the potential of this feature, consider a scenario where a user shows AI Mode a few books and asks about similar titles. Google Lens identifies each book, allowing AI Mode to incorporate the specifics into its response. This enables the model to suggest similar books and make recommendations based on the user’s follow-up questions. This example demonstrates how AI Mode can provide more personalized and relevant search results by leveraging multimodal input.

Conclusion

The introduction of multimodal functionality to AI Mode represents a significant step forward in search technology. By allowing users to upload images as part of their search queries, Google is moving closer to a future where search results are more interactive and personalized. As AI Mode continues to evolve, it will be interesting to see how users adapt to these changes and how they impact the way we search for information online.

FAQs

What is AI Mode?
AI Mode is a feature introduced by Google that uses a custom version of the Gemini large language model to produce search results, moving away from the traditional list of links.

How does multimodal input work in AI Mode?
Multimodal input allows users to upload images as part of their search query. The Gemini model interprets these images with the help of Google Lens, which identifies specific objects to provide more accurate and relevant results.

What is the role of Google Lens in AI Mode?
Google Lens assists the Gemini model by identifying objects in the images uploaded by users, allowing AI Mode to make multiple sub-queries and provide more personalized search results.

How will multimodal input change the way we search?
Multimodal input is expected to make search results more interactive and personalized. It allows for more complex and nuanced queries, potentially leading to more accurate and relevant results.