Farewell Photoshop?

Introduction to Multimodal Output

Having true multimodal output opens up interesting new possibilities in chatbots. For example, Gemini 2.0 Flash can play interactive graphical games or generate stories with consistent illustrations, maintaining character and setting continuity across multiple images. It's far from perfect, but character consistency is a new capability in AI assistants.

What is Multimodal Output?

Multimodal output refers to a system's ability to generate multiple forms of media, such as text, images, audio, and video. Gemini 2.0 Flash is one such system, since it can produce both text and images within the same conversation.

Examples of Multimodal Output

We tried out Gemini 2.0 Flash, and it was pretty wild, especially when it generated an alternate-angle view of a photo we provided. The system can also create multi-image stories, as shown in the examples below.


Creating a multi-image story with Gemini 2.0 Flash, part 1. Google / Benj Edwards

Creating a multi-image story with Gemini 2.0 Flash, part 2. Notice the alternative angle of the original photo. Google / Benj Edwards