Introduction to Gemini 2.5 Flash
Gemini 2.5 Flash is a new model that lets developers cap the number of tokens it spends on thinking, or disable thinking altogether. This gives them more control over the model’s output and cost. Google prices the model per 1 million tokens: input costs $0.15, and output comes in two tiers, $0.60 without thinking and $3.50 with thinking enabled.
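To make the pricing concrete, here is a minimal sketch that estimates the cost of a single request from the per-million-token prices quoted above. The function name and the billing model are assumptions for illustration: it assumes every output token is billed at the thinking rate whenever thinking is on, which may not match Google's actual billing rules.

```python
# Preview prices quoted in the article, in US dollars per 1 million tokens.
INPUT_PRICE = 0.15
OUTPUT_PRICE_NO_THINKING = 0.60
OUTPUT_PRICE_THINKING = 3.50


def estimate_cost(input_tokens: int, output_tokens: int, thinking: bool) -> float:
    """Estimate the cost in US dollars of one request.

    Assumption (not confirmed by the article): when thinking is enabled,
    all output tokens are billed at the higher thinking rate.
    """
    output_price = OUTPUT_PRICE_THINKING if thinking else OUTPUT_PRICE_NO_THINKING
    return (input_tokens * INPUT_PRICE + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    for thinking in (False, True):
        cost = estimate_cost(10_000, 2_000, thinking)
        print(f"thinking={thinking}: ${cost:.4f}")
```

Under these assumptions, a request with 10,000 input tokens and 2,000 output tokens comes to roughly $0.0027 without thinking and $0.0085 with it, so the output tier dominates the bill as responses get longer.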
Features of Gemini 2.5 Flash
The thinking budget option lets developers trade reasoning quality against the amount of money they’re willing to pay. According to Doshi, you can actually see the reasoning improvements in benchmarks as you add more token budget. Like 2.5 Pro, this model supports Dynamic Thinking, which automatically adjusts the amount of work that goes into generating an output based on the complexity of the input. The new Flash model goes further by letting developers set that thinking limit themselves or turn thinking off.
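The three modes described above (dynamic thinking, a capped budget, and thinking disabled) can be sketched as a request body. This is an illustrative sketch, not Google's reference code: the field names `generationConfig`, `thinkingConfig`, and `thinkingBudget` follow the Gemini API's REST shape as best understood here, and the 24,576-token ceiling and the helper name `build_request` are assumptions to verify against the current API documentation.

```python
import json
from typing import Optional

# Assumed upper limit for the thinking budget; verify against the API docs.
MAX_THINKING_BUDGET = 24_576


def build_request(prompt: str, thinking_budget: Optional[int] = None) -> dict:
    """Build a JSON-serializable request body for a text prompt.

    thinking_budget=None  -> budget left unset, so the model decides (dynamic thinking)
    thinking_budget=0     -> thinking disabled
    thinking_budget=N > 0 -> thinking capped at N tokens
    """
    body: dict = {"contents": [{"parts": [{"text": prompt}]}]}
    if thinking_budget is not None:
        if not 0 <= thinking_budget <= MAX_THINKING_BUDGET:
            raise ValueError(
                f"thinking_budget must be between 0 and {MAX_THINKING_BUDGET}"
            )
        # Field names are assumptions based on the REST API's camelCase style.
        body["generationConfig"] = {
            "thinkingConfig": {"thinkingBudget": thinking_budget}
        }
    return body


if __name__ == "__main__":
    print(json.dumps(build_request("Explain quicksort.", thinking_budget=1024), indent=2))
```

Leaving the budget unset rather than sending a default value keeps the dynamic behavior intact, so a caller only overrides the model's own judgment when cost or latency requires it.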
Benchmark Results
2.5 Flash outputs get better as you add more reasoning tokens: benchmark scores rise as the thinking budget increases.
Launch and Availability
Google is launching the model now to guide improvements in these dynamic features. "Part of the reason we’re putting the model out in preview is to get feedback from developers on where the model meets their expectations, where it under-thinks or over-thinks, so that we can continue to iterate on [dynamic thinking]," says Doshi. Don’t expect that kind of precise control for consumer Gemini products right now, though. Doshi notes that the main reason you’d want to toggle thinking or set a budget is to control costs and latency, which matters to developers.
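Since cost control is the stated motivation, one way to pick a budget is to work backward from a per-request spending cap. The sketch below is hypothetical: the function name is invented for illustration, the prices are the preview figures quoted earlier in the article, and it assumes thinking tokens and answer tokens are both billed at the thinking output rate.

```python
# Preview prices from the article, in US dollars per 1 million tokens.
INPUT_PRICE = 0.15
OUTPUT_PRICE_THINKING = 3.50


def max_thinking_tokens(spend_cap_usd: float, input_tokens: int, answer_tokens: int) -> int:
    """Return the largest thinking budget that keeps one request under the cap.

    Assumption: thinking tokens and answer tokens are both billed at the
    thinking output rate. Returns 0 when the cap leaves no room for thinking,
    in which case the caller would disable thinking altogether.
    """
    fixed_cost = (
        input_tokens * INPUT_PRICE + answer_tokens * OUTPUT_PRICE_THINKING
    ) / 1_000_000
    remaining = spend_cap_usd - fixed_cost
    if remaining <= 0:
        return 0
    # Convert the leftover dollars back into tokens at the thinking rate.
    return int(remaining * 1_000_000 / OUTPUT_PRICE_THINKING)


if __name__ == "__main__":
    print(max_thinking_tokens(0.01, input_tokens=10_000, answer_tokens=1_000))
```

A similar calculation could be done for latency if a tokens-per-second figure were known, but the article gives no such number, so it is left out here.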
Future Plans
With the rapid cadence of releases, a final release for Gemini 2.5 doesn’t seem that far off. Google still doesn’t have any specifics to share on that front, but with the new developer options and availability in the Gemini app, Doshi tells us the team hopes to move the 2.5 family to general availability soon. Creating a simpler Gemini app experience for consumers while still offering flexibility is the goal, says Doshi.
Conclusion
Gemini 2.5 Flash gives developers more control over the model’s output and cost. With Dynamic Thinking and an adjustable thinking budget, they can decide how much reasoning each request is worth paying for. Google plans to keep iterating on these features during the preview, using developer feedback on where the model under-thinks or over-thinks.
FAQs
Q: What is Gemini 2.5 Flash?
A: Gemini 2.5 Flash is a new model that allows developers to set a token limit for thinking or simply disable thinking altogether.
Q: How much does it cost to use Gemini 2.5 Flash?
A: Input costs $0.15 per 1 million tokens. Output is priced per 1 million tokens in two tiers: $0.60 without thinking and $3.50 with thinking enabled.
Q: What is Dynamic Thinking?
A: Dynamic Thinking is a feature that automatically adjusts the amount of work that goes into generating an output based on the complexity of the input.
Q: When will Gemini 2.5 reach general availability?
A: Google hasn’t announced a specific date. The model is available in preview now, and the team hopes to move the 2.5 family to general availability soon.