gemini-2.5-flash-preview-05-20

The model is divided into two versions. By default, it uses the “nothinking” version, but you can add the “-thinking” suffix to enable the model’s reasoning process.

Nothinking：gemini-2.5-flash-preview-05-20 or gemini-2.5-flash-preview-05-20-nothinking

Thinking：gemini-2.5-flash-preview-05-20-thinking

Introduction

In April 2025, Google introduced the early preview of Gemini 2.5 Flash (model code: gemini-2.5-flash-preview-05-20) via Google AI Studio and Vertex AI, as an upgraded, high-efficiency successor to Gemini 2.0 Flash. Designed for high-volume, real-time applications, this model blends low latency and cost with stronger reasoning, multimodal capabilities, and innovative “thinking budget” control. At Google I/O 2025, Gemini 2.5 Flash entered broader preview, signaling its readiness for wider production use.

Key Features

Hybrid Reasoning Architecture & “Thinking Budget” Control
- First Gemini model to fully enable hybrid inference.
- The “thinking budget” allows developers to control reasoning depth (0–24,576 tokens).
- Developers can enable or disable intensive reasoning per task, balancing quality, speed, and cost.
- Pre-inference (“pre-thinking”) decomposes complex tasks and verifies facts for accurate, logical outputs.
- Auto-adjustment optimizes resource use based on query complexity.
Advanced Multimodal Functionality
- Supports text, image, audio, and video as inputs (outputs: primarily text for now).
- Native audio output: Announced at I/O 2025; API-level control of tone, accent, and speaking style (e.g., storytelling).
- Emotion detection: Responds to user emotions and ignores background chatter for contextually-aware interactions.
Efficient Performance & Low Cost
- Sits at the “Pareto frontier,” excelling at cost–performance balance.
- Significant improvements in reasoning, multimodal tasks, code generation, and long-context processing.
- Reduces token usage by 20–30% vs. previous models.
- Supports up to 2 million tokens in context window, ideal for large documents or complex tasks.
Enhanced Security & Tools Integration
- Advanced protections against indirect prompt injection.
- Native tool invocations (Google Search, API calls, Python interpreter) for live data and code execution.
Canvas Feature Support
- Integrates Google Canvas interactivity for generating web pages, quizzes, infographics, and more, streamlining document/code workflow optimization.

Benchmark Performance

Gemini 2.5 Flash demonstrates robust benchmark scores (default sampling, single-pass):

Benchmark	Score/Performance
Humanity’s Last Exam (no tool use)	12.1%
GPQA Diamond Science	78.3%
AIME 2025 Math	78.0%
LMArena Hard Prompts	Second only to Gemini 2.5 Pro; near top-tier ability

These results show near top-model capability at small/efficient scale and high value for investment.

Real-World Applications

Customer Service: Real-time, accurate query handling and natural conversation.
Document Parsing & Summarization: Processes long/multi-document inputs for key info extraction and live summaries.
Virtual Assistants: Smart assistants handling voice, text, image-based commands.
Education: Canvas-generated interactive learning applications (e.g., quizzes, personalized YouTube-based lessons).
Developer Tools: Code conversion, frontend development, and complex programming via Google AI Studio and Vertex AI.

Technological Innovations & Roadmap

Hybrid architecture and controllable reasoning power give developers unparalleled flexibility.
Production-ready general availability planned for early June 2025.
Future directions include:
- Project Mariner: Enhanced agent/computer-use capabilities
- Deeper research: Synthesis of public/private (PDF, image) content; Gmail/Drive integration
- Over 140 languages for text/image input, 24 languages for audio outputs

Limitations and Considerations

Still in preview (as of May 20, 2025); detailed technical/security reports pending.
Output primarily in text; image/video output not yet available.
Some features (e.g., deep research tools) remain experimental.

Access & Quickstart

Available on:

Google AI Studio: For developers experimenting with thinking budget and multimodal input
Vertex AI: Enterprise-level deployment/customization
Gemini App: End-user experience including Canvas and multimodal input

Refer to Google’s developer documentation and the Gemini Cookbook for further guidance.

Conclusion

Gemini 2.5 Flash (gemini-2.5-flash-preview-05-20) is Google’s 2025 high-performance, cost-efficient, and developer-flexible AI foundation model, with hybrid reasoning, controllable performance, and deep multimodal abilities. For customer service, document analysis, education, and coding, it offers a compelling value proposition—poised to strengthen Google’s leadership in the competitive AI landscape as capabilities expand.

References: