# What is Gemini 3 Flash?
Gemini 3 Flash is the speed tier in Google's Gemini 3 lineup. The official Gemini API model list positions it as the most intelligent model built for speed, with strong multimodal understanding and support for structured outputs, tool use, and long context. It targets production workloads where latency, cost control, and consistent quality matter more than maximum reasoning depth.
Flash is a preview model, which means Google can refine behavior, pricing, and limits over time. When deploying at scale, plan for monitoring, regression tests, and a migration path between preview and stable releases.
## Official model ID and limits
The Gemini API model catalog lists Gemini 3 Flash under the model code gemini-3-flash-preview. Official limits include a 1,048,576-token input window and a 65,536-token output cap. The model accepts text, image, video, audio, and PDF inputs and returns text output.
| Parameter | Official value |
|---|---|
| Model code | gemini-3-flash-preview |
| Input token limit | 1,048,576 |
| Output token limit | 65,536 |
| Inputs | Text, image, video, audio, PDF |
| Output | Text |
| Knowledge cutoff | January 2025 |
| Latest update | December 2025 |
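The limits in the table can be encoded as a pre-flight check before a request is sent. This is a minimal sketch: the constants come from the catalog table above, the token counts are assumed to come from your own counting step, and the helper name check_request is illustrative, not part of any SDK.

```python
# Sketch: validate a request against the published Gemini 3 Flash limits
# before sending it. Limits taken from the model catalog table above;
# confirm current values before relying on them in production.
MODEL_ID = "gemini-3-flash-preview"
INPUT_TOKEN_LIMIT = 1_048_576
OUTPUT_TOKEN_LIMIT = 65_536

def check_request(input_tokens: int, max_output_tokens: int) -> list[str]:
    """Return a list of limit violations (empty list means the request fits)."""
    problems = []
    if input_tokens > INPUT_TOKEN_LIMIT:
        problems.append(
            f"input of {input_tokens:,} tokens exceeds the {INPUT_TOKEN_LIMIT:,} cap"
        )
    if max_output_tokens > OUTPUT_TOKEN_LIMIT:
        problems.append(
            f"requested output of {max_output_tokens:,} tokens exceeds the {OUTPUT_TOKEN_LIMIT:,} cap"
        )
    return problems

print(check_request(900_000, 8_192))    # fits: []
print(check_request(2_000_000, 8_192))  # too much input
```

Running the check before each call gives a clear failure message instead of a server-side rejection when a document or transcript outgrows the window.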
## Capability snapshot
Gemini 3 Flash is a tool-friendly model with a wide range of supported capabilities. The official model listing confirms support for batch processing, caching, code execution, computer use, file search, function calling, search grounding, structured outputs, thinking, and URL context. It does not support audio generation, image generation, the Live API, or Google Maps grounding. When you need those features, route to other Gemini models that explicitly support them.
**Supported**
- Batch processing and caching
- Function calling and structured outputs
- Search grounding and URL context
- Tool use with code execution
- Computer use workflows
- Long context reasoning (up to 1M tokens)
**Not supported**
- Audio generation
- Image generation
- Live API streaming mode
- Google Maps grounding
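One practical way to apply this matrix is a small routing guard that refuses Flash when a workload needs an unsupported feature. A minimal sketch, with feature names taken from the lists above; can_use_flash is a hypothetical helper, not an SDK call.

```python
# Sketch: route requests away from Gemini 3 Flash when they need a feature
# the model listing marks as unsupported. Feature names mirror the
# supported / not-supported lists above.
FLASH_SUPPORTED = {
    "batch", "caching", "code_execution", "computer_use", "file_search",
    "function_calling", "search_grounding", "structured_outputs",
    "thinking", "url_context",
}
FLASH_UNSUPPORTED = {
    "audio_generation", "image_generation", "live_api", "maps_grounding",
}

def can_use_flash(required_features: set[str]) -> bool:
    """True when every required feature is in Flash's supported set."""
    return required_features <= FLASH_SUPPORTED

print(can_use_flash({"function_calling", "structured_outputs"}))  # True
print(can_use_flash({"image_generation"}))                        # False
```

Keeping the feature sets in one place makes it easy to update the guard when the preview listing changes.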
## Release notes and improvements
Google's release notes for Gemini 3 Flash emphasize improvements to multimodal understanding and visual reasoning. The update highlights stronger spatial reasoning, improved agentic coding performance, support for multimodal function responses, and code execution for image inputs. If your workflows rely on interpreting screenshots, diagrams, or complex visual instructions, Flash is positioned as the fastest Gemini 3 option with these upgrades.
For teams that run tool-heavy workflows, multimodal function responses are especially valuable. They let the model select tools based on both text and image context, which reduces brittle prompt glue and keeps complex automation pipelines stable over time.
## Pricing overview
Gemini 3 Flash pricing is published on the official Gemini API pricing page. Standard pricing lists $0.50 per 1M input tokens for text, image, and video inputs, and $1.00 per 1M input tokens for audio. Output tokens are priced at $3.00 per 1M tokens. Batch processing pricing cuts these rates in half. Caching is available with separate rates for cached input tokens.
Pricing changes are common during preview. Treat these figures as a planning baseline and confirm the latest rates before shipping production workloads. If you are optimizing cost, batch processing and caching can materially reduce spend for repeated prompts.
| Plan | Input (text/image/video) | Input (audio) | Output |
|---|---|---|---|
| Standard | $0.50 / 1M | $1.00 / 1M | $3.00 / 1M |
| Batch | $0.25 / 1M | $0.50 / 1M | $1.50 / 1M |

| Cached input rates | Standard | Batch |
|---|---|---|
| Text, image, video | $0.05 / 1M | $0.025 / 1M |
| Audio | $0.25 / 1M | $0.125 / 1M |
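As a planning aid, the rates above can be folded into a simple cost estimator. This is a sketch against the preview rates listed in the tables; confirm current pricing before using it for real budgets, and note that estimate_cost is a hypothetical helper, not an official tool.

```python
# Sketch: estimate request cost from the preview rates listed above.
# Prices are per 1M tokens and may change during preview; confirm current
# pricing before budgeting production traffic.
RATES = {
    "standard": {"input_text": 0.50, "input_audio": 1.00, "output": 3.00},
    "batch":    {"input_text": 0.25, "input_audio": 0.50, "output": 1.50},
}

def estimate_cost(plan: str, input_tokens: int, output_tokens: int,
                  audio_input_tokens: int = 0) -> float:
    """Cost in USD for one request under the listed preview rates."""
    r = RATES[plan]
    return (input_tokens * r["input_text"]
            + audio_input_tokens * r["input_audio"]
            + output_tokens * r["output"]) / 1_000_000

# 100k text input + 10k output on standard:
# 0.1 * 0.50 + 0.01 * 3.00 = 0.05 + 0.03 = 0.08 USD
print(f"${estimate_cost('standard', 100_000, 10_000):.4f}")
```

Comparing the same call under both plans makes the batch discount concrete: the example request costs half as much when it can tolerate batch latency.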
## Where Gemini 3 Flash fits
Gemini 3 Flash sits between the most capable Gemini 3 Pro tier and the most cost-efficient Gemini 3 Flash Lite tier. If you need a fast model with strong multimodal understanding, Flash is usually the best default. If you need maximum reasoning depth or advanced agentic workflows, Gemini 3 Pro is the safer choice. If your top priority is cost per request and you can accept lighter reasoning, Gemini 3 Flash Lite is the better fit.
| Model | Positioning | Best fit |
|---|---|---|
| Gemini 3 Pro | Highest-capability Gemini 3 tier | Deep reasoning, complex agentic workflows |
| Gemini 3 Flash | Speed-focused Gemini 3 model | Fast multimodal analysis and tool use |
| Gemini 3 Flash Lite | Most cost-efficient option | High-volume, latency-sensitive workloads |
## Best fit use cases
Gemini 3 Flash is a production-ready model for teams that want strong output quality with low latency. The following workloads are common:
- Multimodal Q&A over product screenshots, documentation, and videos
- Tool-based assistants that combine search grounding with structured JSON output
- Fast summaries for long documents, transcripts, or support conversations
- Agentic coding workflows that need code execution and file search
- Large batch processing pipelines that need predictable throughput
## Prompting guidance
Flash responds best to explicit constraints and structured instructions. When using function calling or structured outputs, define schemas clearly and include validation hints. For multimodal prompts, add a short instruction describing how the model should interpret each input so the model does not guess your intent.
**System:** You are a product analyst. Use the image to extract UI issues.

**User:** Review the attached screenshot and return JSON with fields: issue, severity, evidence, recommendation.

**User:** Summarize this 60-minute call transcript into:

- 5 key risks
- 5 action items with owners
- 3 product insights with evidence
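For the screenshot-review prompt above, it helps to pin down the JSON contract and validate responses locally. A minimal sketch, assuming a JSON Schema-style shape; the exact schema format the structured outputs feature expects should be checked against the official docs, and valid_issue is a local helper, not an API feature.

```python
# Sketch: a response schema for the screenshot-review prompt above, plus a
# minimal local check that a candidate response honors it. The schema shape
# is illustrative; consult the structured outputs docs for the exact format
# the API expects.
import json

ISSUE_SCHEMA = {
    "type": "object",
    "required": ["issue", "severity", "evidence", "recommendation"],
    "properties": {
        "issue": {"type": "string"},
        "severity": {"type": "string", "enum": ["low", "medium", "high"]},
        "evidence": {"type": "string"},
        "recommendation": {"type": "string"},
    },
}

def valid_issue(raw: str) -> bool:
    """Cheap validation: parses JSON and checks the required string fields."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False
    required = ISSUE_SCHEMA["required"]
    if not all(isinstance(data.get(k), str) for k in required):
        return False
    return data["severity"] in ISSUE_SCHEMA["properties"]["severity"]["enum"]

sample = ('{"issue": "Low contrast label", "severity": "medium", '
          '"evidence": "Gray text on gray card", "recommendation": "Darken text"}')
print(valid_issue(sample))  # True
```

A local check like this catches schema drift early, even when the API itself enforces the schema at generation time.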
## Multimodal input tips
Gemini 3 Flash can accept images, audio, and video along with text. For best results, keep inputs focused and provide context about what matters. For example, describe which frame of a video contains the key moment, or instruct the model to focus on text inside a chart rather than the background of the slide. If you pass multiple images, label them clearly.
- Use short captions for each image to avoid ambiguity.
- Provide timestamps for audio or video references.
- Include expected output format before the content.
- Use structured outputs to enforce predictable results.
## When not to use Gemini 3 Flash
Flash does not generate images or audio. If your workflow requires creative image synthesis, audio creation, or low-latency live streaming interactions, use a model that explicitly supports those features. The same applies to Google Maps grounding, which is not available in Flash.
## FAQ

### Is Gemini 3 Flash available in Google AI Studio and Vertex AI?

The official model catalog lists Gemini 3 Flash as a Gemini API model and also indicates availability in Vertex AI. AI Studio is typically the fastest way to test, while Vertex AI is best for enterprise controls.

### Does Gemini 3 Flash support function calling?

Yes. The official model listing confirms function calling, structured outputs, and search grounding are supported features.

### Can Gemini 3 Flash handle long documents?

Flash supports an input window of up to 1,048,576 tokens, making it suitable for long documents, transcripts, or knowledge base context.

### Is Gemini 3 Flash good for tool-based agents?

Yes. Support for code execution, file search, and computer use makes it a strong default for agents that need to call tools or inspect files.

### Does Gemini 3 Flash generate images?

No. The model does not support image generation. Use a dedicated image model when you need generative visuals.

### How should I pick between Flash and Flash Lite?

Flash balances speed and quality. Flash Lite is the lowest-cost option, while Flash provides stronger multimodal reasoning and broader tool coverage.