# What is Gemini 3 Flash?
Gemini 3 Flash is the speed tier in Google's Gemini 3 lineup. The official Gemini API model list positions it as the most intelligent model built for speed, with strong multimodal understanding and support for structured outputs, tool use, and long context. It targets production workloads where latency, cost control, and consistent quality matter more than maximum reasoning depth.
Flash is a preview model, which means Google can refine behavior, pricing, and limits over time. When deploying at scale, plan for monitoring, regression tests, and a migration path between preview and stable releases.
## Official model ID and limits
The Gemini API model catalog lists Gemini 3 Flash under the model code gemini-3-flash-preview. Official limits include a 1,048,576-token input window and a 65,536-token output cap. The model accepts text, image, video, audio, and PDF inputs and returns text output.
| Parameter | Official value |
|---|---|
| Model code | gemini-3-flash-preview |
| Input token limit | 1,048,576 |
| Output token limit | 65,536 |
| Inputs | Text, image, video, audio, PDF |
| Output | Text |
| Knowledge cutoff | January 2025 |
| Latest update | December 2025 |
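The limits in the table can be encoded as a pre-flight check before a request is sent. This is a minimal sketch: the constants come from the catalog table above, the token counts are assumed to come from your own counting step, and the helper name check_request is illustrative, not part of any SDK.

```python
# Sketch: validate a request against the published Gemini 3 Flash limits
# before sending it. Limits taken from the model catalog table above;
# confirm current values before relying on them in production.
MODEL_ID = "gemini-3-flash-preview"
INPUT_TOKEN_LIMIT = 1_048_576
OUTPUT_TOKEN_LIMIT = 65_536

def check_request(input_tokens: int, max_output_tokens: int) -> list[str]:
    """Return a list of limit violations (empty list means the request fits)."""
    problems = []
    if input_tokens > INPUT_TOKEN_LIMIT:
        problems.append(
            f"input of {input_tokens:,} tokens exceeds the {INPUT_TOKEN_LIMIT:,} cap"
        )
    if max_output_tokens > OUTPUT_TOKEN_LIMIT:
        problems.append(
            f"requested output of {max_output_tokens:,} tokens exceeds the {OUTPUT_TOKEN_LIMIT:,} cap"
        )
    return problems

print(check_request(900_000, 8_192))    # fits: []
print(check_request(2_000_000, 8_192))  # too much input
```

Running the check before each call gives a clear failure message instead of a server-side rejection when a document or transcript outgrows the window.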
## Capability snapshot
Gemini 3 Flash is a tool-friendly model with a wide range of supported capabilities. The official model listing confirms support for batch processing, caching, code execution, computer use, file search, function calling, search grounding, structured outputs, thinking, and URL context. It does not support audio generation, image generation, the Live API, or Google Maps grounding. When you need those features, route to other Gemini models that explicitly support them.
**Supported**
- Batch processing and caching
- Function calling and structured outputs
- Search grounding and URL context
- Tool use with code execution
- Computer use workflows
- Long context reasoning (up to 1M tokens)
**Not supported**
- Audio generation
- Image generation
- Live API streaming mode
- Google Maps grounding
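One practical way to apply this matrix is a small routing guard that refuses Flash when a workload needs an unsupported feature. A minimal sketch, with feature names taken from the lists above; can_use_flash is a hypothetical helper, not an SDK call.

```python
# Sketch: route requests away from Gemini 3 Flash when they need a feature
# the model listing marks as unsupported. Feature names mirror the
# supported / not-supported lists above.
FLASH_SUPPORTED = {
    "batch", "caching", "code_execution", "computer_use", "file_search",
    "function_calling", "search_grounding", "structured_outputs",
    "thinking", "url_context",
}
FLASH_UNSUPPORTED = {
    "audio_generation", "image_generation", "live_api", "maps_grounding",
}

def can_use_flash(required_features: set[str]) -> bool:
    """True when every required feature is in Flash's supported set."""
    return required_features <= FLASH_SUPPORTED

print(can_use_flash({"function_calling", "structured_outputs"}))  # True
print(can_use_flash({"image_generation"}))                        # False
```

Keeping the feature sets in one place makes it easy to update the guard when the preview listing changes.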
## Release notes and improvements
Google's release notes for Gemini 3 Flash emphasize improvements to multimodal understanding and visual reasoning. The update highlights stronger spatial reasoning, improved agentic coding performance, support for multimodal function responses, and code execution for image inputs. If your workflows rely on interpreting screenshots, diagrams, or complex visual instructions, Flash is positioned as the fastest Gemini 3 option with these upgrades.
For teams that run tool-heavy workflows, multimodal function responses are especially valuable. They let the model select tools based on both text and image context, which reduces brittle prompt glue and keeps complex automation pipelines stable over time.
## Pricing overview
Gemini 3 Flash pricing is published on the official Gemini API pricing page. Standard pricing lists $0.50 per 1M input tokens for text, image, and video inputs, and $1.00 per 1M input tokens for audio. Output tokens are priced at $3.00 per 1M tokens. Batch processing pricing cuts these rates in half. Caching is available with separate rates for cached input tokens.
Pricing changes are common during preview. Treat these figures as a planning baseline and confirm the latest rates before shipping production workloads. If you are optimizing cost, batch processing and caching can materially reduce spend for repeated prompts.
| Plan | Input (text/image/video) | Input (audio) | Output |
|---|---|---|---|
| Standard | $0.50 / 1M | $1.00 / 1M | $3.00 / 1M |
| Batch | $0.25 / 1M | $0.50 / 1M | $1.50 / 1M |

| Cached input rates | Standard | Batch |
|---|---|---|
| Text, image, video | $0.05 / 1M | $0.025 / 1M |
| Audio | $0.25 / 1M | $0.125 / 1M |
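As a planning aid, the rates above can be folded into a simple cost estimator. This is a sketch against the preview rates listed in the tables; confirm current pricing before using it for real budgets, and note that estimate_cost is a hypothetical helper, not an official tool.

```python
# Sketch: estimate request cost from the preview rates listed above.
# Prices are per 1M tokens and may change during preview; confirm current
# pricing before budgeting production traffic.
RATES = {
    "standard": {"input_text": 0.50, "input_audio": 1.00, "output": 3.00},
    "batch":    {"input_text": 0.25, "input_audio": 0.50, "output": 1.50},
}

def estimate_cost(plan: str, input_tokens: int, output_tokens: int,
                  audio_input_tokens: int = 0) -> float:
    """Cost in USD for one request under the listed preview rates."""
    r = RATES[plan]
    return (input_tokens * r["input_text"]
            + audio_input_tokens * r["input_audio"]
            + output_tokens * r["output"]) / 1_000_000

# 100k text input + 10k output on standard:
# 0.1 * 0.50 + 0.01 * 3.00 = 0.05 + 0.03 = 0.08 USD
print(f"${estimate_cost('standard', 100_000, 10_000):.4f}")
```

Comparing the same call under both plans makes the batch discount concrete: the example request costs half as much when it can tolerate batch latency.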
## Where Gemini 3 Flash fits
Gemini 3 Flash sits between the most capable Gemini 3 Pro tier and the most cost-efficient Gemini 3 Flash Lite tier. If you need a fast model with strong multimodal understanding, Flash is usually the best default. If you need maximum reasoning depth or advanced agentic workflows, Gemini 3 Pro is the safer choice. If your top priority is cost per request and you can accept lighter reasoning, Gemini 3 Flash Lite is the better fit.
| Model | Positioning | Best fit |
|---|---|---|
| Gemini 3 Pro | Highest-capability Gemini 3 tier | Deep reasoning, complex agentic workflows |
| Gemini 3 Flash | Speed-focused Gemini 3 model | Fast multimodal analysis and tool use |
| Gemini 3 Flash Lite | Most cost-efficient option | High-volume, latency-sensitive workloads |
## Best fit use cases
Gemini 3 Flash is a production-ready model for teams that want strong output quality with low latency. The following workloads are common:
- Multimodal Q&A over product screenshots, documentation, and videos
- Tool-based assistants that combine search grounding with structured JSON output
- Fast summaries for long documents, transcripts, or support conversations
- Agentic coding workflows that need code execution and file search
- Large batch processing pipelines that need predictable throughput
## Prompting guidance
Flash responds best to explicit constraints and structured instructions. When using function calling or structured outputs, define schemas clearly and include validation hints. For multimodal prompts, add a short instruction describing how the model should interpret each input so the model does not guess your intent.
**System:** You are a product analyst. Use the image to extract UI issues.

**User:** Review the attached screenshot and return JSON with fields: issue, severity, evidence, recommendation.

**User:** Summarize this 60-minute call transcript into:

- 5 key risks
- 5 action items with owners
- 3 product insights with evidence
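For the screenshot-review prompt above, it helps to pin down the JSON contract and validate responses locally. A minimal sketch, assuming a JSON Schema-style shape; the exact schema format the structured outputs feature expects should be checked against the official docs, and valid_issue is a local helper, not an API feature.

```python
# Sketch: a response schema for the screenshot-review prompt above, plus a
# minimal local check that a candidate response honors it. The schema shape
# is illustrative; consult the structured outputs docs for the exact format
# the API expects.
import json

ISSUE_SCHEMA = {
    "type": "object",
    "required": ["issue", "severity", "evidence", "recommendation"],
    "properties": {
        "issue": {"type": "string"},
        "severity": {"type": "string", "enum": ["low", "medium", "high"]},
        "evidence": {"type": "string"},
        "recommendation": {"type": "string"},
    },
}

def valid_issue(raw: str) -> bool:
    """Cheap validation: parses JSON and checks the required string fields."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False
    required = ISSUE_SCHEMA["required"]
    if not all(isinstance(data.get(k), str) for k in required):
        return False
    return data["severity"] in ISSUE_SCHEMA["properties"]["severity"]["enum"]

sample = ('{"issue": "Low contrast label", "severity": "medium", '
          '"evidence": "Gray text on gray card", "recommendation": "Darken text"}')
print(valid_issue(sample))  # True
```

A local check like this catches schema drift early, even when the API itself enforces the schema at generation time.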
## Multimodal input tips
Gemini 3 Flash can accept images, audio, and video along with text. For best results, keep inputs focused and provide context about what matters. For example, describe which frame of a video contains the key moment, or instruct the model to focus on text inside a chart rather than the background of the slide. If you pass multiple images, label them clearly.
- Use short captions for each image to avoid ambiguity.
- Provide timestamps for audio or video references.
- Include expected output format before the content.
- Use structured outputs to enforce predictable results.
## When not to use Gemini 3 Flash
Flash does not generate images or audio. If your workflow requires creative image synthesis, audio creation, or low-latency live streaming interactions, use a model that explicitly supports those features. The same applies to Google Maps grounding, which is not available in Flash.
## FAQ

### Is Gemini 3 Flash available in Google AI Studio and Vertex AI?

The official model catalog lists Gemini 3 Flash as a Gemini API model and also indicates availability in Vertex AI. AI Studio is typically the fastest way to test, while Vertex AI is best for enterprise controls.

### Does Gemini 3 Flash support function calling?

Yes. The official model listing confirms function calling, structured outputs, and search grounding are supported features.

### Can Gemini 3 Flash handle long documents?

Flash supports an input window of up to 1,048,576 tokens, making it suitable for long documents, transcripts, or knowledge base context.

### Is Gemini 3 Flash good for tool-based agents?

Yes. Support for code execution, file search, and computer use makes it a strong default for agents that need to call tools or inspect files.

### Does Gemini 3 Flash generate images?

No. The model does not support image generation. Use a dedicated image model when you need generative visuals.

### How should I pick between Flash and Flash Lite?

Flash balances speed and quality. Flash Lite is the lowest-cost option, while Flash provides stronger multimodal reasoning and broader tool coverage.