Google launches Gemini 3.1 Flash-Lite, a faster and cheaper AI model

According to Google, Gemini 3.1 Flash-Lite is designed with speed as its focus. (Source: Google)

Google has announced Gemini 3.1 Flash-Lite, a new multimodal AI model variant focusing on speed and low costs for large-scale applications.

Cheaper than other Gemini models

According to a blog post from Google, Gemini 3.1 Flash-Lite is significantly cheaper than other models in the Gemini range. The model costs $0.25 per million input tokens and $1.50 per million output tokens. By comparison, Gemini 3.1 Pro, Google’s most powerful model, starts at $2 per million input tokens and $18 per million output tokens.
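At those list prices, the gap compounds quickly at scale. A minimal cost-estimate sketch, where only the per-token prices come from the article; the model keys and the example workload volumes are illustrative assumptions:

```python
# USD list prices per million tokens: (input, output).
# Prices are the ones quoted in the article; the dict keys are illustrative.
PRICES = {
    "gemini-3.1-flash-lite": (0.25, 1.50),
    "gemini-3.1-pro": (2.00, 18.00),
}

def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of a workload at list prices."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Hypothetical monthly workload: 50M input tokens, 10M output tokens.
lite = cost_usd("gemini-3.1-flash-lite", 50_000_000, 10_000_000)
pro = cost_usd("gemini-3.1-pro", 50_000_000, 10_000_000)
print(f"Flash-Lite: ${lite:,.2f} vs. Pro: ${pro:,.2f}")  # → Flash-Lite: $27.50 vs. Pro: $280.00
```

For this workload, Flash-Lite comes out roughly ten times cheaper, which is the kind of margin that matters for the high-volume use cases described below.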

The model is also faster. In internal tests, Flash-Lite generated responses 45% faster than Gemini 2.5 Flash, while the time to the first output token is reportedly 2.5 times shorter.

Aimed at large-scale tasks

Gemini 3.1 Flash-Lite can process multimodal prompts up to 1 million tokens and generate responses up to 64,000 text tokens. The model can also generate code, for example, to build dashboards or other visual applications.

Google expects developers to use the model primarily for high-volume tasks with limited reasoning requirements. Examples include translating product catalogs or automatically moderating content on e-commerce platforms.

Benchmark results

Across eleven benchmarks, Flash-Lite achieved the highest score on six, beating GPT-5 mini and Claude 4.5 Haiku, among others.

The model achieved a strong score on GPQA Diamond, a benchmark of doctoral-level questions. On the demanding HLE (Humanity's Last Exam) benchmark, it scored 16%, compared with 44.4% for Gemini 3.1 Pro.

Gemini 3.1 Flash-Lite is currently available in preview via Vertex AI and Google AI Studio.