Google Makes Gemini 2.5 Models Generally Available

Source: Google

Everyone can now use the new Gemini 2.5 models.

Google today launched Gemini 2.5 Flash-Lite, a lighter, more efficient member of its AI model family aimed at fast, cost-effective prompt processing. Gemini 2.5 Pro and Flash are now generally available as well.

Faster and Cheaper

Gemini 2.5 was officially introduced in March, but the models were available only in preview. They have now been rolled out generally. The models use a “mixture-of-experts” architecture, meaning each model consists of multiple smaller neural networks (“experts”). When a user enters a prompt, only a subset of these experts is activated, which saves compute.
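The routing idea behind mixture-of-experts can be sketched in a few lines. This is a toy illustration only, not Google's actual architecture: a gating function scores each expert for a given input, and only the top-scoring expert(s) run.

```python
# Toy mixture-of-experts routing sketch (illustrative; expert functions
# and the gate are hypothetical stand-ins for neural networks).

def expert_double(x):
    """Hypothetical expert: doubles its input."""
    return 2 * x

def expert_square(x):
    """Hypothetical expert: squares its input."""
    return x * x

EXPERTS = [expert_double, expert_square]

def gate(x):
    """Hypothetical gate: one score per expert (here, prefer squaring large inputs)."""
    return [1.0, float(x)]

def moe_forward(x, top_k=1):
    """Run only the top_k highest-scoring experts, not all of them."""
    scores = gate(x)
    chosen = sorted(range(len(EXPERTS)), key=lambda i: scores[i], reverse=True)[:top_k]
    return sum(EXPERTS[i](x) for i in chosen)

print(moe_forward(5))  # gate prefers the squaring expert for x=5 -> 25
```

The efficiency win is that compute scales with the number of *activated* experts (`top_k`), not with the total number of experts in the model.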

The Flash-Lite model processes prompts even faster than Flash. Google says it is designed for applications such as translation and classification, where low latency is required: “2.5 Flash-Lite generally offers higher quality than 2.0 Flash-Lite on coding, mathematics, science, reasoning, and multimodal benchmarks.” Flash-Lite costs $0.10 per million input tokens, a tenth of the price of the most powerful Pro model.

All 2.5 models are multimodal and accept prompts of up to one million tokens. They run on Google’s own TPU v5p AI accelerators. The mid-tier Flash model has become more expensive: input tokens now cost $0.30 per million, up from the previous $0.15. The separate price for thinking mode has been dropped; in thinking mode, the model produces higher-quality output because it spends more time reasoning about its response.