After criticism, Google introduces automatic caching to reduce the usage costs of Gemini models


Google is introducing a new feature for its Gemini API that the company claims can result in cost savings of up to 75 percent for developers.

The feature, called 'implicit caching', is enabled by default for the Gemini 2.5 Pro and 2.5 Flash models. Developers no longer need to manage the cache themselves, which simplifies usage.

With implicit caching, Google aims to address the problem of high costs associated with frequent use of similar prompts.

Google's Logan Kilpatrick announced the feature in a post on X.

While previous caching solutions required explicit configuration by developers, the new approach works automatically: when an API request shares a common prefix with an earlier request, the cached prefix is reused and costs are reduced.

According to Google, caching starts from 1,024 tokens for 2.5 Flash and 2,048 tokens for 2.5 Pro. This equates to approximately 750 and 1,500 words respectively. Developers are advised to place the repeated context at the beginning of a prompt. Variable information should be placed at the end to increase the chance of a cache hit.
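The layout advice above can be sketched as follows. This is an illustrative helper, not part of the Gemini SDK; the function names and the rough four-characters-per-token estimate are assumptions for the sketch, not an official tokenizer.

```python
# Sketch of a cache-friendly prompt layout for implicit caching.
# Idea: the repeated context (instructions, documents) goes first, so
# identical prefixes across requests can be matched by the cache; the
# request-specific input goes last.

# Minimum prompt sizes for implicit caching, per the article.
IMPLICIT_CACHE_MIN_TOKENS = {
    "gemini-2.5-flash": 1024,
    "gemini-2.5-pro": 2048,
}

def build_prompt(static_context: str, user_input: str) -> str:
    """Place the stable, repeated context before the variable part."""
    return f"{static_context}\n\n{user_input}"

def likely_cacheable(prompt: str, model: str) -> bool:
    """Very rough heuristic: ~4 characters per token (an assumption,
    not an official tokenizer)."""
    approx_tokens = len(prompt) / 4
    return approx_tokens >= IMPLICIT_CACHE_MIN_TOKENS[model]

# The long, repeated system context comes first; the question comes last.
context = "You are a support assistant for product X. " * 200
question = "How do I reset my password?"

prompt = build_prompt(context, question)
print(likely_cacheable(prompt, "gemini-2.5-flash"))
```

Keeping the variable part at the end means two requests with the same context but different questions still share the longest possible prefix, which is what makes an implicit cache hit likely.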


Response to earlier criticism

The introduction of the feature follows criticism of the previous caching approach, which required developers to manually mark prompts for caching and led to unexpectedly high costs and frustration. Google recently acknowledged the problems and promised improvements; the switch to implicit caching is a direct response to that feedback.

Google does not currently guarantee that every cache hit will be correctly identified, and there is no external validation of the promised savings yet. How reliable the new feature is will therefore become clear from early user feedback. More information is available in a Google blog post.