Gemini 2.5 Models now support implicit caching

MAY 8, 2025

Logan Kilpatrick, Group Product Manager

We pioneered context caching in May 2024 with explicit caching, which helps developers save 75% on repetitive context passed to our models. Today, we are rolling out a highly requested feature in the Gemini API: implicit caching.

Implicit caching with Gemini API

Implicit caching passes cache cost savings directly to developers, with no need to create an explicit cache. When you send a request to one of the Gemini 2.5 models and it shares a common prefix with one of your previous requests, it is eligible for a cache hit. On a hit, we dynamically pass the cost savings back to you, with the same 75% discount on the cached tokens.
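To make the prefix rule concrete, here is a toy, client-side simulation. It is a hypothetical sketch, not how the Gemini API implements caching: the real cache lives on the server and its matching rules are not exposed, and the class name PrefixCacheSimulator and the use of plain strings as tokens are illustrative assumptions. For each new request it reports how many leading tokens match the best earlier request, which is the portion that could in principle be served from cache.

```python
from typing import List, Sequence


class PrefixCacheSimulator:
    """Toy model of prefix-based cache hits (hypothetical, client-side).

    Tokens are plain strings here; a real tokenizer and the server-side
    matching rules are outside the scope of this sketch.
    """

    def __init__(self) -> None:
        # Token sequences of every request seen so far.
        self._previous: List[List[str]] = []

    @staticmethod
    def _shared_prefix_len(a: Sequence[str], b: Sequence[str]) -> int:
        # Count leading positions where both sequences hold the same token.
        n = 0
        for x, y in zip(a, b):
            if x != y:
                break
            n += 1
        return n

    def submit(self, tokens: Sequence[str]) -> int:
        """Record a request and return the length of the longest prefix it
        shares with any earlier request (0 if there is none)."""
        best = max(
            (self._shared_prefix_len(tokens, prev) for prev in self._previous),
            default=0,
        )
        self._previous.append(list(tokens))
        return best


if __name__ == "__main__":
    sim = PrefixCacheSimulator()
    manual = ["<manual-chunk>"] * 5           # stable shared context
    print(sim.submit(manual + ["q1"]))        # 0: nothing to match yet
    print(sim.submit(manual + ["q2"]))        # 5: the whole manual is shared
    print(sim.submit(["q3"] + manual))        # 0: first token already differs
```

Because matching is anchored at the start, the third request shares nothing with the earlier ones even though it contains the same manual.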

To increase the chance that your request gets a cache hit, keep the content at the beginning of the request the same across requests, and put anything that changes from request to request, such as a user's question or other additional context, at the end of the prompt. You can read more best practices for implicit caching in the Gemini API docs.
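The ordering advice can be sketched as a small prompt builder. The function names build_prompt and build_prompt_question_first and the plain-string prompt layout are hypothetical; the sketch only measures how much text two consecutive prompts share at the start, using os.path.commonprefix from the Python standard library, which compares strings character by character.

```python
import os


def build_prompt(shared_context: str, question: str) -> str:
    """Cache-friendly order: stable context first, varying question last."""
    return f"{shared_context}\n\nQuestion: {question}"


def build_prompt_question_first(shared_context: str, question: str) -> str:
    """Anti-pattern for comparison: the varying part comes first."""
    return f"Question: {question}\n\n{shared_context}"


def shared_prefix_len(a: str, b: str) -> int:
    # os.path.commonprefix returns the longest common leading string.
    return len(os.path.commonprefix([a, b]))


if __name__ == "__main__":
    context = "Full text of a long reference document. " * 50
    q1, q2 = "How do I reset it?", "What is the warranty?"

    good = shared_prefix_len(build_prompt(context, q1), build_prompt(context, q2))
    bad = shared_prefix_len(
        build_prompt_question_first(context, q1),
        build_prompt_question_first(context, q2),
    )
    # good spans all of context plus "\n\nQuestion: "; bad is only "Question: "
    print(good, bad)
```

Only the ordering principle carries over to real requests; how the prompt is split into request contents is up to your application.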

To make more requests eligible for cache hits, we reduced the minimum request size to 1024 tokens for 2.5 Flash and 2048 tokens for 2.5 Pro.
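A client could check these minimums before expecting a cache hit. In this sketch the thresholds come from the post, but the model-ID strings used as keys and the function name meets_cache_minimum are assumptions, and the token count is passed in rather than computed (a real count would come from the model's tokenizer, for example via the API's token-counting support).

```python
# Minimum request sizes for implicit caching, from the post.
# The model-ID strings used as keys are assumptions.
MIN_CACHEABLE_TOKENS = {
    "gemini-2.5-flash": 1024,
    "gemini-2.5-pro": 2048,
}


def meets_cache_minimum(model: str, prompt_tokens: int) -> bool:
    """Return True if a request of prompt_tokens tokens reaches the minimum
    size for implicit caching on the given model.

    Meeting the minimum only makes a request eligible; a cache hit still
    requires a prefix shared with an earlier request.
    """
    try:
        minimum = MIN_CACHEABLE_TOKENS[model]
    except KeyError:
        raise ValueError(f"no minimum recorded for model {model!r}") from None
    return prompt_tokens >= minimum
```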

Understanding token discounts with Gemini 2.5

If you want guaranteed cost savings, you can still use our explicit caching API, which supports our Gemini 2.5 and 2.0 models. If you are using Gemini 2.5 models now, you will start to see cached_content_token_count in the usage metadata, which indicates how many tokens in the request were cached and will therefore be charged at the lower price.
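The discount can be checked with a little arithmetic on the usage metadata. In this sketch, UsageMetadata is a hypothetical stand-in for the metadata attached to a response: cached_content_token_count is the field named above, while treating prompt_token_count as the total prompt size including cached tokens is an assumption, as is the example price. Cached tokens are billed at 25% of the input price (the 75% discount) and the remaining prompt tokens at full price.

```python
from dataclasses import dataclass

# 75% discount on cached input tokens, as stated in the post.
CACHE_DISCOUNT = 0.75


@dataclass
class UsageMetadata:
    """Hypothetical stand-in for a response's usage metadata.

    prompt_token_count is assumed to include the cached tokens.
    """
    prompt_token_count: int
    cached_content_token_count: int = 0


def input_cost(usage: UsageMetadata, price_per_million: float) -> float:
    """Estimate the input-token cost of one request in dollars.

    price_per_million is an assumed list price per 1M input tokens.
    """
    cached = usage.cached_content_token_count
    uncached = usage.prompt_token_count - cached
    if uncached < 0:
        raise ValueError("cached_content_token_count exceeds prompt_token_count")
    # Uncached tokens pay full price; cached tokens pay (1 - discount).
    billable = uncached + cached * (1 - CACHE_DISCOUNT)
    return billable * price_per_million / 1_000_000
```

For instance, a 10,000-token prompt with 8,000 cached tokens at a hypothetical $1.00 per million input tokens comes to 2,000 + 8,000 × 0.25 = 4,000 billable tokens, or $0.004, versus $0.01 with no cache hit.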