Date: 2024-10-03

Today, Gemini 1.5 Flash-8B, our latest Flash variant, is production-ready and comes with:

50% lower price (compared to 1.5 Flash)

2x higher rate limits (compared to 1.5 Flash)

Lower latency on small prompts (compared to 1.5 Flash)



Developers can access gemini-1.5-flash-8b for free via Google AI Studio and the Gemini API.
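
As a rough illustration of what calling the model through the Gemini API might look like, here is a minimal sketch using only the Python standard library. It assumes the public v1beta REST endpoint and its generateContent method. The GEMINI_API_KEY environment variable and the helper names build_request and generate are our own placeholders, not part of any official SDK. Check the current API documentation before relying on the details.

```python
import json
import os
import urllib.request

# Assumed: the public v1beta REST root of the Gemini API.
API_ROOT = "https://generativelanguage.googleapis.com/v1beta"
# Model ID as given in this announcement.
MODEL = "gemini-1.5-flash-8b"


def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a generateContent request for a text prompt."""
    url = f"{API_ROOT}/models/{MODEL}:generateContent"
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-goog-api-key": api_key},
        method="POST",
    )


def generate(prompt: str, api_key: str) -> str:
    """Send the request and return the text of the first candidate."""
    with urllib.request.urlopen(build_request(prompt, api_key)) as resp:
        payload = json.load(resp)
    return payload["candidates"][0]["content"]["parts"][0]["text"]


if __name__ == "__main__":
    # GEMINI_API_KEY is a hypothetical variable name; nothing is sent if it is unset.
    key = os.environ.get("GEMINI_API_KEY")
    if key:
        print(generate("Summarize the benefits of small models in one sentence.", key))
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any network call is made. An API key can be created in Google AI Studio.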



Our lightweight model, smaller and faster

At I/O, we announced Gemini 1.5 Flash, our lightweight model optimized for speed and efficiency. Over the last few months, Google DeepMind has made considerable progress improving 1.5 Flash, drawing on developer feedback and testing the limits of what’s possible.

Last month, we released an experimental version of Gemini 1.5 Flash-8B, a smaller and faster variant of 1.5 Flash. We’re now excited to make it generally available for production use. Flash-8B nearly matches the performance of the 1.5 Flash model launched in May across many benchmarks. It performs especially well on tasks such as chat, transcription, and long-context language translation.

Our release of best-in-class small models continues to be informed by developer feedback and our own testing of what is possible with these models. We see the most potential for this model in tasks ranging from high-volume multimodal use cases to long-context summarization.