Gemma Family Expands with Models Tailored for Developers and Researchers

APRIL 9, 2024

Tris Warkentin Director, Product Management

Jane Fine Senior Product Manager Labs

In February we announced Gemma, our family of lightweight, state-of-the-art open models built from the same research and technology used to create the Gemini models. The community's incredible response – including impressive fine-tuned variants, Kaggle notebooks, integration into tools and services, recipes for RAG using databases like MongoDB, and lots more – has been truly inspiring.

Today, we're excited to announce our first round of additions to the Gemma family, expanding the possibilities for ML developers to innovate responsibly: CodeGemma for code completion and generation tasks as well as instruction following, and RecurrentGemma, an efficiency-optimized architecture for research experimentation. Plus, we're sharing some updates to Gemma and our terms aimed at improvements based on invaluable feedback we've heard from the community and our partners.

Introducing the first two Gemma variants

CodeGemma: Code completion, generation, and chat for developers and businesses

Harnessing the foundation of our Gemma models, CodeGemma brings powerful yet lightweight coding capabilities to the community. CodeGemma models are available as a 7B pretrained variant that specializes in code completion and code generation tasks, a 7B instruction-tuned variant for code chat and instruction-following, and a 2B pretrained variant for fast code completion that fits on your local computer. CodeGemma models have several advantages:

Intelligent code completion and generation: Complete lines, functions, and even generate entire blocks of code – whether you're working locally or leveraging cloud resources.

Enhanced accuracy: Trained on 500 billion tokens of primarily English language data from web documents, mathematics, and code, CodeGemma models generate code that's not only more syntactically correct but also semantically meaningful, helping reduce errors and debugging time.

Multi-language proficiency: Your invaluable coding assistant for Python, JavaScript, Java, and other popular languages.

Streamlined workflows: Integrate a CodeGemma model into your development environment to write less boilerplate, and focus on interesting and differentiated code that matters – faster.

CodeGemma integrated within an existing AI dev project with

This table compares the performance of CodeGemma with other similar models on both single and multi-line code completion tasks.

Learn more about CodeGemma in our report or try it in this quickstart guide.

RecurrentGemma: Efficient, faster inference at higher batch sizes for researchers

RecurrentGemma is a technically distinct model that leverages recurrent neural networks and local attention to improve memory efficiency. While achieving similar benchmark score performance to the Gemma 2B model, RecurrentGemma's unique architecture results in several advantages:

Reduced memory usage: Lower memory requirements allow for the generation of longer samples on devices with limited memory, such as single GPUs or CPUs.

Higher throughput: Because of its reduced memory usage, RecurrentGemma can perform inference at significantly higher batch sizes, thus generating substantially more tokens per second (especially when generating long sequences).

Research innovation: RecurrentGemma showcases a non-transformer model that achieves high performance, highlighting advancements in deep learning research.

Graph showing maximum thoughput when sampling from a prompt of 2k tokens on TPUv5e

This chart reveals how RecurrentGemma maintains its sampling speed regardless of sequence length, while Transformer-based models like Gemma slow down as sequences get longer.

To understand the underlying technology, check out our paper. For practical exploration, try the notebook, which demonstrates how to fine-tune the model.

Built upon Gemma foundations, expanding capabilities

Guided by the same principles of the original Gemma models, the new model variants offer:

Open availability: Encourages innovation and collaboration with its availability to everyone and flexible terms of use.

High-performance and efficient capabilities: Advances the capabilities of open models with code-specific domain expertise and optimized design for exceptionally fast completion and generation.

Responsible design: Our commitment to responsible AI helps ensure the models deliver safe and reliable results.

Flexibility for diverse software and hardware:

- Both CodeGemma and RecurrentGemma: Built with JAX and compatible with JAX, PyTorch, , Hugging Face Transformers, and Gemma.cpp. Enable local experimentation and cost-effective deployment across various hardware, including laptops, desktops, NVIDIA GPUs, and Google Cloud TPUs.

- CodeGemma: Additionally compatible with Keras, NVIDIA NeMo, TensorRT-LLM, Optimum-NVIDIA, MediaPipe, and availability on Vertex AI.

- RecurrentGemma: Support for all the aforementioned products will be available in the coming weeks.

Gemma 1.1 update

Alongside the new model variants, we're releasing Gemma 1.1, which includes performance improvements. Additionally, we've listened to developer feedback, fixed bugs, and updated our terms to provide more flexibility.

Get started today

These first Gemma model variants are available in various places worldwide, starting today on Kaggle, Hugging Face, and Vertex AI Model Garden. Here's how to get started:

Access the models: Visit the Gemma website, Vertex AI Model Garden, Hugging Face, NVIDIA NIM APIs, or Kaggle for download instructions.

Explore integration options: Find guides and resources for integrating the models with your favorite tools and platforms.

Experiment and innovate: Add a Gemma model variant to your next project and explore its capabilities.

We invite you to try the CodeGemma and RecurrentGemma models and share your feedback on Kaggle. Together, let's shape the future of AI-powered content creation and understanding.