Date: 2024-03-19

In February we announced Gemma, our family of lightweight, state-of-the-art open models built from the same research and technology used to create the Gemini models. The community's response has been truly inspiring: impressive fine-tuned variants, Kaggle notebooks, integration into tools and services, recipes for RAG using databases like MongoDB, and lots more. Today, we're excited to announce our first round of additions to the Gemma family, expanding the possibilities for ML developers to innovate responsibly: CodeGemma for code completion and generation tasks as well as instruction following, and RecurrentGemma, an efficiency-optimized architecture for research experimentation. We're also sharing some updates to Gemma and to our terms, based on invaluable feedback from the community and our partners.

Introducing the first two Gemma variants

CodeGemma: Code completion, generation, and chat for developers and businesses

Built on the foundation of our Gemma models, CodeGemma brings powerful yet lightweight coding capabilities to the community. CodeGemma models are available as a 7B pretrained variant that specializes in code completion and code generation tasks, a 7B instruction-tuned variant for code chat and instruction following, and a 2B pretrained variant for fast code completion that fits on your local computer. CodeGemma models have several advantages:

Intelligent code completion and generation: Complete lines and functions, and even generate entire blocks of code, whether you're working locally or using cloud resources.

Enhanced accuracy: Trained on 500 billion tokens of primarily English language data from web documents, mathematics, and code, CodeGemma models generate code that's not only more syntactically correct but also semantically meaningful, helping reduce errors and debugging time.

Multi-language proficiency: Your invaluable coding assistant for Python, JavaScript, Java, and other popular languages.

Streamlined workflows: Integrate a CodeGemma model into your development environment to write less boilerplate and get to the interesting, differentiated code that matters faster.

This table compares the performance of CodeGemma with other similar models on both single and multi-line code completion tasks.

Learn more about CodeGemma in our report or try it in this quickstart guide.
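To make the code completion use case concrete, here is a minimal sketch of how an editor integration might assemble a fill-in-the-middle prompt for a pretrained CodeGemma variant. The control tokens below are assumed from the CodeGemma model card rather than from this post, and `build_fim_prompt` is a hypothetical helper, so verify both against the report or quickstart guide before relying on them.

```python
# Fill-in-the-middle (FIM) control tokens. Assumption: these match the tokens
# documented for CodeGemma; confirm against the model card before use.
FIM_PREFIX = "<|fim_prefix|>"
FIM_SUFFIX = "<|fim_suffix|>"
FIM_MIDDLE = "<|fim_middle|>"


def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Hypothetical helper: arrange the code before and after the cursor
    in prefix-suffix-middle order, so the model generates the middle."""
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"


# Code on either side of the cursor in an editor buffer.
before_cursor = "def mean(values):\n    total = "
after_cursor = "\n    return total / len(values)\n"

prompt = build_fim_prompt(before_cursor, after_cursor)
print(prompt)
```

The resulting string would be passed to the model as an ordinary text prompt, and the tokens the model generates after the final marker are the suggested completion at the cursor.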

RecurrentGemma: Efficient, faster inference at higher batch sizes for researchers

RecurrentGemma is a technically distinct model that leverages recurrent neural networks and local attention to improve memory efficiency. While it achieves benchmark scores similar to those of the Gemma 2B model, RecurrentGemma's unique architecture results in several advantages:

Reduced memory usage: Lower memory requirements allow for the generation of longer samples on devices with limited memory, such as single GPUs or CPUs.

Higher throughput: Because of its reduced memory usage, RecurrentGemma can perform inference at significantly higher batch sizes, thus generating substantially more tokens per second (especially when generating long sequences).

Research innovation: RecurrentGemma showcases a non-transformer model that achieves high performance, highlighting advancements in deep learning research.
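The link between memory usage and batch size can be sketched with some back-of-the-envelope arithmetic. The example below compares a key-value cache that grows with every generated token (global attention) against one capped at a fixed window (local attention). All dimensions, the window size, and the memory budget are hypothetical values chosen for illustration, not RecurrentGemma's actual configuration, and the sketch ignores the fixed-size recurrent state, which does not grow with sequence length either.

```python
def kv_cache_bytes(seq_len, num_layers, num_kv_heads, head_dim,
                   batch_size=1, bytes_per_value=2, window=None):
    """Bytes needed to cache attention keys and values.

    window=None: every past token is cached (global attention).
    window=N:    at most the N most recent tokens are cached (local attention).
    """
    cached_tokens = seq_len if window is None else min(seq_len, window)
    # Factor 2: one key tensor and one value tensor per layer.
    return (2 * batch_size * num_layers * num_kv_heads * head_dim
            * cached_tokens * bytes_per_value)


# Hypothetical model dimensions, for illustration only.
DIMS = dict(num_layers=24, num_kv_heads=8, head_dim=128)
WINDOW = 2048          # hypothetical local-attention window
BUDGET = 8 * 2**30     # hypothetical 8 GiB of memory reserved for the cache


def max_batch_size(memory_budget_bytes, seq_len, window=None):
    """Largest batch whose cache fits in the given memory budget."""
    per_sequence = kv_cache_bytes(seq_len, window=window, **DIMS)
    return memory_budget_bytes // per_sequence


for seq_len in (2048, 8192, 32768):
    full = kv_cache_bytes(seq_len, **DIMS)
    local = kv_cache_bytes(seq_len, window=WINDOW, **DIMS)
    print(f"{seq_len:>6} tokens: global {full / 2**20:.0f} MiB, "
          f"local {local / 2**20:.0f} MiB, "
          f"max batch {max_batch_size(BUDGET, seq_len)} vs "
          f"{max_batch_size(BUDGET, seq_len, window=WINDOW)}")
```

Under these assumptions the global cache grows linearly with sequence length while the windowed cache stays flat, so the batch size that fits in a fixed budget shrinks for the former and stays constant for the latter. That is the mechanism behind the throughput advantage on long sequences.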