Smaller, Safer, More Transparent: Advancing Responsible AI with Gemma

JULY 31, 2024

Neel Nanda Research Engineer

Tom Lieberum Research Engineer

Ludovic Peran Product Manager

Kathleen Kenealy Research Engineer

In June, we released Gemma 2, our new best-in-class open models, in 27 billion (27B) and 9 billion (9B) parameter sizes. Since its debut, the 27B model has quickly become one of the highest-ranking open models on the LMSYS Chatbot Arena leaderboard, even outperforming popular models more than twice its size in real conversations.

But Gemma is about more than just performance. It's built on a foundation of responsible AI, prioritizing safety and accessibility. To support this commitment, we are excited to announce three new additions to the Gemma 2 family:

1. Gemma 2 2B – a brand-new version of our popular 2 billion (2B) parameter model, featuring built-in safety advancements and a powerful balance of performance and efficiency.

2. ShieldGemma – a suite of safety content classifier models, built upon Gemma 2, to filter the inputs and outputs of AI models and keep users safe.

3. Gemma Scope – a new model interpretability tool that offers unparalleled insight into our models' inner workings.

With these additions, researchers and developers can now create safer customer experiences, gain unprecedented insights into our models, and confidently deploy powerful AI responsibly, right on device, unlocking new possibilities for innovation.

Gemma 2 2B: Experience Next-Gen Performance, Now On-Device

We're excited to introduce the Gemma 2 2B model, a highly anticipated addition to the Gemma 2 family. This lightweight model produces outsized results by learning from larger models through distillation. In fact, Gemma 2 2B surpasses all GPT-3.5 models on the Chatbot Arena, demonstrating its exceptional conversational AI abilities.

Graph - LMSYS Chatbot Arena leaderboard scores

LMSYS Chatbot Arena leaderboard scores captured on July 30th, 2024. Gemma 2 2B score +/- 10.
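This post does not say which distillation objective was used for Gemma 2 2B. As a rough illustration of the general idea only, the sketch below computes one common form of distillation loss: the KL divergence between a teacher's and a student's next-token distributions, both softened by a temperature. The function names, the temperature value, and the toy logits are all assumptions made for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw logits to probabilities, softened by a temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) for a single next-token distribution."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Toy logits over a 4-token vocabulary (made-up numbers, not from any model).
teacher = [4.0, 1.0, 0.5, -2.0]
close_student = [3.8, 1.1, 0.4, -1.9]   # nearly agrees with the teacher
far_student = [-2.0, 0.5, 1.0, 4.0]     # prefers the opposite token

loss_close = distillation_loss(teacher, close_student)
loss_far = distillation_loss(teacher, far_student)
```

A student whose distribution is close to the teacher's gets a small loss, and one that disagrees gets a large loss. Minimizing this quantity during training pushes the smaller model toward the larger model's predictions.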

Gemma 2 2B offers:

Exceptional performance: Delivers best-in-class performance for its size, outperforming other open models in its category.

Flexible and cost-effective deployment: Run Gemma 2 2B efficiently on a wide range of hardware—from edge devices and laptops to robust cloud deployments with Vertex AI and Google Kubernetes Engine (GKE). To further enhance its speed, it is optimized with the NVIDIA TensorRT-LLM library and is available as an NVIDIA NIM. This optimization targets various deployments, including data centers, cloud, local workstations, PCs, and edge devices — using NVIDIA RTX, NVIDIA GeForce RTX GPUs, or NVIDIA Jetson modules for edge AI. Additionally, Gemma 2 2B seamlessly integrates with Keras, JAX, Hugging Face, NVIDIA NeMo, Ollama, Gemma.cpp, and soon MediaPipe for streamlined development.

Open and accessible: Available under the commercially-friendly Gemma terms for research and commercial applications. It's even small enough to run on the free tier of T4 GPUs in Google Colab, making experimentation and development easier than ever.
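To make the Colab claim concrete, here is a back-of-the-envelope estimate of the memory needed just to hold the weights. It assumes a nominal 2 billion parameters (the exact count is not given in this post) and a T4 with a nominal 16 GB of memory. It counts only weights and ignores activations, the KV cache, and framework overhead, so real usage will be higher.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}

def weight_memory_gb(num_params, dtype):
    """Memory for the weights alone, in gigabytes (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9

NOMINAL_PARAMS = 2_000_000_000  # assumption: "2B" taken at face value
T4_MEMORY_GB = 16               # assumption: nominal T4 memory

estimates = {
    dtype: weight_memory_gb(NOMINAL_PARAMS, dtype) for dtype in BYTES_PER_PARAM
}
fits_on_t4 = {dtype: gb < T4_MEMORY_GB for dtype, gb in estimates.items()}

for dtype, gb in estimates.items():
    print(f"{dtype}: {gb:.1f} GB of weights, fits on T4: {fits_on_t4[dtype]}")
```

Under these assumptions, even full 32-bit weights leave headroom on a single T4, and half precision leaves considerably more for activations and the cache.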

Starting today, you can download Gemma 2's model weights from Kaggle, Hugging Face, and Vertex AI Model Garden. You can also try its capabilities in Google AI Studio.

ShieldGemma: Protecting Users with State-of-the-Art Safety Classifiers

Deploying open models responsibly to ensure engaging, safe, and inclusive AI outputs requires significant effort from developers and researchers. To help developers in this process, we're introducing ShieldGemma, a series of state-of-the-art safety classifiers designed to detect and mitigate harmful content in AI models' inputs and outputs. ShieldGemma specifically targets four key areas of harm:

Hate speech

Harassment

Sexually explicit content

Dangerous content

Generative AI application model architecture

These open classifiers complement our existing suite of safety classifiers in the Responsible AI Toolkit, which includes a methodology for building classifiers tailored to a specific policy with a limited number of data points, as well as existing Google Cloud off-the-shelf classifiers served via API.
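The input and output filtering described above can be sketched as a small wrapper: a classifier screens the user's prompt before it reaches the model, and screens the model's response before it reaches the user. Everything in this sketch is hypothetical. `guarded_generate`, the `classify` and `generate` callables, the 0.5 threshold, and the keyword-based stand-ins are placeholders for a real safety classifier and a real language model, not part of any published API.

```python
from typing import Callable

REFUSAL = "Sorry, I can't help with that."

def guarded_generate(
    prompt: str,
    generate: Callable[[str], str],
    classify: Callable[[str], float],
    threshold: float = 0.5,
) -> str:
    """Call generate() only for prompts that pass, and return only outputs that pass."""
    if classify(prompt) >= threshold:     # screen the user input
        return REFUSAL
    response = generate(prompt)
    if classify(response) >= threshold:   # screen the model output
        return REFUSAL
    return response

# Toy stand-ins so the sketch runs without any model (hypothetical).
def toy_classify(text: str) -> float:
    """Return a fake harm score: 1.0 if a placeholder keyword appears."""
    return 1.0 if "forbidden" in text.lower() else 0.0

def toy_generate(prompt: str) -> str:
    return f"Echo: {prompt}"

safe_reply = guarded_generate("Hello there", toy_generate, toy_classify)
blocked_reply = guarded_generate("Something FORBIDDEN", toy_generate, toy_classify)
```

Passing the classifier and the generator in as callables keeps the wrapper independent of any particular model size, so a smaller classifier could be used for live traffic and a larger one for offline review.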

Here's how ShieldGemma can help you create safer, better AI applications:

SOTA performance: Built on top of Gemma 2, the ShieldGemma models are industry-leading safety classifiers.

Flexible sizes: ShieldGemma offers various model sizes to meet diverse needs. The 2B model is ideal for online classification tasks, while the 9B and 27B versions provide higher performance for offline applications where latency is less of a concern. All sizes leverage NVIDIA speed optimizations for efficient performance across hardware.

Open and collaborative: The open nature of ShieldGemma encourages transparency and collaboration within the AI community, contributing to the future of ML industry safety standards.

"As AI continues to mature, the entire industry will need to invest in developing high performance safety evaluators. We're glad to see Google making this investment, and look forward to their continued involvement in our AI Safety Working Group.” ~ Rebecca Weiss, Executive Director, ML Commons

Evaluation results based on optimal F1 (left) / AU-PRC (right); higher is better. We use α = 0 and T = 1 for calculating the probabilities. ShieldGemma (SG) Prompt and SG Response are our test datasets, and OpenAI Mod and ToxicChat are external benchmarks. The performance of baseline models on external datasets is sourced from Ghosh et al. (2024) and Inan et al. (2023).
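The caption mentions α and T without defining them. One plausible reading, offered here as an assumption rather than as the technical report's exact formula, is that the classifier's score is a temperature-scaled softmax over the log-likelihoods of a "Yes" (violating) and a "No" answer, with α added to the "Yes" term as an offset. With α = 0 and T = 1 this reduces to a plain two-way softmax. The sketch also illustrates what "optimal F1" usually means: the best F1 over all decision thresholds. The log-likelihoods and labels below are made up.

```python
import math

def violation_probability(ll_yes, ll_no, alpha=0.0, temperature=1.0):
    """Assumed form: two-way softmax over Yes/No log-likelihoods with offset alpha."""
    yes = ll_yes / temperature + alpha
    no = ll_no / temperature
    peak = max(yes, no)  # subtract the max for numerical stability
    e_yes, e_no = math.exp(yes - peak), math.exp(no - peak)
    return e_yes / (e_yes + e_no)

def f1_at(threshold, scores, labels):
    """F1 when every score at or above the threshold is predicted as violating."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def optimal_f1(scores, labels):
    """Best F1 over all thresholds that change at least one prediction."""
    return max(f1_at(t, scores, labels) for t in set(scores))

# Made-up (ll_yes, ll_no, label) triples; label 1 means the text is violating.
examples = [(-0.1, -2.4, 1), (-0.5, -0.9, 0), (-1.2, -0.36, 1), (-3.0, -0.05, 0)]
scores = [violation_probability(ll_yes, ll_no) for ll_yes, ll_no, _ in examples]
labels = [label for _, _, label in examples]
best_f1 = optimal_f1(scores, labels)
```

Because optimal F1 picks the best threshold after the fact, it measures how well the scores separate the two classes, not how well any fixed production threshold would perform.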

Learn more about ShieldGemma, see full results in the technical report, and start building safer AI applications with our comprehensive Responsible Generative AI Toolkit.

Gemma Scope: Illuminating AI Decision-Making with Open Sparse Autoencoders

Gemma Scope offers researchers and developers unprecedented transparency into the decision-making processes of our Gemma 2 models. Acting like a powerful microscope, Gemma Scope uses sparse autoencoders (SAEs) to zoom in on specific points within the model and make its inner workings more interpretable.

These SAEs are specialized neural networks that help us unpack the dense, complex information processed by Gemma 2, expanding it into a form that's easier to analyze and understand. By studying these expanded views, researchers can gain valuable insights into how Gemma 2 identifies patterns, processes information, and ultimately makes predictions. With Gemma Scope, we aim to help the AI research community discover how to build more understandable, accountable, and reliable AI systems.
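To make the idea of "expanding" an activation concrete, the sketch below runs a tiny sparse autoencoder forward pass in plain Python. An encoder maps a dense activation vector to a wider vector of feature activations, most of which are zero, and a decoder maps those features back to a reconstruction. The weights, sizes, and thresholds are made up. The thresholded activation used here (a feature fires only if its pre-activation exceeds a per-feature threshold) is an assumption about the architecture; the technical report describes the actual design.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def sae_encode(x, w_enc, b_enc, thresholds):
    """Map a dense activation to sparse features; values at or below threshold become 0."""
    pre = [p + b for p, b in zip(matvec(w_enc, x), b_enc)]
    return [p if p > t else 0.0 for p, t in zip(pre, thresholds)]

def sae_decode(features, w_dec, b_dec):
    """Map sparse features back to an approximate activation."""
    return [r + b for r, b in zip(matvec(w_dec, features), b_dec)]

# A 2-dimensional activation expanded into 4 features (toy values).
W_ENC = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]  # shape 4 x 2
B_ENC = [0.0, 0.0, 0.0, 0.0]
THRESHOLDS = [0.1, 0.1, 0.1, 0.1]
W_DEC = [[1.0, 0.0, -1.0, 0.0], [0.0, 1.0, 0.0, -1.0]]      # shape 2 x 4
B_DEC = [0.0, 0.0]

activation = [0.8, -0.05]
features = sae_encode(activation, W_ENC, B_ENC, THRESHOLDS)
reconstruction = sae_decode(features, W_DEC, B_DEC)
active = [i for i, f in enumerate(features) if f != 0.0]
```

In this toy case only one of the four features is active, and the reconstruction keeps the large component of the activation while dropping the small one. Interpretability work then asks what each active feature corresponds to in the model's input.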

Here's what makes Gemma Scope groundbreaking:

Open SAEs: Over 400 freely available SAEs covering all layers of Gemma 2 2B and 9B.

Interactive demos: Explore SAE features and analyze model behavior on Neuronpedia, without writing any code.

Easy-to-use repository: Code and examples for interfacing with SAEs and Gemma 2.

Learn more about Gemma Scope on the Google DeepMind blog and in the technical report and developer documentation.

A Future Built on Responsible AI

These releases represent our ongoing commitment to providing the AI community with the tools and resources needed to build a future where AI benefits everyone. We believe that open access, transparency, and collaboration are essential for developing safe and beneficial AI.