Towards Global Understanding – Advancing Multilingual AI with Gemma 2 and a $150K Challenge

OCT 03, 2024
Robert Dadashi, Research Scientist, Google DeepMind
Glenn Cameron, Product Marketing Manager, AI Developer

At Google, we believe AI can bridge communication gaps across our diverse world. With over 7,000 languages and countless cultural nuances, the potential for fostering global understanding through AI is immense. We're excited to share steps towards this goal, focusing on helping empower communities to build AI that reflects the richness of human languages.

One way we're doing this is through Gemma, our family of lightweight, state-of-the-art open models built from the same research and technology used to create the Gemini models. Since its launch less than eight months ago, a vibrant community – we call it the Gemmaverse – has sprung up around Gemma, creating an incredible ecosystem of tools and tens of thousands of fine-tuned model variants.


Introducing a powerful, accessible multilingual model

Building on that momentum, today at Gemma Developer Day in Tokyo we unveiled a new 2 billion parameter Gemma 2 variant fine-tuned for Japanese. We're releasing this model, along with training materials, as practical examples and learning resources for developers worldwide. Our goal is to empower communities to adapt Gemma to their own languages, drawing on their deep understanding of those languages and cultures.

Initial evaluations show the model performs on Japanese-language tasks at a level comparable to GPT-3.5, which was considered a frontier model not so long ago, while remaining lightweight enough to run efficiently on mobile devices. The model achieves this enhanced Japanese proficiency without sacrificing its robust English language capabilities, highlighting the potential for creating truly balanced multilingual models that can bridge communication gaps and serve diverse communities worldwide.

Gemma 2 2B JPN running offline on an Android phone via MediaPipe LLM Inference API

Starting today, you can download the Gemma 2 2B JPN model weights from Kaggle or Hugging Face.
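
For developers who want to try the model right away, here is a minimal sketch of loading it with the Hugging Face transformers library and generating a short Japanese completion. The repository name google/gemma-2-2b-jpn-it and the generation settings are assumptions; check the model card on Kaggle or Hugging Face for the exact identifier and recommended usage.

```python
# Minimal sketch: load the Japanese-tuned Gemma 2 2B weights from Hugging Face
# and generate a short completion. The model id below is an assumption; consult
# the official model card for the exact repository name.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2-2b-jpn-it"  # assumed repo name

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # a 2B-parameter model fits comfortably in bf16
    device_map="auto",
)

# Gemma's instruction-tuned variants use a chat template, applied via the tokenizer.
messages = [{"role": "user", "content": "日本の有名な祭りを三つ教えてください。"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

For on-device scenarios like the Android demo above, the weights can instead be converted for the MediaPipe LLM Inference API rather than loaded through transformers.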


Building on a thriving community

Beyond our own efforts, the Gemmaverse is rapidly expanding, with developers achieving remarkable results in adapting the model for a wide range of languages and tackling regionally specific challenges. We've been particularly inspired by projects like Navarasa, where Indian developers fine-tuned Gemma for 12 Indic languages, demonstrating the community's ability to adapt the model for global linguistic needs.


We're also witnessing encouraging efforts to support more languages around the world. Developers have already published fine-tuned Gemma models for languages like Arabic, Vietnamese, Zulu, and many others, showing the potential of this technology to bridge communication gaps and empower global communities. It's particularly inspiring to see the community tackling challenges unique to specific regions, such as preserving endangered dialects, as demonstrated by a developer in Korea who built a translator for the Jeju Island dialect.


Unlocking global communication through collaboration

These community-driven initiatives highlight the importance of empowering local experts to build truly global AI. To further support this collaborative effort, we're launching the Unlocking Global Communication with Gemma competition with $150,000 in prizes on Kaggle. This competition invites developers worldwide to fine-tune Gemma 2 for their languages and share their knowledge through reproducible notebooks, exploring applications like language fluency, literary traditions, historical texts, and more.
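
As a rough illustration of what a competition notebook might start from, here is a minimal parameter-efficient fine-tuning sketch using LoRA adapters with the Hugging Face transformers, peft, and datasets libraries. The base-model identifier, hyperparameters, and the two-sentence toy corpus are placeholders rather than a recommended recipe; a real submission would train on a substantial corpus in the target language.

```python
# Minimal LoRA fine-tuning sketch for adapting Gemma 2 2B to a new language.
# The model id, hyperparameters, and toy dataset are illustrative assumptions.
import torch
from datasets import Dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base_id = "google/gemma-2-2b"  # assumed base-model repo name
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)

# Attach low-rank adapters so only a small fraction of weights are trained.
model = get_peft_model(model, LoraConfig(
    r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM"))

# Toy corpus standing in for target-language training text.
texts = ["Example sentence in the target language.", "Another training sentence."]
dataset = Dataset.from_dict({"text": texts}).map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
    remove_columns=["text"])

Trainer(
    model=model,
    args=TrainingArguments(output_dir="gemma2-finetuned",
                           per_device_train_batch_size=1,
                           num_train_epochs=1, learning_rate=2e-4, bf16=True),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
).train()
```

Because LoRA keeps the base weights frozen and trains only small adapter matrices, a 2B-parameter model like this can typically be adapted on a single consumer GPU or a Kaggle notebook, which keeps the competition accessible to developers without large compute budgets.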


Join the movement

Join us on Kaggle, share your knowledge, and help us build a future where AI transcends language barriers and empowers everyone, regardless of location. Together, let's unlock the full potential of language AI and create a more connected and understanding world.