Date: 2025-09-05

At Google I/O, we previewed Gemma 3n with text and image inputs and launched the Google AI Edge Gallery app on GitHub. This open-source, interactive playground app is designed to inspire and enable developers by providing practical examples, transparent performance metrics, and direct links to the documentation you need to start building experiences powered by on-device AI models. The developer community's response has been fantastic: the app reached 500,000 APK downloads in just two months, a clear sign of excitement for powerful, private, on-device generative AI.

Today, we’re thrilled to take two big steps forward: adding audio support to the Google AI Edge stack and bringing the Google AI Edge Gallery to the Google Play Store.



New audio capabilities with Gemma 3n

Beyond text and vision, the Google AI Edge stack now supports audio. Our first model with this capability is Gemma 3n, accessible through the MediaPipe LLM Inference API for Android and for Web. Audio understanding unlocks powerful new on-device features, including:

High-Quality Speech-to-Text: Transcribe speech in a variety of spoken languages to text.

Speech-to-Translated-Text: Translate spoken audio into text in another language.



In this initial release, the MediaPipe LLM Inference API supports audio batch inference for clips up to 30 seconds long. Streaming audio support is next on our roadmap.
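
Because batch inference is limited to 30-second clips, longer recordings have to be divided before they are sent to the model. The following is a minimal sketch of that preprocessing step only; it does not call the MediaPipe API. The function names, the 16 kHz sample rate in the usage note, and the fixed-boundary splitting strategy are illustrative assumptions, not part of the release.

```python
from typing import List, Tuple

# Clip limit stated for the initial release of audio batch inference.
MAX_CLIP_SECONDS = 30


def fits_in_one_clip(num_samples: int, sample_rate: int,
                     max_seconds: int = MAX_CLIP_SECONDS) -> bool:
    """Return True if the recording is no longer than max_seconds."""
    if num_samples < 0 or sample_rate <= 0 or max_seconds <= 0:
        raise ValueError("num_samples must be >= 0; sample_rate and max_seconds must be > 0")
    return num_samples <= sample_rate * max_seconds


def clip_ranges(num_samples: int, sample_rate: int,
                max_seconds: int = MAX_CLIP_SECONDS) -> List[Tuple[int, int]]:
    """Return half-open (start, end) sample ranges, each at most max_seconds long.

    Hypothetical helper: it cuts at fixed boundaries, so a word may be split
    across two clips. A real pipeline might prefer to cut at silences.
    """
    if num_samples < 0 or sample_rate <= 0 or max_seconds <= 0:
        raise ValueError("num_samples must be >= 0; sample_rate and max_seconds must be > 0")
    max_samples = sample_rate * max_seconds
    return [
        (start, min(start + max_samples, num_samples))
        for start in range(0, num_samples, max_samples)
    ]
```

For example, assuming a 75-second mono recording sampled at 16 kHz, clip_ranges(75 * 16000, 16000) yields three ranges: two full 30-second clips and one 15-second remainder. Each range can then be submitted as its own batch request.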