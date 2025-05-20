Last year Google AI Edge introduced support for on-device small language models (SLMs) with four initial models on Android, iOS, and Web. Today, we are excited to expand support to over a dozen models including the new Gemma 3 and Gemma 3n models, hosted on our new LiteRT Hugging Face community. Gemma 3n, available via Google AI Edge as an early preview, is Gemma’s first multimodal on-device small language model supporting text, image, video, and audio inputs. Paired with our new Retrieval Augmented Generation (RAG) and Function Calling libraries, you have everything you need to prototype and build transformative AI features fully on the edge.

Broader model support

You can find our growing list of models in the LiteRT Hugging Face Community. Download any of these models and run them on-device with just a few lines of code. The models are fully optimized and converted for mobile and web. Full instructions for running them are in our documentation and on each model card on Hugging Face.

To customize any of these models, fine-tune the base model, then convert and quantize it using the appropriate AI Edge libraries. We have a Colab showing every step you need to fine-tune and then convert Gemma 3 1B.

With the latest release of our quantization tools, new quantization schemes allow much higher-quality int4 post-training quantization. Compared to bf16, the default data type for many models, int4 quantization can reduce the size of language models by a factor of 2.5-4X while significantly decreasing latency and peak memory consumption.
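As a rough illustration of where a 2.5-4X range can come from, the sketch below estimates model size from the number of bits stored per weight. The function names and the mixed-precision breakdown are hypothetical and are not part of the AI Edge quantization tools. The idea that some weights stay at 8 bits is an assumption made here only to show why the ratio can fall below the ideal 16/4 = 4X.

```python
BF16_BITS = 16.0
INT4_BITS = 4.0


def model_size_bytes(num_params: int, bits_per_weight: float) -> float:
    """Approximate weight storage in bytes, ignoring file-format overhead."""
    return num_params * bits_per_weight / 8.0


def effective_bits(int4_fraction: float, other_bits: float = 8.0) -> float:
    """Average bits per weight when only a fraction of weights is int4.

    Assumption (not from the announcement): the remaining weights are
    stored at `other_bits` bits each.
    """
    if not 0.0 <= int4_fraction <= 1.0:
        raise ValueError("int4_fraction must be between 0 and 1")
    return int4_fraction * INT4_BITS + (1.0 - int4_fraction) * other_bits


def compression_ratio(int4_fraction: float, other_bits: float = 8.0) -> float:
    """Size reduction relative to a bf16 baseline."""
    return BF16_BITS / effective_bits(int4_fraction, other_bits)


if __name__ == "__main__":
    params = 1_000_000_000  # nominal "1B" parameter count, for illustration
    for fraction in (1.0, 0.75, 0.5):
        bits = effective_bits(fraction)
        size_mb = model_size_bytes(params, bits) / 1e6
        print(f"int4 fraction {fraction:.2f}: "
              f"{bits:.2f} bits/weight, ~{size_mb:.0f} MB, "
              f"{compression_ratio(fraction):.2f}x smaller than bf16")
```

With every weight at int4 the ratio is exactly 4X. Keeping a share of the weights at higher precision pulls it toward the lower end of the range.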

Gemma 3 1B & Gemma 3n

Earlier this year, we introduced Gemma 3 1B. At only 529MB, this model can prefill at up to 2,585 tokens per second on a mobile GPU, allowing it to process up to a page of content in under a second. Gemma 3 1B’s small footprint lets it support a wide range of devices and limits the size of the files an end user needs to download with an application.

Today, we are thrilled to add an early preview of Gemma 3n to our collection of supported models. The 2B and 4B parameter variants will both support native text, image, video, and audio inputs. The text and image modalities are available on Hugging Face, with audio to follow shortly.
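The Gemma 3 1B figures can be sanity-checked with back-of-the-envelope arithmetic. This sketch assumes a page is roughly 650 tokens, that "1B" means about 10^9 parameters, and that MB means 10^6 bytes. None of these assumptions come from the announcement, and the helper names are illustrative only.

```python
PREFILL_TOKENS_PER_SECOND = 2585.0  # peak mobile-GPU prefill rate quoted above
MODEL_SIZE_BYTES = 529 * 10**6      # 529MB, assuming MB = 10^6 bytes
NOMINAL_PARAMS = 10**9              # assumption: "1B" is about 1e9 parameters
PAGE_TOKENS = 650                   # assumption: rough token count for one page


def prefill_seconds(num_tokens: int,
                    tokens_per_second: float = PREFILL_TOKENS_PER_SECOND) -> float:
    """Time to ingest a prompt at a given prefill throughput."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


def implied_bits_per_weight(size_bytes: int, num_params: int) -> float:
    """Average bits stored per parameter, treating the whole file as weights."""
    return size_bytes * 8 / num_params


if __name__ == "__main__":
    print(f"Prefill time for one page: {prefill_seconds(PAGE_TOKENS):.2f} s")
    print(f"Implied bits per weight: "
          f"{implied_bits_per_weight(MODEL_SIZE_BYTES, NOMINAL_PARAMS):.2f}")
```

Under these assumptions a page-sized prompt prefills in about a quarter of a second at the peak rate, and the file size works out to a little over 4 bits per parameter, which is in line with int4 quantization.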

