Bring state-of-the-art agentic skills to the edge with Gemma 4

APRIL 2, 2026

Today, Google DeepMind launched Gemma 4, a family of state-of-the-art open models that redefine what is possible on your own hardware. Now available under the Apache 2.0 license, Gemma 4 gives developers a powerful toolkit for on-device AI development. With Gemma 4, you can now go beyond chatbots to build agents and autonomous AI use cases running directly on-device. Gemma 4 enables multi-step planning, autonomous action, offline code generation, and even audio-visual processing, all without specialized fine-tuning. It’s also built for a global audience with support for over 140 languages.

Gemma 4 enables visual processing and support in >140 languages

We are excited to announce that you can experience Gemma 4’s expansive capabilities on the edge starting today! Access Android's built-in Gemma 4 model through the new AICore Developer Preview, or leverage Google AI Edge to build agentic, in-app experiences across mobile, desktop, and edge devices.

In this post, we’ll show you how to get started with Google AI Edge using both Google AI Edge Gallery and LiteRT-LM.

Google AI Edge Gallery, available on iOS and Android, allows you to build and experiment with AI experiences that run entirely on-device. Today, we are thrilled to announce the launch of Agent Skills, one of the first applications to run multi-step, autonomous agentic workflows entirely on-device. Powered by Gemma 4, Agent Skills can:

  • Augment the knowledge base: Gemma 4 can access the information beyond its initial training data using skills to enable agentic enrichment type experiences. For example, you can build a skill to query Wikipedia, allowing the agent to query and respond to any encyclopedic question.
Query Wikipedia or other knowledge sources
  • Produce rich, interactive content: Transform paragraphs or videos into concise summaries or flashcards for studying, or transform data into interactive visualizations or graphs. For example, you can create a skill that automatically summarizes and displays trends in hours of sleep and moods per day based on user speech input:
Create graphs, flashcards, and other visualizations
  • Expand Gemma 4's core capabilities: Integrate with other models, such as text-to-speech, image generation, or music synthesis. For instance, you can utilize skills to pair photos with music that perfectly matches the mood.
Integrate with other models to synthesize music and understand images
  • Create comprehensive end-to-end experiences: Rather than navigating multiple apps, users can manage complex workflows and build their own applications entirely through conversation with Gemma 4. To illustrate this, we built a working app that describes and plays the vocal calls of animals.
Build multi-step workflows and end-to-end experiences

To experience the Gemma 4 E2B and E4B models in action, check out the Google AI Edge Gallery app today. Within the app, it’s easy to start experimenting and creating your own skills with our guide. We can’t wait to see what you build and share your skills in the Github Discussion!

Leverage Gemma 4 across devices with LiteRT-LM

For developers who are interested in deploying Gemma 4 in-app or across a broader range of devices, LiteRT-LM provides stellar performance with reach across the entire hardware spectrum. LiteRT-LM adds GenAI specific libraries on top of LiteRT, which is already trusted by millions of Android and edge developers with its high-performance libraries XNNPack and ML Drift. LiteRT-LM builds on this stack and enhances model performance with the following new features:

  • Minimal Memory footprint: Run Gemma 4 E2B using <1.5GB memory on some devices thanks to LiteRT’s support for 2-bit and 4-bit weights along with memory-mapped per-layer embeddings
  • Constrained decoding: Get structured, predictable outputs every time, ensuring your AI-driven apps and tool-calling scripts remain reliable in production.
  • Dynamic context: Flexibility to handle single models across CPUs and GPUs with dynamic context lengths, allowing you to take full advantage of the Gemma 4 128K context window.

To support the extended context lengths required by agentic use cases, LiteRT-LM leverages cutting-edge GPU optimizations to process 4,000 input tokens across 2 distinct skills in under 3 seconds.

LiteRT-LM also brings smaller Gemma 4 models to IoT & edge devices with compelling performance. On a Raspberry Pi 5, for example, it achieves a prefill throughput of 133 tokens per second and decode throughput of 7.6 tokens per second on Gemma 4 E2B. With this performance, you can run smart home controllers, voice assistants, and robotics completely offline on constrained hardware.

Ready to get started? Check out the LiteRT-LM documentation for a complete guide and device-specific performance metrics. You can also view the individual model cards for Gemma 4 E2B and Gemma 4 E4B.

Run on any device

Gemma 4 is available today with support across an unprecedented range of platforms:

  • Mobile: Available with CPU/GPU support across both Android and iOS. Developers can also access and deploy Android's built-in and optimized Gemma 4 model system-wide via Android AICore.
  • Desktop and Web: Seamless performance on Windows, Linux, and macOS (via Metal), plus native browser-based execution powered by WebGPU.
  • IoT and robotics: We are bringing Gemma 4 to the edge on Raspberry Pi 5 and Qualcomm IQ8 NPU platforms.

Today, we are also launching a new Python package and CLI tool to make it easier than ever to experiment with Gemma in the console, and to power Gemma-based Python pipelines for IoT devices. The litert-lm CLI is available on Linux, macOS, and Raspberry Pi, enabling developers to try out the latest Gemma 4 model capabilities without writing any code. The CLI now also supports tool calling that powered Agent Skills in Google AI Edge Gallery. Python bindings for LiteRT-LM provide the flexibility to deeply customize your on-device LLM pipeline from Python. Getting started with LiteRT-LM in your terminal is simple using our guide.

The era of agentic experiences on-device is here, and we hope you are excited to start building on the edge. Regardless of which device you are building on, get started with our Agent Skills examples in Google AI Edge Gallery, and LiteRT-LM getting started guide. We can’t wait to see what you build!

Acknowledgements

We'd like to extend a special thanks to our significant contributors for their work on this project:

Advait Jain, Alice Zheng, Amber Heinbockel, Andrew Zhang, Byungchul Kim, Cormac Brick, Daniel Ho, Derek Bekebrede, Dillon Sharlet, Eric Yang, Fengwu Yao, Frank Barchard, Grant Jensen, Hriday Chhabria, Jae Yoo, Jenn Lee, Jing Jin, Jingxiao Zheng, Juhyun Lee, Lu Wang, Lin Chen, Majid Dadashi, Marissa Ikonomidis, Matthew Chan, Matthew Soulanille, Matthias Grundmann, Milen Ferev, Misha Gutman, Mohammadreza Heydary, Pradeep Kuppala, Qidong Zhao, Quentin Khan, Ram Iyengar, Raman Sarokin, Renjie Wu, Rishika Sinha, Rodney Witcher, Ronghui Zhu, Sachin Kotwani, Suleman Shahid, Tenghui Zhu, Terry Heo, Tiffany Hsiao, Wai Hon Law, Weiyi Wang, Xiaoming Hu, Xu Chen, Yishuang Pang, Yi-Chun Kuo, Yu-Hui Chen, Zichuan Wei, and the gTech team.