This past December, we launched PaliGemma 2, an upgraded vision-language model in the Gemma family. The release included pretrained checkpoints of different sizes (3B, 10B, and 28B parameters) that can easily be fine-tuned for high performance on a wide range of vision-language tasks and domains, such as image segmentation, short video captioning, scientific question answering, and text-related tasks.
Now, we’re thrilled to announce the launch of PaliGemma 2 mix checkpoints. PaliGemma 2 mix models are tuned on a mixture of tasks, so you can explore the model’s capabilities directly and use it out of the box for common use cases.
If you were already using the original PaliGemma mix checkpoints, you can upgrade to PaliGemma 2 directly, without making any changes. The model performs different tasks depending on how it’s prompted. You can review the prompt syntax for each task in the official documentation and learn more about how PaliGemma 2 was developed in our technical report.
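For a feel of what this looks like in practice, the mix checkpoints respond to short task prefixes along these lines (see the documentation for the exact, up-to-date syntax):

```
caption en              short caption in English
ocr                     read the text in the image
answer en <question>    visual question answering
detect <object>         object detection (bounding boxes)
segment <object>        instance segmentation (masks)
```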
Example results from PaliGemma 2 mix on a sample image (a cow standing on a beach next to a warning sign):

Result: beach

Result: a cow standing on a beach next to a sign that says warning dangerous rip current.

Optical Character Recognition (OCR)

Result:
WARNING
DANGEROUS
RIP CURRENT

Result: A cow standing on a beach next to a warning sign.
Ready to discover the potential of PaliGemma 2? Here is how you can explore the mix model capabilities:
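For example, here is a minimal sketch of running a mix checkpoint with Hugging Face transformers. The checkpoint id, image URL, and prompt below are illustrative placeholders; swap in the model size and resolution you want:

```python
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

# Placeholder checkpoint id; pick the size (3b/10b/28b) and resolution you need.
model_id = "google/paligemma2-3b-mix-448"

processor = AutoProcessor.from_pretrained(model_id)
model = PaliGemmaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Placeholder image URL; any RGB image works.
image = Image.open(requests.get("https://example.com/cow_on_beach.jpg", stream=True).raw)

# The task is selected purely by the prompt prefix, e.g. "caption en", "ocr", "detect cow".
prompt = "<image>caption en"
inputs = processor(text=prompt, images=image, return_tensors="pt")
inputs = inputs.to(torch.bfloat16).to(model.device)

with torch.inference_mode():
    output = model.generate(**inputs, max_new_tokens=64)

# Decode only the newly generated tokens, not the prompt.
answer = processor.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
print(answer)
```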
While PaliGemma 2 mix offers strong performance across multiple tasks, you will get the best results by fine-tuning PaliGemma 2 on your own task or domain. To learn how, dive into our comprehensive documentation, check out our official example notebooks for Keras and JAX, or use the Hugging Face transformers example. We’re looking forward to seeing what you build with it!
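If you want a rough starting point before moving to the full notebooks, the sketch below fine-tunes a pretrained PaliGemma 2 checkpoint with the Hugging Face transformers Trainer. It is a minimal, hedged example: the checkpoint id, image path, dataset, and hyperparameters are placeholders, and attribute names such as vision_tower may vary slightly across transformers versions.

```python
import torch
from PIL import Image
from transformers import (AutoProcessor, PaliGemmaForConditionalGeneration,
                          Trainer, TrainingArguments)

# Placeholder choice: fine-tuning usually starts from a pretrained (pt) checkpoint, not a mix one.
model_id = "google/paligemma2-3b-pt-448"

processor = AutoProcessor.from_pretrained(model_id)
model = PaliGemmaForConditionalGeneration.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Optionally freeze the vision encoder so only the language model is updated.
for param in model.vision_tower.parameters():
    param.requires_grad = False

# Toy dataset: each example pairs an image with a task prompt and a target string.
train_dataset = [
    {"image": Image.open("cow_on_beach.jpg"),  # placeholder image path
     "prompt": "caption en",
     "target": "a cow standing on a beach next to a warning sign"},
]

def collate_fn(examples):
    texts = ["<image>" + ex["prompt"] for ex in examples]
    images = [ex["image"] for ex in examples]
    suffixes = [ex["target"] for ex in examples]
    # `suffix` tells the processor to build the labels used for the training loss.
    batch = processor(text=texts, images=images, suffix=suffixes,
                      return_tensors="pt", padding="longest")
    return batch.to(torch.bfloat16)  # casts only floating-point tensors (pixel values)

args = TrainingArguments(
    output_dir="paligemma2-finetuned",  # placeholder output path
    per_device_train_batch_size=1,
    num_train_epochs=1,
    learning_rate=2e-5,
    bf16=True,
    remove_unused_columns=False,
)

Trainer(model=model, args=args, train_dataset=train_dataset,
        data_collator=collate_fn).train()
```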