Date: 2025-02-19

This past December, we launched PaliGemma 2, an upgraded vision-language model in the Gemma family. The release included pretrained checkpoints in three sizes (3B, 10B, and 28B parameters) that can be easily fine-tuned to achieve high performance on a wide range of vision-language tasks and domains, such as image segmentation, short video captioning, scientific question answering, and text-related tasks.

Now, we’re thrilled to announce the launch of PaliGemma 2 mix checkpoints. These models are tuned on a mixture of tasks, so you can explore the model’s capabilities directly and use it out of the box for common use cases.



What’s new in PaliGemma 2 mix?

Multiple tasks with one model: PaliGemma 2 mix can solve tasks such as short and long captioning, optical character recognition (OCR), image question answering, object detection and segmentation.

Developer-friendly sizes: Use the best model for your needs thanks to the different model sizes (3B, 10B, and 28B parameters) and resolutions (224px and 448px).

Use with your preferred framework: Leverage your preferred tools and frameworks, including Hugging Face Transformers, Keras, PyTorch, JAX, and Gemma.cpp.
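
To make the size and resolution options above concrete, here is a minimal sketch that enumerates the possible combinations. The identifier pattern `google/paligemma2-{size}-mix-{resolution}` and the helper `mix_checkpoint_id` are assumptions for illustration, modeled on Hugging Face Hub naming; confirm the exact checkpoint names on the model pages before using them.

```python
from itertools import product

# Sizes and resolutions listed in this post.
SIZES = ("3b", "10b", "28b")
RESOLUTIONS = (224, 448)


def mix_checkpoint_id(size: str, resolution: int) -> str:
    """Build a checkpoint identifier (assumed naming pattern, hypothetical helper)."""
    if size not in SIZES:
        raise ValueError(f"unknown size {size!r}; expected one of {SIZES}")
    if resolution not in RESOLUTIONS:
        raise ValueError(
            f"unknown resolution {resolution!r}; expected one of {RESOLUTIONS}"
        )
    # Assumption: identifiers follow this pattern on the Hugging Face Hub.
    return f"google/paligemma2-{size}-mix-{resolution}"


# Every size/resolution combination, smallest model first.
ALL_MIX_CHECKPOINTS = [mix_checkpoint_id(s, r) for s, r in product(SIZES, RESOLUTIONS)]

if __name__ == "__main__":
    for checkpoint in ALL_MIX_CHECKPOINTS:
        print(checkpoint)
```

A smaller model at 224px is usually the cheaper starting point; moving up in size or resolution trades compute for quality on tasks such as OCR, where fine detail matters.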

If you were already using the original PaliGemma mix checkpoints, you can upgrade directly to PaliGemma 2 without making any changes. The model performs different tasks depending on how it’s prompted. You can review the prompt syntax for each task in the official documentation and learn more about how PaliGemma 2 was developed in our technical report.
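
Because the task is selected by the prompt, a few small helpers can keep prompt strings consistent. The sketch below assumes the task prefixes `caption`, `describe`, `ocr`, `answer`, `detect`, and `segment` as we understand them from the prompt documentation; the function names are hypothetical, and the official documentation remains the reference for exact syntax, including any trailing newline your framework expects.

```python
from typing import Sequence


def caption_prompt(lang: str = "en", long: bool = False) -> str:
    """Short caption by default; 'describe' asks for a longer description."""
    prefix = "describe" if long else "caption"
    return f"{prefix} {lang}"


def ocr_prompt() -> str:
    """Optical character recognition takes no extra arguments."""
    return "ocr"


def answer_prompt(question: str, lang: str = "en") -> str:
    """Image question answering in the given language."""
    return f"answer {lang} {question}"


def detect_prompt(objects: Sequence[str]) -> str:
    """Object detection; multiple object names are separated by ' ; '."""
    if not objects:
        raise ValueError("at least one object name is required")
    return "detect " + " ; ".join(objects)


def segment_prompt(obj: str) -> str:
    """Segmentation of a single named object."""
    return f"segment {obj}"


if __name__ == "__main__":
    print(caption_prompt())
    print(answer_prompt("where is the cat?"))
    print(detect_prompt(["cat", "dog"]))
```

The resulting string is passed to the model together with the image; the same checkpoint then returns a caption, an answer, or location tokens depending on the prefix.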



Detection