Building custom, advanced AI that can "see" used to be a complex and resource-intensive endeavor. Not anymore. This past May, we launched PaliGemma, the first vision-language model in the Gemma family, taking a significant step toward making class-leading visual AI more accessible. Now, we're thrilled to introduce PaliGemma 2, the next evolution in tunable vision-language models.
PaliGemma 2 builds upon the performant Gemma 2 models, adding the power of vision and making it easier than ever to fine-tune for exceptional performance. With PaliGemma 2, these models can see, understand, and interact with visual input, opening up a world of new possibilities.
Upgrading to PaliGemma 2 is a breeze for existing PaliGemma users. It's designed as a drop-in replacement, offering a range of model sizes with immediate performance gains on most tasks without major code modifications. Additionally, its flexibility makes fine-tuning for specific tasks and datasets straightforward, empowering you to tailor its capabilities to your precise needs.
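To make the "drop-in replacement" idea concrete, here is a minimal sketch of an upgrade in which only the checkpoint ID changes. It assumes you load models through the Hugging Face `transformers` library and that the checkpoints follow the `google/paligemma2-<size>-pt-<resolution>` naming used on the Hugging Face Hub. Neither detail comes from this post, and the helper names `paligemma2_checkpoint` and `load_paligemma` are hypothetical.

```python
# Illustrative sketch only: the checkpoint naming scheme and the size and
# resolution options below are assumptions based on the public Hugging Face
# Hub releases, not details stated in this post.

# Assumed ID of a first-generation checkpoint an existing user might be on.
PALIGEMMA_V1_CHECKPOINT = "google/paligemma-3b-pt-224"

ASSUMED_SIZES = ("3b", "10b", "28b")
ASSUMED_RESOLUTIONS = (224, 448, 896)


def paligemma2_checkpoint(size: str = "3b", resolution: int = 224) -> str:
    """Build a PaliGemma 2 pretrained checkpoint ID (hypothetical helper)."""
    if size not in ASSUMED_SIZES:
        raise ValueError(f"unknown size {size!r}; expected one of {ASSUMED_SIZES}")
    if resolution not in ASSUMED_RESOLUTIONS:
        raise ValueError(
            f"unknown resolution {resolution!r}; expected one of {ASSUMED_RESOLUTIONS}"
        )
    return f"google/paligemma2-{size}-pt-{resolution}"


def load_paligemma(model_id: str):
    """Load a processor and model; the same call is assumed to serve both generations."""
    # Imported lazily so the helper above works without transformers installed.
    from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

    processor = AutoProcessor.from_pretrained(model_id)
    model = PaliGemmaForConditionalGeneration.from_pretrained(model_id)
    return processor, model


if __name__ == "__main__":
    # Upgrading amounts to swapping the ID passed to load_paligemma().
    print("before:", PALIGEMMA_V1_CHECKPOINT)
    print("after: ", paligemma2_checkpoint("3b", 224))
```

Under these assumptions, the rest of an existing pipeline (preprocessing, generation, fine-tuning loop) would stay as it is, and moving to a larger size or higher resolution is another one-line change to the arguments.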
You can learn more about how PaliGemma 2 works, including when to use more parameters and larger resolutions, in our technical report.
Since its launch, the Gemma family has rapidly grown into a vibrant ecosystem, the Gemmaverse, with tens of thousands of models and applications. That growth is a testament to the community's ingenuity. Early innovations using PaliGemma, such as ColPali's advancements in visual document retrieval, RoboFlow's fine-tuning techniques, and progress in real-time object tracking, demonstrate the expanding potential of the Gemmaverse.
Ready to explore the potential of PaliGemma 2? Here's how:
We're incredibly excited to see what you create with PaliGemma 2. Join the vibrant Gemma community, share your projects with the Gemmaverse, and let's continue to explore the boundless potential of AI together. Your feedback and contributions are invaluable in shaping the future of these models and driving innovation in the field.