Introducing PaliGemma 2 mix: A vision language model for multiple tasks

February 19, 2025

Omar Sanseviero Staff Developer Relations Engineer

Andreas Steiner Staff Software Engineer

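Last December, we launched PaliGemma 2, an upgraded vision language model in the Gemma family. That release included pretrained checkpoints in three sizes (3B, 10B, and 28B parameters) that are easy to fine-tune and perform well on a wide range of vision language tasks and domains, such as image segmentation, short video captioning, scientific question answering, and text-related tasks.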

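Today, we're excited to announce the PaliGemma 2 mix checkpoints. PaliGemma 2 mix models are tuned on a mixture of tasks, so you can explore the model's capabilities directly and use it right away for common use cases.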

What's new in PaliGemma 2 mix?

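One model for many tasks: PaliGemma 2 mix can handle tasks such as short and long captioning, optical character recognition (OCR), image question answering, object detection, and segmentation.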

Multiple sizes for developers: Choose the model size (3B, 10B, or 28B parameters) and resolution (224px or 448px) that best fits your needs.

Support for your preferred frameworks: Use your favorite tools and frameworks, including Hugging Face Transformers, Keras, PyTorch, JAX, and Gemma.cpp.
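As a rough illustration of how the size and resolution choices combine, the sketch below turns them into a checkpoint identifier. It assumes the Hugging Face model IDs follow the pattern google/paligemma2-{size}-mix-{resolution}; please verify the exact names on the model pages before relying on them. The helper names mix_checkpoint_id and all_mix_checkpoint_ids are hypothetical.

```python
from itertools import product
from typing import List

# Sizes and resolutions listed in this post.
SIZES = ("3b", "10b", "28b")
RESOLUTIONS = (224, 448)


def mix_checkpoint_id(size: str, resolution: int) -> str:
    """Return an assumed Hugging Face ID for a PaliGemma 2 mix checkpoint.

    ASSUMPTION: IDs follow "google/paligemma2-{size}-mix-{resolution}".
    """
    if size not in SIZES:
        raise ValueError(f"unknown size {size!r}; expected one of {SIZES}")
    if resolution not in RESOLUTIONS:
        raise ValueError(f"unknown resolution {resolution}; expected one of {RESOLUTIONS}")
    return f"google/paligemma2-{size}-mix-{resolution}"


def all_mix_checkpoint_ids() -> List[str]:
    """Enumerate every size/resolution combination (3 sizes x 2 resolutions)."""
    return [mix_checkpoint_id(s, r) for s, r in product(SIZES, RESOLUTIONS)]
```

Under this assumption, the 448px variant of the 10B model would map to google/paligemma2-10b-mix-448. Larger sizes and higher resolutions generally cost more memory and compute, so a common starting point is the smallest combination that meets your quality bar.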

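If you're already using the original PaliGemma mix checkpoints, you can upgrade to PaliGemma 2 mix without any changes. The model performs different tasks depending on the prompt. See the official documentation for the prompt syntax of each task, and read our technical report to learn more about how PaliGemma 2 was developed.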
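Because the task is selected entirely by the prompt, a small helper can assemble the prompt strings used in the examples below. This is a hypothetical sketch (build_prompt is not part of any library). It covers only the prompt forms shown in this post (caption, ocr, answer, detect, segment) and reproduces them exactly, including the trailing newline; whether you need to append that newline yourself may depend on the framework you use, so check its documentation.

```python
from typing import Optional, Sequence


def build_prompt(
    task: str,
    *,
    lang: str = "en",
    question: str = "",
    objects: Optional[Sequence[str]] = None,
) -> str:
    """Build a PaliGemma 2 mix prompt in the form shown in this post (hypothetical helper)."""
    if task == "caption":
        body = f"caption {lang}"
    elif task == "ocr":
        body = "ocr"
    elif task == "answer":
        if not question:
            raise ValueError("the 'answer' task needs a question")
        body = f"answer {lang} {question}"
    elif task in ("detect", "segment"):
        if not objects:
            raise ValueError(f"the {task!r} task needs at least one object name")
        if task == "detect":
            # Several object names are joined with " ; ", as in "detect chair ; table".
            body = "detect " + " ; ".join(objects)
        else:
            # This post only shows single-object segmentation, e.g. "segment cat".
            if len(objects) != 1:
                raise ValueError("this sketch only builds single-object 'segment' prompts")
            body = f"segment {objects[0]}"
    else:
        raise ValueError(f"unknown task: {task!r}")
    # Every prompt in the examples ends with a newline.
    return body + "\n"
```

For example, build_prompt("detect", objects=["food", "plate", "bowl"]) produces the same string as the food detection example below.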

Detection

Task: Detection (PaliGemma-2-3b-mix-224)
Input: "detect android\n"


Result:

Result in PaliGemma 2 Mix: A large, green Android figure stands on a white platform, enclosed by a red box. The word "android" is written in red above the figure.

Multiple object detection

Task: Multiple object detection (PaliGemma-2-3b-mix-224)
Input: "detect chair ; table\n"

Multiple object detection of items in a dining room

Result:

A wooden table and chair are in the foreground. Additional tables and chairs can be seen in the background within a room with a bee patterned wall and wooden floors. Labeled boxes highlight the furniture with the text "table" and "chair."

Task: Multiple object detection (PaliGemma-2-3b-mix-224)
Input: "detect food ; plate ; bowl\n"

Plates and bowls of food on a wooden table

Result:

Plates and bowls of food on a wooden table labeled with boxes that accurately identify "plate", "bowl" and "food"
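The detection results above are drawn as labeled boxes, but the model itself returns text. Assuming the output format described in the PaliGemma documentation (for each object, four <locNNNN> tokens giving y_min, x_min, y_max, x_max as integers binned over 0 to 1023, followed by the label, with objects separated by " ; "), the sketch below converts such a string into pixel coordinates. The names Box and parse_detections are hypothetical, and the sample strings used with it are made up rather than real model output.

```python
import re
from typing import List, NamedTuple


class Box(NamedTuple):
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float


# ASSUMPTION: four <locNNNN> tokens (y_min, x_min, y_max, x_max), then the label,
# which runs until the next ";" separator or the next token.
_DETECTION = re.compile(r"<loc(\d{4})><loc(\d{4})><loc(\d{4})><loc(\d{4})>\s*([^;<]*)")


def parse_detections(text: str, width: int, height: int) -> List[Box]:
    """Convert detection text into boxes in pixel coordinates of a width x height image."""
    boxes = []
    for m in _DETECTION.finditer(text):
        # Each coordinate is a bin index in [0, 1023]; divide by 1024 to normalize.
        y0, x0, y1, x1 = (int(g) / 1024 for g in m.groups()[:4])
        boxes.append(Box(m.group(5).strip(), x0 * width, y0 * height, x1 * width, y1 * height))
    return boxes
```

For example, parse_detections("<loc0256><loc0128><loc0768><loc0512> chair", width=448, height=448) returns [Box(label='chair', x_min=56.0, y_min=112.0, x_max=224.0, y_max=336.0)].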

Optical character recognition (OCR)

Task: OCR (PaliGemma-2-3b-mix-224)
Input: "ocr\n"

Result:

Japanese Kanji reads: Downlight, Dining Room, Kitchen, Living Room, Bathroom/Dressing Room

Segmentation

Task: Segmentation (PaliGemma-2-3b-mix-224) [Image generated by ImageFX]
Input: "segment cat\n"

Image of a cat looking at the camera behind a wooden sign that reads 'Hello PaliGemma 2' generated by ImageFX

Result:

highlighted image of a cat looking at the camera behind a wooden sign that reads 'Hello PaliGemma 2' generated by ImageFX
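For segmentation, the text output carries more than a box. Assuming the format described in the PaliGemma documentation (four <locNNNN> box tokens, then 16 <segNNN> tokens that index a learned mask codebook, then the label), the sketch below splits an output string into its parts. Turning the 16 codes into a pixel mask requires the model's VQ-VAE mask decoder, which is not shown here. SegmentResult and parse_segmentations are hypothetical names, and any test string you feed it should be treated as synthetic.

```python
import re
from typing import List, NamedTuple, Tuple


class SegmentResult(NamedTuple):
    label: str
    box: Tuple[float, ...]  # (y_min, x_min, y_max, x_max), normalized to [0, 1)
    codes: List[int]  # 16 mask codebook indices; decoding them needs the VQ-VAE decoder


# ASSUMPTION: four <locNNNN> tokens, then sixteen <segNNN> tokens, then the label.
_SEGMENT = re.compile(r"((?:<loc\d{4}>){4})((?:<seg\d{3}>){16})\s*([^;<]*)")


def parse_segmentations(text: str) -> List[SegmentResult]:
    """Split segmentation text into label, normalized box, and mask codes."""
    results = []
    for m in _SEGMENT.finditer(text):
        box = tuple(int(v) / 1024 for v in re.findall(r"<loc(\d{4})>", m.group(1)))
        codes = [int(v) for v in re.findall(r"<seg(\d{3})>", m.group(2))]
        results.append(SegmentResult(m.group(3).strip(), box, codes))
    return results
```

Keeping the box normalized leaves it independent of image size; multiply by the image height and width when you need pixel coordinates.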

Question answering

Task: Question answering (PaliGemma-2-3b-mix-448) [Image generated by ImageFX]
Input: "answer en where is the cow standing?\n"

A cow standing on the beach next to a yellow sign that reads 'Warning Dangerous Rip Current' with an illustration of a large wave breaking.

Result: beach

Captioning

Input: "caption en\n"

Result: A cow standing on a beach next to a sign that reads "Warning: Dangerous Rip Current".

Optical character recognition (OCR)

Input: "ocr\n"

Result:

WARNING

DANGEROUS

RIP CURRENT

Detection

Input: "detect cow\n"

Result:

A cow standing on the beach next to a yellow sign that reads 'Warning Dangerous Rip Current' with an illustration of a large wave breaking. A red box outlines the cow, with a label that reads "cow"

Segmentation

Input: "segment cow\n"

Result:

A highlighted cow standing on the beach next to a yellow sign that reads 'Warning Dangerous Rip Current' with an illustration of a large wave breaking.

Captioning

Task: Captioning (PaliGemma-2-10b-mix-448)
Input: "caption en\n"

Result: A cow standing on a beach next to a warning sign.

Optical character recognition (OCR)

Input: "ocr\n"

Result:

WARNING: DANGEROUS

RIP CURRENT

Get started

Ready to explore what PaliGemma 2 can do? Here's how to get started with the mix models:

Try the mix models in a few clicks: Explore their capabilities directly in the Hugging Face demo.

Download the models: Get the mix model weights on Kaggle and Hugging Face.

Learn how to run the models: Try the Keras inference notebook in Google Colab or in your local environment.

Deploy and tune in a few clicks: Use PaliGemma 2 mix directly in Vertex Model Garden.

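PaliGemma 2 mix performs well across many tasks, but you can get even better results by fine-tuning PaliGemma 2 on your own task or domain. To learn how, dive into our comprehensive guides, check out the official Keras and JAX example notebooks, or use the Hugging Face Transformers examples. We can't wait to see what you build!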