Introducing PaliGemma 2 mix: A vision-language model for multiple tasks

February 19, 2025
Omar Sanseviero Staff Developer Relations Engineer
Andreas Steiner Staff Software Engineer

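Last December, we released PaliGemma 2, an upgraded vision-language model in the Gemma family. That release included pretrained checkpoints at three model sizes (3B, 10B, and 28B parameters). These checkpoints can be easily fine-tuned, with strong results, on a wide range of vision-language tasks and domains, such as image segmentation, short video captioning, scientific question answering, and text-related tasks.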

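Now, we are thrilled to announce the PaliGemma 2 mix checkpoints. PaliGemma 2 mix models are tuned on a mixture of tasks, so you can explore the model's capabilities directly and use it out of the box for common use cases.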


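What's new in PaliGemma 2 mix?

  • One model for multiple tasks: PaliGemma 2 mix can handle tasks such as short and long captioning, optical character recognition (OCR), image question answering, object detection, and segmentation.

  • Multiple sizes for developer convenience: Pick the model size (3B, 10B, or 28B parameters) and resolution (224 px or 448 px) that best fit your needs.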
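The size and resolution options above form a small grid of six checkpoints. The sketch below enumerates that grid in Python. The `google/paligemma2-{size}-mix-{resolution}` naming pattern and the helper name `mix_checkpoint_id` are assumptions made for illustration, not something this post specifies, so confirm the exact identifiers on the model hub you use before loading anything.

```python
from itertools import product

# Model sizes and input resolutions named in the post.
SIZES = ("3b", "10b", "28b")
RESOLUTIONS = (224, 448)


def mix_checkpoint_id(size: str, resolution: int) -> str:
    """Return an assumed hub identifier for a PaliGemma 2 mix checkpoint."""
    if size not in SIZES:
        raise ValueError(f"unknown size {size!r}; expected one of {SIZES}")
    if resolution not in RESOLUTIONS:
        raise ValueError(
            f"unknown resolution {resolution!r}; expected one of {RESOLUTIONS}"
        )
    # Assumption: this naming pattern is illustrative; verify it before use.
    return f"google/paligemma2-{size}-mix-{resolution}"


# Every size/resolution combination: 3 sizes x 2 resolutions.
ALL_MIX_CHECKPOINTS = [
    mix_checkpoint_id(size, resolution)
    for size, resolution in product(SIZES, RESOLUTIONS)
]
```

Smaller checkpoints at 224 px are typically the cheaper starting point; higher resolution tends to matter most for tasks that depend on fine detail, such as OCR.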

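If you are already using the original PaliGemma mix checkpoints, you can upgrade to PaliGemma 2 without making any changes. The model performs a different task depending on the prompt it receives. See the official documentation for the prompt syntax of each task, and read our technical report to learn more about how PaliGemma 2 was developed.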
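Because the task is selected entirely by the prompt, a small helper can keep prompt strings consistent. The sketch below covers only the prompt forms that appear in the examples in this post (`detect`, `segment`, `caption`, `ocr`, `answer`). The function name `build_prompt` and its parameters are hypothetical, and the official documentation remains the reference for the full syntax.

```python
from typing import Sequence


def build_prompt(
    task: str,
    *,
    objects: Sequence[str] = (),
    question: str = "",
    lang: str = "en",
) -> str:
    """Build a task prompt in the forms shown in this post's examples."""
    if task == "detect":
        if not objects:
            raise ValueError("detect needs at least one object name")
        # Several object names are separated by " ; ".
        return f"detect {' ; '.join(objects)}\n"
    if task == "segment":
        # The post only shows single-object segmentation prompts.
        if len(objects) != 1:
            raise ValueError("segment expects exactly one object name")
        return f"segment {objects[0]}\n"
    if task == "caption":
        return f"caption {lang}\n"
    if task == "ocr":
        return "ocr\n"
    if task == "answer":
        if not question:
            raise ValueError("answer needs a question")
        return f"answer {lang} {question}\n"
    raise ValueError(f"unsupported task: {task!r}")
```

For example, `build_prompt("detect", objects=["chair", "table"])` yields the multi-object prompt used in the detection examples in this post.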


Detection

  • Task: Detection (PaliGemma-2-3b-mix-224)
  • Input: "detect android\n"

Result:

Result in PaliGemma 2 Mix: A large, green Android figure stands on a white platform, enclosed by a red box. The word "android" is written in red above the figure.

Multiple object detection

  • Task: Multiple object detection (PaliGemma-2-3b-mix-224)
  • Input: "detect chair ; table\n"
Multiple object detection of items in a dining room

Result:

A wooden table and chair are in the foreground. Additional tables and chairs can be seen in the background within a room with a bee patterned wall and wooden floors. Labeled boxes highlight the furniture with the text "table" and "chair."
  • Task: Multiple object detection (PaliGemma-2-3b-mix-224)
  • Input: "detect food ; plate ; bowl\n"
Plates and bowls of food on a wooden table

Result:

Plates and bowls of food on a wooden table labeled with boxes that accurately identify "plate", "bowl" and "food"
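The labeled boxes in the results above are drawn from the text the model generates. The post does not show that raw text, so the sketch below rests on an assumption about its format: each detection is four `<locNNNN>` tokens in the order y_min, x_min, y_max, x_max, quantized to 1024 bins, followed by the label, with detections separated by ` ; `. The names `Box` and `parse_detections` are hypothetical. Check the official documentation for the authoritative output format before relying on this.

```python
import re
from typing import List, NamedTuple


class Box(NamedTuple):
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float


# Assumed format: four location tokens (y_min, x_min, y_max, x_max), then a label.
_DETECTION = re.compile(
    r"<loc(\d{4})><loc(\d{4})><loc(\d{4})><loc(\d{4})>\s*([^;<]+)"
)


def parse_detections(text: str, width: int, height: int) -> List[Box]:
    """Convert assumed location-token output into pixel-space boxes."""
    boxes = []
    for match in _DETECTION.finditer(text):
        # Token values are bins in [0, 1023]; divide by 1024 to normalize.
        y_min, x_min, y_max, x_max = (
            int(match.group(i)) / 1024 for i in range(1, 5)
        )
        boxes.append(
            Box(
                label=match.group(5).strip(),
                x_min=x_min * width,
                y_min=y_min * height,
                x_max=x_max * width,
                y_max=y_max * height,
            )
        )
    return boxes
```

Pass the original image width and height rather than the 224 or 448 model resolution, so the boxes line up with the image you intend to draw on.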

Optical character recognition (OCR)

  • Task: OCR (PaliGemma-2-3b-mix-224)
  • Input: "ocr\n"
Lighting labels in Japanese kanji

Result:

Japanese Kanji reads: Downlight, Dining Room, Kitchen, Living Room, Bathroom/Dressing Room

Segmentation

  • Task: Segmentation (PaliGemma-2-3b-mix-224) [image generated by ImageFX]
  • Input: "segment cat\n"
Image of a cat looking at the camera behind a wooden sign that reads 'Hello PaliGemma 2' generated by ImageFX

Result:

Highlighted image of a cat looking at the camera behind a wooden sign that reads 'Hello PaliGemma 2' generated by ImageFX

Question answering

  • Task: Question answering (PaliGemma2-mix-3b-448) [image generated by ImageFX]
  • Input: "answer en where is the cow standing?\n"
A cow standing on the beach next to a yellow sign that reads 'Warning Dangerous Rip Current' with an illustration of a large wave breaking.

Result: beach


Captioning

  • Input: "caption en\n"
A cow standing on the beach next to a yellow sign that reads 'Warning Dangerous Rip Current' with an illustration of a large wave breaking.

Result: A cow standing on the beach next to a sign that reads "Warning: Dangerous Rip Current".


Optical character recognition (OCR)

A cow standing on the beach next to a yellow sign that reads 'Warning Dangerous Rip Current' with an illustration of a large wave breaking.

Result:

Warning

Dangerous

Rip Current


Detection

  • Input: "detect cow\n"
A cow standing on the beach next to a yellow sign that reads 'Warning Dangerous Rip Current' with an illustration of a large wave breaking.

Result:

A cow standing on the beach next to a yellow sign that reads 'Warning Dangerous Rip Current' with an illustration of a large wave breaking. A red box outlines the cow, with a label that reads "cow"

Segmentation

  • Input: "segment cow\n"
A cow standing on the beach next to a yellow sign that reads 'Warning Dangerous Rip Current' with an illustration of a large wave breaking.

Result:

A highlighted cow standing on the beach next to a yellow sign that reads 'Warning Dangerous Rip Current' with an illustration of a large wave breaking.

Captioning

  • Task: Captioning (PaliGemma 2-mix-10b-448)
  • Input: "caption en\n"
A cow standing on the beach next to a yellow sign that reads 'Warning Dangerous Rip Current' with an illustration of a large wave breaking.

Result: A cow standing on the beach next to a warning sign.

Optical character recognition (OCR)

  • Input: "ocr\n"
A cow standing on the beach next to a yellow sign that reads 'Warning Dangerous Rip Current' with an illustration of a large wave breaking.

Result:

Warning: Dangerous

Rip Current


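Get started today

Ready to explore the potential of PaliGemma 2? Here is how to try out what the mix models can do:

  • Try the mix models in a few clicks: Explore their capabilities directly in the Hugging Face demo.

  • Learn how to run the models: Try the Keras inference notebook in Google Colab or in your local environment.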


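PaliGemma 2 mix performs well across many tasks, but you can get even better results by fine-tuning PaliGemma 2 on your own task or domain. To learn how, dive into our comprehensive documentation, check out the official example notebooks for Keras and JAX, or use the Hugging Face transformers examples. We look forward to seeing what you build!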