Imagen 3 is now available in the Gemini API

February 6, 2025

Ivan Solovyev Product Manager

Developers can now access Imagen 3, Google's state-of-the-art image generation model, through the Gemini API. The model is available to paid users first, with a rollout to free-tier users coming soon.

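Imagen 3 can produce visually appealing, artifact-free images in a wide range of styles, from hyperrealistic images to striking landscapes, abstract compositions, and anime characters. Improved prompt following makes it easy to turn a good idea into a high-quality image. Overall, Imagen 3 achieves best-in-class performance across a range of benchmarks. Despite these advanced capabilities, it is priced at $0.03 per image in the Gemini API, and you can control settings such as the aspect ratio and the number of options to generate.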
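As a rough illustration of the per-image pricing, the following sketch estimates the cost of a batch of requests. Only the $0.03 figure comes from this post; the function name `estimate_cost` and the request counts are hypothetical, and actual billing may differ from this simple multiplication.

```python
from decimal import Decimal

# Price per generated image in USD, as stated in this post.
PRICE_PER_IMAGE_USD = Decimal("0.03")


def estimate_cost(num_requests: int, images_per_request: int = 1) -> Decimal:
    """Estimate the cost in USD of generating images (hypothetical helper).

    Assumes every requested image is billed at the flat per-image price.
    """
    if num_requests < 0:
        raise ValueError("num_requests must be non-negative")
    if images_per_request < 1:
        raise ValueError("images_per_request must be at least 1")
    return PRICE_PER_IMAGE_USD * num_requests * images_per_request


# Example: 250 requests that each ask for 4 options.
print(estimate_cost(250, 4))  # prints 30.00
```

Using `Decimal` rather than `float` keeps the arithmetic exact, which matters when small per-unit prices are multiplied by large counts.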

To help prevent misinformation and misattribution, every image generated by Imagen 3 includes an invisible digital SynthID watermark, which allows it to be identified as AI-generated.

Imagen 3 in action

The gallery below showcases Imagen 3's generation capabilities across a variety of styles.

Imagen 3 generated image of a group of people looking happy, natural light, 8k

Prompt: Happy-looking people, natural light, 8k (back-translated; the original prompt was in English)

Imagen 3 generated Hyperrealistic portrait of a person dressed in 1920s flapper fashion, vintage style, black and white photograph, elegant pose, 8k

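Prompt: Realistic portrait photo of a person in 1920s flapper fashion, vintage style, black and white photograph, elegant pose, 8k (back-translated; the original prompt was in English)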

Imagen 3 generated image of a close-up of a vintage watch with realistic and detailed mechanism

Prompt: Imagine a close-up photo of a vintage watch, rendering even the fine mechanism realistically (back-translated; the original prompt was in English)

Imagen 3 generated image of an impressionistic landscape painting of a sunset over a field of sunflowers, vibrant colors, thick brushstrokes, inspired by Monet

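Prompt: Impressionist-style landscape painting of a sunset over a sunflower field, vibrant colors, thick brushstrokes, in the style of Monet (back-translated; the original prompt was in English)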

Imagen 3 generated image of A surreal dreamscape featuring a giant tortoise with a lush forest growing on its back, floating through a starry sky, glowing mushrooms, bioluminescent plants, ethereal atmosphere

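Prompt: A surreal dreamscape in which a giant tortoise with trees growing densely from its shell floats through a starry sky. Glowing mushrooms, luminous plants, otherworldly atmosphere (back-translated; the original prompt was in English)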

Imagen 3 generated lifestyle image of freshly roasted coffee beans spilling out of a burlap sack onto a rustic wooden table next to a cup of coffee with 'Awaken Your Senses' written on the cup in cursive

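Prompt: Lifestyle image. Freshly roasted coffee beans spill from a burlap sack onto a rustic wooden table, with steam rising from a nearby coffee cup. "Awaken Your Senses" is written on the cup in cursive. Warm and inviting atmosphere, morning sunlight, product photography (back-translated; the original prompt was in English)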

Imagen 3 generated hyperrealistic portrait of a woman with piercing blue eyes, laughing, freckles, dramatic lighting, detailed skin texture, 8k

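Prompt: Realistic portrait photo of a woman with clear blue eyes, smiling, freckles, dramatic lighting, detailed skin texture, 8k (back-translated; the original prompt was in English)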

Imagen 3 generated panoramic view of a majestic mountain range at dawn

Prompt: Panoramic view of a majestic mountain range at dawn. (back-translated; the original prompt was in English)

Imagen 3 generated scene from a game where the player needs to find a specific object by looking into drawers in a messy desk

Prompt: Draw a scene from a game in which the player must search the drawers of a messy desk to find a special item. (back-translated; the original prompt was in English)

Imagen 3 generated painted cityscape in the style of Van Gogh

Prompt: A cityscape in the style of Van Gogh, with swirling brushwork and vivid colors. (back-translated; the original prompt was in English)

Get started with Imagen 3 in the Gemini API

The following Python code snippet generates an image with Imagen 3 using the Gemini API.

from io import BytesIO

from google import genai
from google.genai import types
from PIL import Image

# Replace the placeholder string with your own Gemini API key.
client = genai.Client(api_key='GEMINI_API_KEY')

# Request a single image from the Imagen 3 model.
response = client.models.generate_images(
    model='imagen-3.0-generate-002',
    prompt='a portrait of a sheepadoodle wearing cape',
    config=types.GenerateImagesConfig(
        number_of_images=1,
    )
)

# Decode each returned image from its raw bytes and open it in the default viewer.
for generated_image in response.generated_images:
    image = Image.open(BytesIO(generated_image.image.image_bytes))
    image.show()


Generated image

Imagen 3 generated portrait of a sheepadoodle wearing a cape
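The snippet above sets only `number_of_images`. The post also mentions control over the aspect ratio, so here is a minimal sketch of how both options might be validated and passed together. The helper names `build_config_kwargs` and `generate` are hypothetical, and the list of supported aspect ratios and the limit of four images per request are assumptions based on the public Gemini API documentation rather than on this post, so check them against the current docs before relying on them.

```python
# Assumed limits (not stated in this post); verify against the Gemini API docs.
SUPPORTED_ASPECT_RATIOS = ("1:1", "3:4", "4:3", "9:16", "16:9")
MAX_IMAGES_PER_REQUEST = 4


def build_config_kwargs(number_of_images: int = 1, aspect_ratio: str = "1:1") -> dict:
    """Validate the options and return keyword arguments for GenerateImagesConfig."""
    if not 1 <= number_of_images <= MAX_IMAGES_PER_REQUEST:
        raise ValueError(
            f"number_of_images must be between 1 and {MAX_IMAGES_PER_REQUEST}"
        )
    if aspect_ratio not in SUPPORTED_ASPECT_RATIOS:
        raise ValueError(f"aspect_ratio must be one of {SUPPORTED_ASPECT_RATIOS}")
    return {"number_of_images": number_of_images, "aspect_ratio": aspect_ratio}


def generate(prompt: str, api_key: str, **options):
    """Call Imagen 3 with validated options (hypothetical wrapper)."""
    # Imported lazily so the validation helper works without the SDK installed.
    from google import genai
    from google.genai import types

    client = genai.Client(api_key=api_key)
    return client.models.generate_images(
        model="imagen-3.0-generate-002",
        prompt=prompt,
        config=types.GenerateImagesConfig(**build_config_kwargs(**options)),
    )


# Build the options for four widescreen candidates without calling the API.
print(build_config_kwargs(number_of_images=4, aspect_ratio="16:9"))
```

Validating locally before the request means an unsupported value fails fast with a clear message instead of after a network round trip.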

You can find prompting advice and image styles in the Gemini API developer documentation. For details on scores, methodology, and performance improvements, see Appendix D of the updated technical report.

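We are excited to take this first step in expanding access to our media generation models and bringing them to the Gemini API. We plan to widen availability soon so that developers can connect media generation with language models.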