Beyond the Chatbot: Agentic AI with Gemma

February 13, 2025

Ju-yeong Ji Sr. Technical Consultant Gen AI – AI Studio

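Gemma is a family of lightweight, generative artificial intelligence (AI) open models, built from the same research and technology as the Gemini models. In a blog post last year, we showed how to build a text-based adventure game with Gemma. In this post, you will learn how to use Gemma to build a form of AI called Agentic AI, which is another way to put large language models (LLMs) to work.

Most of the AI in common use today is reactive. It responds to specific commands, like a smart speaker that plays music when asked. That is useful, but it can only do what it is told.

In contrast, Agentic AI is proactive and autonomous. It makes its own decisions to reach a goal. One of its key features is the ability to use external tools, such as search engines, specialized software, and other programs, to obtain information beyond its inherent knowledge base. This lets Agentic AI solve problems independently and effectively.

Here, we provide a practical guide to building a Gemma 2 based Agentic AI system and explain key technical concepts such as "Function Calling", "ReAct", and "Few-shot prompting". This AI system dynamically generates lore for a fictional game: it actively expands the game's history and gives players a distinct, constantly evolving narrative.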

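Bridging the Gap

Before we start coding, let's look at Gemma's agentic AI capabilities. You can try Gemma directly in Google AI Studio, which offers several Gemma 2 models. We recommend the 27B model for the best performance, but smaller models such as 2B also work, as shown below. In this example, we tell Gemma that a get_current_time() function is available and then ask it for the time in Tokyo and Paris.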

This result shows that Gemma 2 does not suggest calling the get_current_time() function. The ability to suggest such a call is known as "Function Calling", and it is a key feature for enabling AI to interact with external systems and APIs to retrieve data.

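Gemma's built-in function calling capabilities are limited, which in turn limits its ability to act as an agent. However, its strong instruction-following ability can compensate for this gap. Let's see how to use that ability to extend what Gemma can do.

We will use a prompt based on the ReAct (Reasoning and Acting) prompting style. ReAct defines the available tools and a specific format for the interaction. This structure lets Gemma work through cycles of Thought (reasoning), Action (using a tool), and Observation (analyzing the output).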

AI Assistant: Getting Time in Google AI Studio

As you can see, Gemma is attempting to use the get_current_time() function for both Tokyo and Paris. A Gemma model cannot execute the function on its own, however. To make this operational, you’ll need to run the generated call yourself or as part of your system. Without it, you can still proceed and observe Gemma’s response, similar to the one provided below.
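To make the "run it as part of your system" step concrete, here is a minimal Python sketch of a dispatcher. It is not taken from the original notebook: the `name("argument")` call syntax, the `CITY_TIMEZONES` table, and the helper `run_tool_calls` are all assumptions made for illustration, and the model's real output format may differ.

```python
import re
from datetime import datetime
from zoneinfo import ZoneInfo

# Hypothetical city-to-timezone table; extend it for other locations.
CITY_TIMEZONES = {"Tokyo": "Asia/Tokyo", "Paris": "Europe/Paris"}


def get_current_time(location: str) -> str:
    """Return the current local time for a known city."""
    zone = ZoneInfo(CITY_TIMEZONES[location])
    return datetime.now(zone).strftime("%Y-%m-%d %H:%M %Z")


# Registry of functions the model is allowed to request.
TOOLS = {"get_current_time": get_current_time}

# Assumed call syntax in the model's reply: name("argument")
CALL_PATTERN = re.compile(r"(\w+)\(\s*[\"']([^\"']*)[\"']\s*\)")


def run_tool_calls(model_text, tools=TOOLS):
    """Find tool calls in the model's text and execute the registered ones."""
    results = []
    for name, argument in CALL_PATTERN.findall(model_text):
        if name in tools:  # ignore anything that is not a registered tool
            results.append((name, argument, tools[name](argument)))
    return results
```

Passing a reply that contains `get_current_time("Tokyo")` to `run_tool_calls` returns a list of `(name, argument, output)` tuples whose outputs can be sent back to the model as the next user turn. Restricting execution to a registry, rather than evaluating the generated text directly, keeps the model from triggering arbitrary code.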

Gemma attempting to use `get_current_time` function for both Tokyo and Paris in Google AI Studio

Awesome! Now you’ve witnessed Gemma’s function calling in action. Once your system executes the calls that Gemma requests, the agent can carry out operations in the background without requiring direct user interaction at each step.

Let’s get our hands dirty with the actual demo, building a History AI Agent!

Demo Setup

All the prompts below are in the "Agentic AI with Gemma 2" notebook in Gemma's Cookbook. One difference between using Gemma in Google AI Studio and calling it directly from Python on Colab is that, in the latter case, you must use a specific format with markers such as <start_of_turn> to give instructions to Gemma. You can learn more about this from the official docs.
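As a rough illustration of that format, the sketch below wraps a single user message in Gemma's turn markers and leaves the model turn open for generation. The helper name `format_gemma_prompt` is hypothetical; check the official docs for the authoritative template.

```python
def format_gemma_prompt(user_text: str) -> str:
    """Wrap one user message in Gemma's turn markers and open the model turn."""
    return (
        "<start_of_turn>user\n"
        f"{user_text}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )


prompt = format_gemma_prompt("What time is it in Tokyo and Paris?")
print(prompt)
```

The trailing `<start_of_turn>model` line signals that the next tokens should be the model's reply.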

Let’s imagine a fictional game world where AI agents craft dynamic content.

These agents, designed with specific objectives, can generate in-game content like books, poems, and songs, in response to a player choice or significant events within the game’s narrative.

A key feature of these AI agents is their ability to break down complex goals into smaller actionable steps. They can analyze different approaches, evaluate potential outcomes, and adapt their plans based on new information.

Where these agents truly shine is that they’re not just passively spitting out information. They can interact with digital (and potentially physical) environments, execute tasks, and make decisions autonomously to achieve their programmed objectives.

So, how does it work?

Here’s an example ReAct style prompt designed for an AI agent that generates in-game content, with the capability to use function calls to retrieve historical information.

<start_of_turn>user
You are an AI Historian in a game. Your goal is to create books, poems, and songs found in the game world so that the player's choices meaningfully impact the unfolding of events.
 
You have access to the following tools:
 
* `get_historical_events(year, location=None, keyword=None)`: Retrieves a list of historical events within a specific year.
* `get_person_info(name)`: Retrieves information about a historical figure.
* `get_location_info(location_name)`: Retrieves information about a location.
 
Use the following multi-step conversation:
 
Thought: I need to do something...
Action: I should use the tool `tool_name` with input `tool_input`
 
Wait user to get the result of the tool is `tool_output`
 
And finally answer the Content of books, poems, or songs.
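The game needs a way to read the tool request out of the model's reply. The sketch below is one possible parser for the `Action:` line format defined in the prompt above; the regular expression and the helper `parse_action` are illustrative assumptions, not code from the notebook.

```python
import re
from typing import Optional, Tuple

# Matches: Action: I should use the tool `tool_name` with input `tool_input`
ACTION_PATTERN = re.compile(
    r"Action:.*?tool\s+`(?P<tool>[^`]+)`\s+with\s+input\s+`(?P<input>[^`]*)`"
)


def parse_action(model_text: str) -> Optional[Tuple[str, str]]:
    """Return (tool_name, tool_input) from a ReAct reply, or None if absent."""
    match = ACTION_PATTERN.search(model_text)
    if match is None:
        return None
    return match.group("tool"), match.group("input")


reply = (
    "Thought: I need to learn more about the rebel leader.\n"
    "Action: I should use the tool `get_person_info` "
    "with input `name: Anya, the Rebel Leader`\n"
)
print(parse_action(reply))
```

Returning `None` when no `Action:` line is found lets the caller treat the reply as the final content (a book, poem, or song) instead of a tool request.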


Let’s try to write a book. See the example outputs below:

Zero-shot prompting

As you can see, Gemma may struggle with function calling due to a lack of training in that area.

To address this limitation, we can employ "One-shot prompting", a form of in-context learning, where demonstrations are embedded within the prompt. This example will serve as a guide for Gemma, allowing it to understand the intended task and improve its performance through contextual learning.

One-shot prompting

(Note: the green section is a provided example; the actual prompt comes after it.)

Notably, the model performs better since Action contains the correct input.

Few-shot prompting

For more complex tasks, use "Few-shot prompting". It works by providing a small set of examples (usually 2-5, but sometimes more) that demonstrate the desired input-output relationship, allowing the model to grasp the underlying pattern.
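One way to assemble such a prompt programmatically is sketched below. The `Example N:` and `User:` labels and the helper `build_few_shot_prompt` are assumptions for illustration; the notebook's actual layout may differ.

```python
from typing import List, Tuple


def build_few_shot_prompt(
    instructions: str,
    examples: List[Tuple[str, str]],
    request: str,
) -> str:
    """Join instructions, worked examples, and the real request into one prompt."""
    parts = [instructions.strip(), ""]
    for index, (example_request, example_reply) in enumerate(examples, start=1):
        parts.append(f"Example {index}:")
        parts.append(f"User: {example_request}")
        parts.append(example_reply.strip())
        parts.append("")  # blank line between examples
    parts.append(f"User: {request}")
    return "\n".join(parts)


examples = [
    (
        "Write a poem about the old capital.",
        "Thought: I need details about the location.\n"
        "Action: I should use the tool `get_location_info` with input `the old capital`",
    ),
    (
        "Write a song about the founding king.",
        "Thought: I need details about this person.\n"
        "Action: I should use the tool `get_person_info` with input `the founding king`",
    ),
]
print(build_few_shot_prompt("You are an AI Historian in a game.", examples, "Write a book."))
```

Each example pairs a request with a correctly formatted Thought and Action, so the model sees the expected pattern before it reaches the real request at the end.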

Now that we have received the function name get_person_info and the parameter value "name: Anya, the Rebel Leader", the game must connect to an API and call the function. Here, we use a synthetic response payload for this API interaction.
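A synthetic payload can be produced by a stub like the one below. Every field and value in it is made up for illustration, and `build_observation` is a hypothetical helper that phrases the result the way the prompt describes ("the result of the tool is `tool_output`").

```python
import json


def get_person_info(name: str) -> dict:
    """Stand-in for a real game API call; returns made-up data for illustration."""
    return {
        "name": name,
        "role": "placeholder role",
        "summary": "placeholder summary supplied by the game's lore database",
    }


def build_observation(tool_output: dict) -> str:
    """Format a tool result as the next user message for the agent."""
    return f"The result of the tool is `{json.dumps(tool_output)}`"


payload = get_person_info("Anya, the Rebel Leader")
print(build_observation(payload))
```

The resulting message is sent back to Gemma as the next user turn, which gives the agent the observation it needs before writing the final content.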

Agentic AI with Gemma: few-shot prompting example

Note that the agent used the provided information to create a book about Eldoria's Rebel Leader.

The Future is Agentic

We’re still in the early stages of Agentic AI development, but the progress is rapid. As these systems become more sophisticated, we can expect them to play an increasingly significant role in our lives.

Here are some potential applications, focused primarily on gaming:

Lifelike NPCs: NPCs will become more believable, exhibiting unique personalities and adapting to player interactions.
Dynamic Stories: Games will offer dynamically generated stories and quests, ensuring lasting replayability.
Efficient Development: AI can streamline game testing, leading to higher quality and faster development cycles.

The implications also extend beyond gaming:

GUI Automation: Models can be used to interact with graphical user interfaces directly within a web browser.
Mathematical Tool Integration: AI can utilize tools like calculators to overcome limitations in performing complex calculations.
Contextual Knowledge Retrieval: AI can decide when it needs to query external knowledge sources (as in RAG systems).

Next steps

The era of passive, reactive AI is gradually giving way to a future where AI is proactive, goal-oriented, and capable of independent action. This is the dawn of Agentic AI, and it's a future worth getting excited about.

The Gemma Cookbook repository is a place where various ideas like this come together. Contributions are always welcome. If you have a notebook that implements a new idea, please send us a Pull Request.

Thanks for reading and catch you in the next one.