Beyond the Chatbot: Agentic AI Powered by Gemma

February 13, 2025
Ju-yeong Ji, Gemma DevRel

Gemma is a family of lightweight, open generative AI models built from the same research and technology used to create the Gemini models. In a blog post last year, we showed how to use Gemma to create a text-based adventure game. In this post, we'll look at how to combine Gemma with a form of AI known as Agentic AI, a new way of putting large language models (LLMs) to work.

Most of today's AI is reactive. It responds to specific commands, like a smart speaker playing music when asked. This kind of AI is useful, but it can only carry out tasks it is explicitly told to do.

Agentic AI, by contrast, is proactive and autonomous. It makes its own decisions in pursuit of a goal. One of its key traits is the use of external tools, such as search engines, specialized software, and other programs, to gather information beyond its built-in knowledge. This allows Agentic AI to work and solve problems with a high degree of independence and efficiency.

In this article, we offer a practical guide to building an Agentic AI system on top of Gemma 2, covering key technical concepts such as "Function Calling", "ReAct", and "Few-shot prompting". The system acts as a dynamic lore generator for a fictional game, actively expanding the game's history and giving players a unique, continuously evolving narrative world.


Closing the Gap

Before we dive into the code, let's explore Gemma's Agentic AI capabilities. You can experiment with them directly in Google AI Studio, which offers several Gemma 2 models. The 27B model is recommended for best performance, but as shown below, you can also use the smaller 2B model. In this example, we tell Gemma that a get_current_time() function exists and ask it for the time in Tokyo and Paris.
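For instance, you might type a prompt along these lines into AI Studio. This is only an illustrative sketch; the exact wording and the city parameter are assumptions, not the prompt from the screenshot below.

You have access to the following function:
* get_current_time(city): returns the current local time in the given city.

What time is it in Tokyo and Paris?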

Time Request Denied in Google AI Studio

This result shows that Gemma 2 does not suggest calling the get_current_time() function on its own. The ability to do so is called "Function Calling", a key feature for enabling AI to interact with external systems and APIs to retrieve data.

Gemma's built-in function calling capability is limited, which constrains how well it can act as an agent. However, its strong instruction-following ability can make up for this missing feature. Let's look at how we can use that ability to extend what Gemma can do.

We'll implement our prompt in the ReAct (Reasoning and Acting) prompting style. ReAct defines the available tools and a specific interaction format, enabling Gemma to run a loop of thinking (reasoning), acting (using tools), and observing (analyzing the output).

AI Assistant: Getting Time in Google AI Studio

As you can see, Gemma attempts to use the get_current_time() function for both Tokyo and Paris. However, a Gemma model cannot execute the function on its own; to make this operational, you'll need to run the generated call yourself or as part of your system. Even without doing so, you can still proceed and observe Gemma's response, similar to the one shown below.

Gemma attempting to use `get_current_time` function for both Tokyo and Paris in Google AI Studio

Awesome! Now you’ve witnessed Gemma’s function calling in action. This ability lets the model trigger operations autonomously in the background, completing tasks without requiring direct user interaction.
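To close the loop in practice, something in your system has to spot the call Gemma proposes, run it, and feed the result back as the next turn. Here is a minimal Python sketch of that step; the parsing pattern, the city-to-timezone table, and the get_current_time() implementation are illustrative assumptions, not code from the Cookbook notebook.

import re
from datetime import datetime
from zoneinfo import ZoneInfo

# Illustrative stand-in for the tool Gemma was told about (an assumption, not Cookbook code).
CITY_TIMEZONES = {"Tokyo": "Asia/Tokyo", "Paris": "Europe/Paris"}

def get_current_time(city: str) -> str:
    """Return the current local time in a known city as HH:MM."""
    return datetime.now(ZoneInfo(CITY_TIMEZONES[city])).strftime("%H:%M")

def run_tool_calls(model_output: str) -> list[str]:
    """Find get_current_time("<city>") calls in Gemma's text and execute them."""
    cities = re.findall(r'get_current_time\("([^"]+)"\)', model_output)
    return [f"The time in {city} is {get_current_time(city)}." for city in cities]

# The tool results would then be sent back to Gemma as the next user turn.
print(run_tool_calls('Action: get_current_time("Tokyo"), get_current_time("Paris")'))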

Let’s get our hands dirty with the actual demo, building a History AI Agent!


Demo Setup

All the prompts below are in the "Agentic AI with Gemma 2" notebook in Gemma's Cookbook. One difference when using Gemma in Google AI Studio versus directly with Python on Colab is that you must use a specific format like <start_of_turn> to give instructions to Gemma. You can learn more about this from the official docs.
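For reference, a single user turn in Gemma's plain-text chat format looks like the string built below. This helper is just a minimal sketch of how you might assemble it yourself; higher-level libraries and chat templates usually do this for you.

# Minimal sketch of Gemma's turn format; see the official formatting docs for details.
def format_gemma_prompt(user_message: str) -> str:
    return (
        "<start_of_turn>user\n"
        f"{user_message}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )

print(format_gemma_prompt("What time is it in Tokyo?"))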

Let’s imagine a fictional game world where AI agents craft dynamic content.

These agents, designed with specific objectives, can generate in-game content like books, poems, and songs in response to player choices or significant events within the game’s narrative.

A key feature of these AI agents is their ability to break down complex goals into smaller actionable steps. They can analyze different approaches, evaluate potential outcomes, and adapt their plans based on new information.

Where Agentic AI truly shines is that these agents aren’t just passively spitting out information. They can interact with digital (and potentially physical) environments, execute tasks, and make decisions autonomously to achieve their programmed objectives.


So, how does it work?

Here’s an example ReAct style prompt designed for an AI agent that generates in-game content, with the capability to use function calls to retrieve historical information.

<start_of_turn>user
You are an AI Historian in a game. Your goal is to create books, poems, and songs found in the game world so that the player's choices meaningfully impact the unfolding of events.
 
You have access to the following tools:
 
* `get_historical_events(year, location=None, keyword=None)`: Retrieves a list of historical events within a specific year.
* `get_person_info(name)`: Retrieves information about a historical figure.
* `get_location_info(location_name)`: Retrieves information about a location.
 
Use the following multi-step conversation:
 
Thought: I need to do something...
Action: I should use the tool `tool_name` with input `tool_input`
 
Wait for the user to provide the result of the tool as `tool_output`
 
And finally answer the Content of books, poems, or songs.
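If you want to run this prompt outside AI Studio, one option is a standard text-generation loop in Python. The snippet below is a minimal sketch using the Hugging Face transformers library with the gemma-2-2b-it checkpoint; the request string and generation settings are illustrative, and the Cookbook notebook may use a different setup.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2-2b-it"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Paste the ReAct instructions shown above (without the <start_of_turn> markers,
# since apply_chat_template adds them) into this variable.
react_prompt = "You are an AI Historian in a game. ..."

messages = [
    {"role": "user", "content": react_prompt + "\n\nWrite a book about the founding of the kingdom."}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))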

Let’s try to write a book. See the example outputs below:


Zero-shot prompting

Agentic AI with Gemma: zero-shot prompting example

As you can see, Gemma may struggle with function calling due to a lack of training in that area.

To address this limitation, we can employ "One-shot prompting", a form of in-context learning, where demonstrations are embedded within the prompt. This example will serve as a guide for Gemma, allowing it to understand the intended task and improve its performance through contextual learning.


One-Shot Prompting

(Note: the green section is a provided example, the actual prompt comes after it)
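In outline, the one-shot prompt embeds one fully worked exchange before the real request, roughly like this. The lore details and the request in this sketch are invented purely for illustration and are not the example from the screenshot.

Thought: I need to know what happened in the game world in a given year before writing about it.
Action: I should use the tool `get_historical_events` with input `year=1325`
(the user then supplies the tool's result as `tool_output`)
Final answer: "The Year the River Rose", a short chronicle drawing on those events.

--- actual request ---
Write a book about the founding of Eldoria.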

Agentic AI with Gemma: one-shot prompting example

Notably, the model now performs better: the Action line contains the correct tool and input.


Few-shot prompting

For more complex tasks, use "Few-shot prompting". It works by providing a small set of examples (usually 2-5, but sometimes more) that demonstrate the desired input-output relationship, allowing the model to grasp the underlying pattern.

Now that we have received the function name get_person_info and the parameter value "name: Anya, the Rebel Leader", the game must connect to an API and call the function. We will use a synthetic response payload for this API interaction.
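A minimal sketch of that step is shown below; the tool registry, the dispatch helper, and the contents of the synthetic payload are assumptions for illustration, not the notebook's implementation.

# Stand-in for a real API call; returns a synthetic response payload (invented lore).
def get_person_info(name: str) -> dict:
    synthetic_db = {
        "Anya, the Rebel Leader": {
            "name": "Anya, the Rebel Leader",
            "role": "Leader of the uprising against the crown of Eldoria",
            "notable_events": ["Rallied the border villages", "Led the march on the capital"],
        }
    }
    return synthetic_db.get(name, {"name": name, "role": "unknown"})

# Map tool names produced by the model to the Python functions that implement them.
TOOLS = {"get_person_info": get_person_info}

def dispatch(tool_name: str, **kwargs) -> dict:
    """Run the requested tool and return its result to feed back to Gemma."""
    return TOOLS[tool_name](**kwargs)

result = dispatch("get_person_info", name="Anya, the Rebel Leader")
print(result)  # This payload goes back into the conversation as the tool output.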

Agentic AI with Gemma: few-shot prompting example

Note that the agent used the provided information to create a book about Eldoria's Rebel Leader.


The Future is Agentic

We’re still in the early stages of Agentic AI development, but the progress is rapid. As these systems become more sophisticated, we can expect them to play an increasingly significant role in our lives.

Here are some potential applications, focused primarily on gaming:

  • Lifelike NPCs: NPCs will become more believable, exhibiting unique personalities and adapting to player interactions.
  • Dynamic Stories: Games will offer dynamically generated stories and quests, ensuring lasting replayability.
  • Efficient Development: AI can streamline game testing, leading to higher quality and faster development cycles.

But with implications beyond:

  • GUI Automation: Models can be used to interact with graphical user interfaces directly within a web browser.
  • Mathematical Tool Integration: AI can utilize tools like calculators to overcome limitations in performing complex calculations.
  • Contextual Knowledge Retrieval: AI can decide when it needs to query external knowledge sources (as in RAG systems).


Next steps

The era of passive, reactive AI is gradually giving way to a future where AI is proactive, goal-oriented, and capable of independent action. This is the dawn of Agentic AI, and it's a future worth getting excited about.

The Gemma Cookbook repository is a place where various ideas like this come together. Contributions are always welcome. If you have a notebook that implements a new idea, please send us a Pull Request.

Thanks for reading and catch you in the next one.