Beyond the Chatbot: Agentic AI with Gemma

FEBRUARY 13, 2025

Ju-yeong Ji Sr. Technical Consultant Gen AI – AI Studio

Gemma is a family of lightweight, open generative artificial intelligence (AI) models built from the same research and technology used to create the Gemini models. In a blog post last year, we showcased the creation of a text-based adventure game using Gemma. In this blog post, you will learn how to use Gemma with a form of AI called Agentic AI, which offers a different way to use Large Language Models (LLMs).

Most common AIs today are reactive. They respond to specific commands, like a smart speaker that plays music when asked. They are useful, but they can only do what they are told to do.

Agentic AI, by contrast, is proactive and autonomous. It makes its own decisions to reach goals. A key feature is the use of external tools, such as search engines, specialized software, and other programs, to obtain information beyond its inherent knowledge base. This lets Agentic AI work and solve problems very independently and effectively.

Here, we provide a practical guide to building an Agentic AI system based on Gemma 2, covering key technical concepts such as "Function Calling", "ReAct", and "Few-shot prompting". This AI system will act as a dynamic lore generator for a fictional game, actively expanding the game's history and providing a distinct, constantly evolving narrative landscape for players.

Bridging the Gap

Before we dive into the coding, let's understand Gemma's agentic AI capabilities. You can try them out directly in Google AI Studio. Google AI Studio offers several Gemma 2 models. The 27B model is recommended for the best performance, but a smaller model such as 2B can also be used, as you can see below. In this example, we tell Gemma that there is a get_current_time() function and ask it to tell us the time in Tokyo and Paris.

This result shows that Gemma 2 does not suggest calling the get_current_time() function. Suggesting such calls is a model capability known as "Function Calling", a key feature for enabling AI to interact with external systems and APIs to retrieve data.

Gemma's built-in function calling capabilities are limited, which restricts its ability to act as an agent. However, its strong instruction-following capabilities can be used to compensate for this missing functionality. Let's see how we can harness these capabilities to expand what Gemma can do.

We will implement a prompt based on the ReAct (Reasoning and Acting) prompting style. ReAct defines the available tools and a specific format for interaction. This structure lets Gemma work in cycles of Thought (reasoning), Action (using tools), and Observation (analyzing the output).

AI Assistant : Getting Time in Google AI Studio

As you can see, Gemma is attempting to use the get_current_time() function for both Tokyo and Paris. A Gemma model cannot simply execute on its own. To make this operational, you’ll need to run the generated code yourself or as part of your system. Without it, you can still proceed and observe Gemma’s response, similar to the one provided below.

Gemma attempting to use `get_current_time` function for both Tokyo and Paris in Google AI Studio
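Running the generated call yourself could look like the following minimal sketch. It assumes the model emits a Python-style call such as `get_current_time("Asia/Tokyo")`; the exact text Gemma produces, the IANA time zone argument, and the `run_tool_call` helper are all assumptions made for illustration, not part of the original demo.

```python
import ast
from datetime import datetime
from zoneinfo import ZoneInfo  # Python 3.9+; needs time zone data on the system


def get_current_time(timezone_name):
    """Return the current time in an IANA time zone such as "Asia/Tokyo"."""
    return datetime.now(ZoneInfo(timezone_name)).strftime("%Y-%m-%d %H:%M")


# Only functions listed here can ever be called on the model's behalf.
TOOLS = {"get_current_time": get_current_time}


def run_tool_call(call_text, tools=TOOLS):
    """Parse text like get_current_time("Asia/Tokyo") and run the matching tool.

    The text is parsed with ast rather than eval(), so only a plain call with
    literal arguments to a registered tool is accepted.
    """
    node = ast.parse(call_text.strip(), mode="eval").body
    if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Name):
        raise ValueError(f"not a simple function call: {call_text!r}")
    if node.func.id not in tools:
        raise ValueError(f"unknown tool: {node.func.id}")
    args = [ast.literal_eval(arg) for arg in node.args]
    kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
    return tools[node.func.id](*args, **kwargs)
```

For example, `run_tool_call('get_current_time("Europe/Paris")')` would return the current Paris time as a string, which your system can then pass back to the model. Keeping an explicit registry means the model can only trigger functions you have chosen to expose.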

Awesome! Now you’ve witnessed Gemma’s function calling in action. This ability allows it to carry out operations autonomously in the background, without requiring direct user interaction.

Let’s get our hands dirty with the actual demo, building a History AI Agent!

Demo Setup

All the prompts below are in the "Agentic AI with Gemma 2" notebook in Gemma's Cookbook. One difference from Google AI Studio is that when you use Gemma directly with Python on Colab, you must use a specific format with control tokens like <start_of_turn> to give instructions to Gemma. You can learn more about this from the official docs.
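As a rough illustration of that format, the sketch below wraps a conversation in Gemma's turn markers (`<start_of_turn>`, `<end_of_turn>`, with the roles `user` and `model`) and leaves a model turn open for generation. The helper name `format_gemma_chat` is hypothetical; the notebook may build its prompts differently.

```python
def format_gemma_chat(turns):
    """Wrap (role, text) pairs in Gemma turn markers and open a model turn.

    turns: list of (role, text) pairs, where role is "user" or "model".
    """
    parts = []
    for role, text in turns:
        if role not in ("user", "model"):
            raise ValueError(f"unsupported role: {role}")
        parts.append(f"<start_of_turn>{role}\n{text}<end_of_turn>\n")
    # End with an open model turn so the model writes the next reply.
    parts.append("<start_of_turn>model\n")
    return "".join(parts)


prompt = format_gemma_chat([("user", "What time is it in Tokyo?")])
print(prompt)
```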

Let’s imagine a fictional game world where AI agents craft dynamic content.

These agents, designed with specific objectives, can generate in-game content like books, poems, and songs, in response to a player choice or significant events within the game’s narrative.

A key feature of these AI agents is their ability to break down complex goals into smaller actionable steps. They can analyze different approaches, evaluate potential outcomes, and adapt their plans based on new information.

Where these agents truly shine is that they do not just passively return information. They can interact with digital (and potentially physical) environments, execute tasks, and make decisions autonomously to achieve their programmed objectives.

So, how does it work?

Here’s an example ReAct style prompt designed for an AI agent that generates in-game content, with the capability to use function calls to retrieve historical information.

<start_of_turn>user
You are an AI Historian in a game. Your goal is to create books, poems, and songs found in the game world so that the player's choices meaningfully impact the unfolding of events.
 
You have access to the following tools:
 
* `get_historical_events(year, location=None, keyword=None)`: Retrieves a list of historical events within a specific year.
* `get_person_info(name)`: Retrieves information about a historical figure.
* `get_location_info(location_name)`: Retrieves information about a location.
 
Use the following multi-step conversation:
 
Thought: I need to do something...
Action: I should use the tool `tool_name` with input `tool_input`
 
Wait user to get the result of the tool is `tool_output`
 
And finally answer the Content of books, poems, or songs.
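
On the game side, the Action line has to be turned into something the code can act on. The sketch below is one possible way to do that, assuming the model follows the format above closely and puts the tool name and the input in backticks on the same line; `ToolCall` and `parse_action` are hypothetical names, and real model output may need more forgiving parsing.

```python
import re
from typing import NamedTuple, Optional


class ToolCall(NamedTuple):
    name: str
    tool_input: str


# Matches: Action: I should use the tool `tool_name` with input `tool_input`
ACTION_PATTERN = re.compile(r"^Action:.*?`([^`]+)`.*?`([^`]*)`", re.MULTILINE)


def parse_action(model_output: str) -> Optional[ToolCall]:
    """Return the requested tool call, or None if no Action line is present."""
    match = ACTION_PATTERN.search(model_output)
    if match is None:
        return None
    return ToolCall(match.group(1).strip(), match.group(2).strip())


sample = (
    "Thought: I need to know more about this person.\n"
    "Action: I should use the tool `get_person_info` with input `Anya`"
)
print(parse_action(sample))
```

Returning `None` when no Action line is found gives the caller a simple signal that the model has moved on to its final answer.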


Let’s try to write a book. See the example outputs below:

Zero-shot prompting

As you can see, Gemma may struggle with function calling due to a lack of training in that area.

To address this limitation, we can employ "One-shot prompting", a form of in-context learning, where demonstrations are embedded within the prompt. This example will serve as a guide for Gemma, allowing it to understand the intended task and improve its performance through contextual learning.

One-shot prompting

(Note: the green section is a provided example; the actual prompt comes after it.)

Notably, the model performs better since Action contains the correct input.

Few-shot prompting

For more complex tasks, use "Few-shot prompting". It works by providing a small set of examples (usually 2-5, but sometimes more) that demonstrate the desired input-output relationship, allowing the model to grasp the underlying pattern.
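A few-shot prompt can be assembled mechanically from a list of demonstrations. The following sketch shows one way to do it; the helper `build_few_shot_prompt`, the "Example N" labels, and the two demonstrations are made up for illustration and are not the examples used in the notebook.

```python
def build_few_shot_prompt(instructions, examples, request):
    """Join instructions, worked examples, and the new request into one prompt.

    examples: list of (request, ideal_response) pairs shown to the model.
    """
    blocks = [instructions.strip()]
    for number, (example_request, ideal_response) in enumerate(examples, start=1):
        blocks.append(
            f"Example {number}\nUser: {example_request}\n{ideal_response.strip()}"
        )
    # The real request goes last, so the model continues the demonstrated pattern.
    blocks.append(f"User: {request}")
    return "\n\n".join(blocks)


# Made-up demonstrations in the Thought/Action format described earlier.
EXAMPLES = [
    (
        "Write a poem about the harvest festival.",
        "Thought: I need the events of that year.\n"
        "Action: I should use the tool `get_historical_events` with input `year=1200`",
    ),
    (
        "Write a song about the old lighthouse.",
        "Thought: I need details about the place.\n"
        "Action: I should use the tool `get_location_info` with input `Old Lighthouse`",
    ),
]

prompt = build_few_shot_prompt(
    "You are an AI Historian in a game.",
    EXAMPLES,
    "Write a book about Anya, the Rebel Leader.",
)
print(prompt)
```

Passing a single pair in `examples` gives a one-shot prompt, so the same helper covers both cases.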

Now that we have received the function name get_person_info and the parameter value "name: Anya, the Rebel Leader", the game must connect to an API and call the function. We will use a synthetic response payload for this API interaction.

Agentic AI with Gemma: few-shot prompting example

Note that the agent used the provided information to create a book about Eldoria's Rebel Leader.
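Putting the pieces together, the whole exchange can be sketched as a small loop: generate, look for an Action, call the tool, append the result as an Observation, and generate again. Everything here is an assumption for illustration: `generate` stands for whatever function calls your Gemma model, the payload fields are invented, and the plain "Observation:" line is a simplification of how the notebook passes tool output back to the model.

```python
import re

ACTION_PATTERN = re.compile(r"^Action:.*?`([^`]+)`.*?`([^`]*)`", re.MULTILINE)


def get_person_info(name):
    """Stand-in for the game API: returns a synthetic, made-up payload."""
    return {"name": name, "title": "Rebel Leader", "region": "Eldoria"}


TOOLS = {"get_person_info": get_person_info}


def run_agent(generate, prompt, tools=TOOLS, max_steps=3):
    """Alternate model replies and tool calls until the model stops asking.

    generate: callable that takes the transcript so far and returns the reply.
    """
    transcript = prompt
    reply = ""
    for _ in range(max_steps):
        reply = generate(transcript)
        transcript += reply
        match = ACTION_PATTERN.search(reply)
        if match is None:
            return reply  # no tool requested, so treat this as the final content
        tool_name, tool_input = match.group(1), match.group(2)
        if tool_name in tools:
            observation = tools[tool_name](tool_input)
        else:
            observation = f"Error: unknown tool {tool_name}"
        # Feed the tool result back so the next reply can build on it.
        transcript += f"\nObservation: {observation}\n"
    return reply
```

The `max_steps` limit is a simple safeguard so that a model that keeps requesting tools cannot loop forever.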

The Future is Agentic

We’re still in the early stages of Agentic AI development, but the progress is rapid. As these systems become more sophisticated, we can expect them to play an increasingly significant role in our lives.

Here are some potential applications, focused primarily on gaming:

Lifelike NPCs: NPCs will become more believable, exhibiting unique personalities and adapting to player interactions.
Dynamic Stories: Games will offer dynamically generated stories and quests, ensuring lasting replayability.
Efficient Development: AI can streamline game testing, leading to higher quality and faster development cycles.

But the implications extend beyond gaming:

GUI Automation: Models can be used to interact with graphical user interfaces directly within a web browser.
Mathematical Tool Integration: AI can utilize tools like calculators to overcome limitations in performing complex calculations.
Contextual Knowledge Retrieval: AI can decide when it needs to query external knowledge sources (as in RAG systems).

Next steps

The era of passive, reactive AI is gradually giving way to a future where AI is proactive, goal-oriented, and capable of independent action. This is the dawn of Agentic AI, and it's a future worth getting excited about.

The Gemma Cookbook repository is a place where various ideas like this come together. Contributions are always welcome. If you have a notebook that implements a new idea, please send us a Pull Request.

Thanks for reading and catch you in the next one.