Beyond the chatbot: Agentic AI with Gemma

FEBRUARY 13, 2025

Ju-yeong Ji Sr. Technical Consultant Gen AI – AI Studio

Gemma is a family of lightweight, generative artificial intelligence (AI) open models, built from the same research and technology used to create the Gemini models. In a blog post last year, we showed how to build a text-based adventure game using Gemma. In this post, you'll learn how to use Gemma with a form of AI called agentic AI, which offers a different way to use large language models (LLMs).

Most AI systems today are reactive. They respond to specific commands, like a smart speaker that plays music when asked. They're useful, but they can only do what they're told.

Agentic AI, by contrast, is proactive and autonomous. It makes its own decisions to achieve goals. A key characteristic is its use of external tools, such as search engines, specialized software, and other programs, to obtain information beyond its built-in knowledge base. This lets agentic AI work and solve problems with a high degree of independence and effectiveness.

Here, we provide a practical guide to building an agentic AI system based on Gemma 2, covering key technical concepts such as "function calling", "ReAct", and "few-shot prompting". This AI system will serve as a dynamic lore generator in fictional games, actively expanding their history and offering players a distinct, ever-evolving narrative landscape.

Bridging the gap

Before diving into the code, let's look at Gemma's agentic AI capabilities. You can experiment with them directly in Google AI Studio, which offers several Gemma 2 models. We recommend the 27B model for the best performance, but you can also use a smaller model such as 2B, as you'll see below. In this example, we tell Gemma that a get_current_time() function exists and ask it for the time in Tokyo and Paris.

The result shows that Gemma 2 does not suggest calling the get_current_time() function. That capability is called "function calling", a key feature that lets AI interact with external systems and APIs to retrieve data.

Gemma's built-in function calling capabilities are limited, which restricts its ability to act as an agent. However, its strong instruction-following capabilities can compensate for the missing functionality. Let's see how to use them to extend what Gemma can do.

We'll implement a prompt based on the ReAct (Reasoning and Acting) prompting style. ReAct defines the available tools and a specific format for interaction. This structure lets Gemma engage in cycles of thought (reasoning), action (using tools), and observation (analyzing the result).

AI Assistant: Getting Time in Google AI Studio

As you can see, Gemma is attempting to use the get_current_time() function for both Tokyo and Paris. A Gemma model cannot execute the function on its own. To make this operational, you'll need to run the generated code yourself or as part of your system. Even without doing so, you can proceed and observe Gemma's response, similar to the one shown below.

Gemma attempting to use `get_current_time` function for both Tokyo and Paris in Google AI Studio
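One way to "run the generated code yourself" is to scan Gemma's reply for calls to functions you have registered and execute only those. This is a minimal sketch, not the notebook's code: it assumes the model writes calls such as `get_current_time("Tokyo")` with a single quoted argument, and the `city` parameter and `CITY_TIMEZONES` table are hypothetical, since the article does not show the function's signature.

```python
import re
from datetime import datetime
from typing import Callable
from zoneinfo import ZoneInfo

# Hypothetical lookup table; the article does not define get_current_time's signature.
CITY_TIMEZONES = {"Tokyo": "Asia/Tokyo", "Paris": "Europe/Paris"}


def get_current_time(city: str) -> str:
    """Hypothetical tool: current local time in `city`, formatted as HH:MM."""
    return datetime.now(ZoneInfo(CITY_TIMEZONES[city])).strftime("%H:%M")


# Matches calls with one quoted string argument, e.g. get_current_time("Tokyo").
CALL_RE = re.compile(r"""(\w+)\(\s*["']([^"']+)["']\s*\)""")


def run_tool_calls(
    model_output: str, tools: dict[str, Callable[[str], str]]
) -> list[tuple[str, str, str]]:
    """Execute every recognized tool call in the model's text; return (name, arg, result)."""
    results = []
    for name, arg in CALL_RE.findall(model_output):
        if name in tools:  # only run whitelisted functions, never arbitrary code
            results.append((name, arg, tools[name](arg)))
    return results
```

Calling `run_tool_calls(reply, {"get_current_time": get_current_time})` returns one `(name, argument, result)` tuple per call, which you can paste back into the conversation. Dispatching through an explicit registry, rather than passing the model's text to `eval`, keeps untrusted model output from running arbitrary code.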

Awesome! Now you've seen Gemma's function calling in action. This ability lets an agent carry out operations autonomously in the background, completing tasks without requiring direct user interaction.

Let’s get our hands dirty with the actual demo, building a History AI Agent!

Demo Setup

All the prompts below are in the "Agentic AI with Gemma 2" notebook in the Gemma Cookbook. One difference between using Gemma in Google AI Studio and calling it directly from Python on Colab is that in Python you must format your instructions with Gemma's control tokens, such as <start_of_turn>. You can learn more about this format in the official documentation.
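If you call Gemma from Python rather than through Google AI Studio, you have to add the turn markers yourself. The helper below is a sketch based on Gemma's documented chat format (`user` and `model` turns, each closed with `<end_of_turn>`); the function name is hypothetical, and depending on your setup the `<bos>` token may also be needed at the very start (many tokenizers add it automatically).

```python
def format_gemma_prompt(turns: list[tuple[str, str]]) -> str:
    """Wrap (role, text) turns in Gemma's control tokens and open a model turn.

    Gemma's chat format only uses the roles "user" and "model".
    """
    parts = []
    for role, text in turns:
        if role not in ("user", "model"):
            raise ValueError(f"unsupported role: {role!r}")
        parts.append(f"<start_of_turn>{role}\n{text}<end_of_turn>\n")
    # Leave a model turn open so the next generated tokens are Gemma's reply.
    parts.append("<start_of_turn>model\n")
    return "".join(parts)
```

With Hugging Face Transformers, `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` should produce an equivalent string, so a hand-written helper like this is mainly useful for seeing exactly what the model receives.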

Let’s imagine a fictional game world where AI agents craft dynamic content.

These agents, designed with specific objectives, can generate in-game content such as books, poems, and songs in response to player choices or significant events in the game's narrative.

A key feature of these AI agents is their ability to break down complex goals into smaller actionable steps. They can analyze different approaches, evaluate potential outcomes, and adapt their plans based on new information.

Where agentic AI truly shines is that these agents don't just passively spit out information. They can interact with digital (and potentially physical) environments, execute tasks, and make decisions autonomously to achieve their programmed objectives.

So, how does it work?

Here's an example ReAct-style prompt designed for an AI agent that generates in-game content, with the ability to use function calls to retrieve historical information.

<start_of_turn>user
You are an AI Historian in a game. Your goal is to create books, poems, and songs found in the game world so that the player's choices meaningfully impact the unfolding of events.
 
You have access to the following tools:
 
* `get_historical_events(year, location=None, keyword=None)`: Retrieves a list of historical events within a specific year.
* `get_person_info(name)`: Retrieves information about a historical figure.
* `get_location_info(location_name)`: Retrieves information about a location.
 
Use the following multi-step conversation:
 
Thought: I need to do something...
Action: I should use the tool `tool_name` with input `tool_input`
 
Wait for the user to provide the result of the tool as `tool_output`.

Finally, answer with the content of the book, poem, or song.

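When Gemma follows this format, the game can extract the tool name and input from the `Action:` line with a regular expression. This is a minimal sketch that assumes the reply matches the prompt's wording exactly; `parse_action` is a hypothetical name, and real replies can drift (extra words, different quoting), so a production parser would need to be more forgiving.

```python
import re
from typing import Optional

# Matches the Action line format defined in the prompt, e.g.
# Action: I should use the tool `get_person_info` with input `name: Anya`
ACTION_RE = re.compile(
    r"Action:\s*I should use the tool\s*`(?P<tool>[^`]+)`\s*with input\s*`(?P<input>[^`]*)`"
)


def parse_action(reply: str) -> Optional[tuple[str, str]]:
    """Return (tool_name, tool_input) from the first Action line, or None if there is none."""
    match = ACTION_RE.search(reply)
    if match is None:
        return None  # no tool requested; the reply may be the final content
    return match.group("tool").strip(), match.group("input").strip()
```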

Let’s try to write a book. See the example outputs below:

Zero-shot prompting

As you can see, Gemma may struggle with function calling due to a lack of training in that area.

To address this limitation, we can use "one-shot prompting", a form of in-context learning in which a demonstration is embedded in the prompt. The example serves as a guide for Gemma, helping it understand the intended task and improve its performance through in-context learning.

One-shot prompting

(Note: the green section is the provided example; the actual prompt comes after it.)

Notably, the model performs better: the Action line now contains the correct input.
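One inexpensive way to separate a usable Action from an unusable one is to check it against the declared tools before calling anything. The sketch below is an illustration, not part of the notebook: the stubs only mirror the signatures listed in the prompt, `validate_action` is a hypothetical helper, and treating `param: value` as a keyword argument is an assumption based on the `name: Anya, the Rebel Leader` input that appears in the few-shot example.

```python
import inspect
from typing import Callable


# Stubs mirroring the tool signatures declared in the prompt (no real implementation).
def get_historical_events(year, location=None, keyword=None):
    """Retrieves a list of historical events within a specific year."""


def get_person_info(name):
    """Retrieves information about a historical figure."""


def get_location_info(location_name):
    """Retrieves information about a location."""


TOOLS: dict[str, Callable] = {
    f.__name__: f for f in (get_historical_events, get_person_info, get_location_info)
}


def validate_action(tool_name: str, tool_input: str, tools: dict[str, Callable]) -> list[str]:
    """Return problems with a parsed Action; an empty list means it looks callable."""
    if tool_name not in tools:
        return [f"unknown tool: {tool_name!r}"]
    if not tool_input.strip():
        return ["empty tool input"]
    sig = inspect.signature(tools[tool_name])
    key, sep, value = tool_input.partition(":")
    try:
        if sep and key.strip() in sig.parameters:
            # Assumed "param: value" convention, e.g. "name: Anya, the Rebel Leader".
            sig.bind(**{key.strip(): value.strip()})
        else:
            # Otherwise treat the whole input as a single positional argument.
            sig.bind(tool_input.strip())
    except TypeError as exc:
        return [f"input does not match {tool_name}{sig}: {exc}"]
    return []
```

This only handles a single `param: value` pair; an input that packs several parameters into one string would need a richer parser.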

Few-shot prompting

For more complex tasks, use "few-shot prompting". It works by providing a small set of examples (usually two to five, sometimes more) that demonstrate the desired input-output relationship, allowing the model to grasp the underlying pattern.
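Assembling a few-shot prompt is mostly string concatenation: instructions, a handful of worked demonstrations, then the real request, all inside one Gemma user turn. The sketch below is illustrative; the `Example N:` labels and the function name are assumptions, not the notebook's exact layout. With a single example it produces a one-shot prompt, and with none, a zero-shot prompt.

```python
def build_few_shot_prompt(instructions: str, examples: list[str], request: str) -> str:
    """Place worked demonstrations between the instructions and the real request.

    len(examples) == 0 gives zero-shot, 1 gives one-shot, 2-5 is typical few-shot.
    """
    blocks = [instructions.strip()]
    # Number each demonstration so the model can tell where one ends and the next begins.
    blocks += [f"Example {i}:\n{ex.strip()}" for i, ex in enumerate(examples, start=1)]
    blocks.append(request.strip())
    body = "\n\n".join(blocks)
    # Wrap everything in a single Gemma user turn and open the model's turn.
    return f"<start_of_turn>user\n{body}<end_of_turn>\n<start_of_turn>model\n"
```

Choosing demonstrations that exercise different tools (for example, one using `get_person_info` and another using `get_location_info`) gives the model more of the pattern to copy than several near-identical ones.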

Now that we've received the function name get_person_info and the parameter value "name: Anya, the Rebel Leader", the game must connect to an API and call the function. For this API interaction, we'll use a synthetic response payload.
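On the game side, this step could look like the sketch below: look up the requested character in a synthetic payload, then hand the result back to Gemma as the next user turn, since the prompt has the user supply `tool_output`. The payload contents, the function names, and sending the result as bare JSON are all assumptions for illustration.

```python
import json

# Synthetic response payload standing in for a real game API (hypothetical data).
SYNTHETIC_PEOPLE = {
    "Anya, the Rebel Leader": {
        "name": "Anya",
        "title": "Rebel Leader",
        "biography": "Placeholder text that a real game API would supply.",
    },
}


def get_person_info(name: str) -> dict:
    """Hypothetical tool implementation backed by the synthetic payload."""
    return SYNTHETIC_PEOPLE.get(name, {"error": f"no record for {name!r}"})


def append_tool_output(model_reply: str, tool_output: dict) -> str:
    """Return the model's reply, closed, followed by the tool result as a user turn
    and a freshly opened model turn."""
    payload = json.dumps(tool_output, ensure_ascii=False)
    return (
        f"{model_reply}<end_of_turn>\n"
        f"<start_of_turn>user\n{payload}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )


# The Action input arrives as "name: Anya, the Rebel Leader"; keep the text after the first colon.
tool_input = "name: Anya, the Rebel Leader"
person = get_person_info(tool_input.partition(":")[2].strip())
```

Appending the returned string to the transcript and generating again lets Gemma write the book with the retrieved details in context.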

Agentic AI with Gemma: few-shot prompting example

Note that the agent used the provided information to create a book about Eldoria's Rebel Leader.
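Doing this exchange by hand (read the Action, call the tool, return the result) is one pass through the thought, action, and observation cycle that ReAct describes, and a game could automate it with a small loop. This is a minimal sketch, not the notebook's code: `generate`, `parse_action`, and `tools` are hypothetical stand-ins for a real Gemma call, a parser for the prompt's Action format, and game API functions. Feeding results back as an `Observation:` line is a design choice here; with the prompt above, you could send them as a user turn instead.

```python
from typing import Callable, Optional


def react_loop(
    prompt: str,
    generate: Callable[[str], str],
    parse_action: Callable[[str], Optional[tuple[str, str]]],
    tools: dict[str, Callable[[str], str]],
    max_steps: int = 5,
) -> str:
    """Repeat thought -> action -> observation until the model stops requesting tools."""
    transcript = prompt
    for _ in range(max_steps):
        reply = generate(transcript)  # model writes a Thought and possibly an Action
        transcript += reply
        action = parse_action(reply)
        if action is None:  # no tool requested: treat the reply as the final content
            return reply
        name, tool_input = action
        observation = tools[name](tool_input)  # run the requested tool
        transcript += f"\nObservation: {observation}\n"  # feed the result back
    raise RuntimeError(f"no final answer after {max_steps} steps")
```

Capping the number of steps keeps a model that never stops requesting tools from looping forever. In this article's setup, `tools` would map names like `get_person_info` to calls into the game's API.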

The Future is Agentic

We're still in the early stages of agentic AI development, but progress is rapid. As these systems become more sophisticated, we can expect them to play an increasingly significant role in our lives.

Here are some potential applications, focused primarily on gaming:

Lifelike NPCs: NPCs will become more believable, exhibiting unique personalities and adapting to player interactions.
Dynamic Stories: Games will offer dynamically generated stories and quests, ensuring lasting replayability.
Efficient Development: AI can streamline game testing, leading to higher quality and faster development cycles.

But the implications extend beyond gaming:

GUI Automation: Models can be used to interact with graphical user interfaces directly within a web browser.
Mathematical Tool Integration: AI can use tools such as calculators to overcome its limitations with complex calculations.
Contextual Knowledge Retrieval: AI can decide when it needs to query external knowledge sources (as in RAG systems).

Next steps

The era of passive, reactive AI is gradually giving way to a future where AI is proactive, goal-oriented, and capable of independent action. This is the dawn of Agentic AI, and it's a future worth getting excited about.

The Gemma Cookbook repository is a place where various ideas like this come together. Contributions are always welcome. If you have a notebook that implements a new idea, please send us a Pull Request.

Thanks for reading and catch you in the next one.