Beyond the Chatbot: Agentic AI with Gemma

February 13, 2025

Ju-yeong Ji Sr. Technical Consultant Gen AI – AI Studio

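Gemma is a family of lightweight, generative artificial intelligence (AI) open models, built from the same research and technology used to create the Gemini models. In a blog post last year, we showcased building a text-based adventure game with Gemma. In this post, you will learn how to use Gemma in a form of AI called Agentic AI, which offers a different way of using Large Language Models (LLMs).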

The most common kind of AI today is reactive AI. It responds to specific commands, like a smart speaker that plays music on request. It is useful, but it can only do what it is told.

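In contrast, Agentic AI is proactive and autonomous. It makes its own decisions to reach a goal. A key capability is using external tools, such as search engines, specialized software, and other programs, to obtain information beyond its built-in knowledge base. This lets Agentic AI work and solve problems with a high degree of independence and effectiveness.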

This post is a practical guide to building a Gemma 2 based Agentic AI system, covering key technical concepts such as "Function Calling", "ReAct", and "Few-shot prompting". The AI system serves as a dynamic lore generator for a fictional game, actively expanding the game's history and giving players a distinctive, continually evolving narrative landscape.

Bridging the Gap

Before diving into the code, let’s look at Gemma’s agentic AI capabilities. You can experiment with them directly in Google AI Studio, which offers several Gemma 2 models. We recommend the 27B model for the best performance, but smaller models such as 2B also work, as shown below. In this example, we tell Gemma that a get_current_time() function exists and ask it for the time in Tokyo and Paris.

This result shows that Gemma 2 does not suggest calling the get_current_time() function on its own. The ability to suggest such calls is known as "Function Calling", and it is a key feature for enabling AI to interact with external systems and APIs to retrieve data.

Gemma’s built-in function calling support is limited, which restricts its ability to act as an agent. However, its strong instruction-following capability can compensate for this gap. Let’s see how to use it to extend what Gemma can do.

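We will implement a prompt based on the ReAct (Reasoning and Acting) prompting style. ReAct defines the available tools and a specific format for the interaction. With this structure, Gemma can engage in cycles of Thought (reasoning), Action (using a tool), and Observation (analyzing the output).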

AI Assistant : Getting Time in Google AI Studio

As you can see, Gemma is attempting to use the get_current_time() function for both Tokyo and Paris. A Gemma model cannot execute the function on its own, though. To make this operational, you’ll need to run the suggested call yourself or as part of your system. Without that step, you can still proceed and observe Gemma’s response, similar to the one provided below.

Gemma attempting to use `get_current_time` function for both Tokyo and Paris in Google AI Studio
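The Thought, Action, Observation cycle described above can be sketched as a small host-side loop. This is a minimal illustration rather than the notebook's implementation: `run_react_loop`, the `Observation:` label, and the canned times returned by the stub `get_current_time` are all assumptions made for the example, and `model` stands for any callable that takes the transcript so far and returns Gemma's next reply.

```python
import re
from typing import Callable, Dict

# Matches an Action line such as:
#   Action: I should use the tool `get_current_time` with input `Tokyo`
ACTION_RE = re.compile(r"Action:.*?`(?P<tool>[^`]+)`.*?`(?P<arg>[^`]*)`")


def get_current_time(location: str) -> str:
    """Stub tool. The times are placeholders; a real tool would consult a clock."""
    placeholder_times = {"Tokyo": "21:00", "Paris": "13:00"}
    return placeholder_times.get(location, "unknown location")


def run_react_loop(
    model: Callable[[str], str],
    tools: Dict[str, Callable[[str], str]],
    question: str,
    max_steps: int = 5,
) -> str:
    """Alternate model replies with tool calls until the model stops asking for tools."""
    transcript = question
    for _ in range(max_steps):
        reply = model(transcript)
        transcript += "\n" + reply
        match = ACTION_RE.search(reply)
        if match is None:
            # No tool requested, so treat this reply as the final answer.
            return reply
        tool = tools.get(match["tool"])
        if tool is None:
            observation = f"unknown tool `{match['tool']}`"
        else:
            observation = tool(match["arg"])
        # Feed the tool result back so the next model turn can use it.
        transcript += f"\nObservation: {observation}"
    raise RuntimeError("model kept requesting tools past max_steps")
```

The host, not the model, runs every tool, which is why the loop caps the number of steps and handles unknown tool names explicitly.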

Awesome! You have now seen Gemma’s function calling in action. Once a host system runs the calls the model suggests, an agent can carry out operations in the background without requiring direct user interaction.

Let’s get our hands dirty with the actual demo, building a History AI Agent!

Demo Setup

All the prompts below are in the "Agentic AI with Gemma 2" notebook in Gemma’s Cookbook. One difference from Google AI Studio: when you call Gemma directly from Python on Colab, you must wrap your instructions in a specific format using markers like <start_of_turn>. You can learn more about this from the official docs.
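As a rough sketch of that format, the helper below wraps each turn in Gemma's turn markers and then opens a model turn for the reply. The function name `format_gemma_turns` is made up for this example, and the `<end_of_turn>` marker and the `user`/`model` role names follow the Gemma prompt-format documentation as we understand it, so check the official docs before relying on the details.

```python
from typing import Iterable, Tuple

VALID_ROLES = ("user", "model")


def format_gemma_turns(turns: Iterable[Tuple[str, str]]) -> str:
    """Render (role, text) pairs with turn markers and open a new model turn."""
    parts = []
    for role, text in turns:
        if role not in VALID_ROLES:
            raise ValueError(f"unsupported role: {role}")
        parts.append(f"<start_of_turn>{role}\n{text}<end_of_turn>\n")
    # Leave a model turn open so generation continues as the model's reply.
    parts.append("<start_of_turn>model\n")
    return "".join(parts)
```

Calling `format_gemma_turns([("user", "You are an AI Historian in a game.")])` yields a single user turn followed by an open model turn.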

Let’s imagine a fictional game world where AI agents craft dynamic content.

These agents, designed with specific objectives, can generate in-game content like books, poems, and songs, in response to a player choice or significant events within the game’s narrative.

A key feature of these AI agents is their ability to break down complex goals into smaller actionable steps. They can analyze different approaches, evaluate potential outcomes, and adapt their plans based on new information.

Where Agentic AI truly shines is that it does not just passively spit out information. Agents can interact with digital (and potentially physical) environments, execute tasks, and make decisions autonomously to achieve their programmed objectives.

So, how does it work?

Here’s an example ReAct style prompt designed for an AI agent that generates in-game content, with the capability to use function calls to retrieve historical information.

<start_of_turn>user
You are an AI Historian in a game. Your goal is to create books, poems, and songs found in the game world so that the player's choices meaningfully impact the unfolding of events.
 
You have access to the following tools:
 
* `get_historical_events(year, location=None, keyword=None)`: Retrieves a list of historical events within a specific year.
* `get_person_info(name)`: Retrieves information about a historical figure.
* `get_location_info(location_name)`: Retrieves information about a location.
 
Use the following multi-step conversation:
 
Thought: I need to do something...
Action: I should use the tool `tool_name` with input `tool_input`
 
Wait user to get the result of the tool is `tool_output`
 
And finally answer the Content of books, poems, or songs.
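On the game side, something has to read the Action line this prompt asks for. The sketch below is one way to do that, assuming the model follows the format closely; `parse_action`, `ToolCall`, and `KNOWN_TOOLS` are hypothetical names, and the raw input string is passed through untouched because the prompt does not fix how multiple parameters are written.

```python
import re
from typing import NamedTuple, Optional

# The three tools declared in the prompt above.
KNOWN_TOOLS = {"get_historical_events", "get_person_info", "get_location_info"}

# Captures the two backtick-quoted fields on a line that starts with "Action:".
ACTION_RE = re.compile(r"^Action:.*?`([^`]+)`.*?`([^`]*)`", re.MULTILINE)


class ToolCall(NamedTuple):
    name: str
    raw_input: str


def parse_action(model_output: str) -> Optional[ToolCall]:
    """Return the requested tool call, or None if there is no usable Action line."""
    match = ACTION_RE.search(model_output)
    if match is None:
        return None
    name, raw_input = match.group(1), match.group(2)
    if name not in KNOWN_TOOLS:
        # Reject tools the prompt never offered instead of guessing.
        return None
    return ToolCall(name, raw_input.strip())
```

Returning `None` for undeclared tools keeps a malformed or hallucinated Action from reaching the game's API.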


Let’s try to write a book. See the example outputs below:

Zero-shot prompting

As you can see, Gemma may struggle with function calling due to a lack of training in that area.

To address this limitation, we can employ "One-shot prompting", a form of in-context learning, where demonstrations are embedded within the prompt. This example will serve as a guide for Gemma, allowing it to understand the intended task and improve its performance through contextual learning.

One-shot prompting

(Note: the green section is a provided example; the actual prompt comes after it.)

Notably, the model performs better since Action contains the correct input.

Few-shot prompting

For more complex tasks, use "Few-shot prompting". It works by providing a small set of examples (usually 2-5, but sometimes more) that demonstrate the desired input-output relationship, allowing the model to grasp the underlying pattern.
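A few-shot prompt of this kind can be assembled by placing worked demonstrations between the instructions and the new request. The sketch below shows the idea only; the function name and the "Example N:" and "User:" labels are assumptions, and the prompts actually used for this post are the ones in the notebook.

```python
from typing import List, Tuple


def build_few_shot_prompt(
    instructions: str,
    examples: List[Tuple[str, str]],
    request: str,
) -> str:
    """Join instructions, (request, response) demonstrations, and the new request."""
    blocks = [instructions.strip()]
    for index, (example_request, example_response) in enumerate(examples, start=1):
        blocks.append(f"Example {index}:\nUser: {example_request}\n{example_response}")
    # The real request goes last so the model continues from the demonstrated pattern.
    blocks.append(f"User: {request}")
    return "\n\n".join(blocks)
```

Passing a single example gives a one-shot prompt, and an empty list reduces to zero-shot, so the same helper covers all three styles discussed here.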

Now that we have received the function name get_person_info and the parameter value "name: Anya, the Rebel Leader", the game must connect to an API and call the function. We will use a synthetic response payload for this API interaction.
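One way to stand in for that API during testing is a stub that returns a canned record, plus a helper that hands the result back to Gemma as the next user turn. Everything here is illustrative: the payload fields are invented placeholders rather than the notebook's data, `observation_turn` is a hypothetical helper, and the turn markers follow Gemma's prompt format as we understand it.

```python
import json

# Synthetic records: every field is an invented placeholder for illustration.
SYNTHETIC_PEOPLE = {
    "Anya, the Rebel Leader": {
        "name": "Anya, the Rebel Leader",
        "summary": "placeholder biography text",
    }
}


def get_person_info(name: str) -> dict:
    """Stub for the game's API: return a canned record or an error entry."""
    return SYNTHETIC_PEOPLE.get(name, {"error": f"no record for {name}"})


def observation_turn(tool_name: str, payload: dict) -> str:
    """Wrap a tool result as a user turn, then open a model turn for the reply."""
    body = json.dumps(payload, ensure_ascii=False)
    return (
        "<start_of_turn>user\n"
        f"The result of the tool `{tool_name}` is `{body}`<end_of_turn>\n"
        "<start_of_turn>model\n"
    )
```

Appending the returned string to the conversation lets the model continue from the tool result and write the final content.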

Few-shot prompting example with Gemma

Note that the agent used the provided information to create a book about Eldoria's Rebel Leader.

The Future is Agentic

We’re still in the early stages of Agentic AI development, but the progress is rapid. As these systems become more sophisticated, we can expect them to play an increasingly significant role in our lives.

Here are some potential applications, focused primarily on gaming:

Lifelike NPCs: NPCs will become more believable, exhibiting unique personalities and adapting to player interactions.
Dynamic Stories: Games will offer dynamically generated stories and quests, ensuring lasting replayability.
Efficient Development: AI can streamline game testing, leading to higher quality and faster development cycles.

The implications also extend beyond gaming:

GUI Automation: Models can be used to interact with graphical user interfaces directly within a web browser.
Mathematical Tool Integration: AI can utilize tools like calculators to overcome limitations in performing complex calculations.
Contextual Knowledge Retrieval: AI can decide when it needs to query external knowledge sources (as in RAG systems).

Next steps

The era of passive, reactive AI is gradually giving way to a future where AI is proactive, goal-oriented, and capable of independent action. This is the dawn of Agentic AI, and it's a future worth getting excited about.

The Gemma Cookbook repository is a place where various ideas like this come together. Contributions are always welcome. If you have a notebook that implements a new idea, please send us a Pull Request.

Thanks for reading and catch you in the next one.