Mastering Controlled Generation with Gemini 1.5: Schema Adherence for Developers

September 3, 2024

Lewis Liu, Group Product Manager, Gemini

Terry Koo, Research Scientist, Google DeepMind

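Much of the current excitement around generative AI centers on possibility: the experiments and breakthroughs that foundation models could make achievable. At Google, however, we have always been committed to providing a comprehensive AI ecosystem that helps developers not only try best-in-class models but also bring them into practical applications.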

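As part of that journey, we introduced Controlled Generation for Gemini 1.5 Pro at Google I/O earlier this year, and we have been delighted by the response: rapid adoption and a flood of positive feedback. Today we are introducing Controlled Generation for Gemini 1.5 Flash and going a step further by adding "enum" support. With this feature, developers get a powerful tool for reliably generating responses that adhere to a defined schema. In addition, Controlled Generation is enabled automatically in every mode when you use function calling with Gemini 1.5.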

What is Controlled Generation, and why does it matter?

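Think of Controlled Generation as giving the model a blueprint for its output. By defining a response schema, you dictate the exact format and structure of the AI's response. Whether you are extracting entities as JSON for smooth downstream processing or classifying new documents within your own taxonomy, Controlled Generation helps ensure consistency and reduces time-consuming post-processing.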

Fully integrating AI into software development depends on two things: data science and machine learning teams must be able to hand off smoothly to application developers, and the model's output must integrate cleanly into existing systems. Controlled Generation offers the following benefits:

It produces ready-to-use, machine-readable data, reducing the need for cumbersome post-processing and parsing.

It generates output in formats such as JSON, so your AI output is compatible with the API ecosystem and can be integrated smoothly into existing workflows.

It adds a degree of predictability to AI output, so you can reliably anticipate the format and structure of the data the model produces.

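Chris Curro, Principal Machine Learning Engineer at The Estée Lauder Companies and one of the early testers, said: "We are designing complex reasoning workflows built on Gemini 1.5 to build consumer and employee experiences that would not have been possible before. Controlled Generation is developer-friendly, which has helped our team move quickly and create business value."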

This feature builds on controlled decoding work recently developed by a Google team. For more information about the underlying technique, see this article.

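The Gemini API and the Vertex AI API introduce the concept of a "response schema." A response schema acts as a template that specifies the elements, data types, and overall structure of the AI output. The schema is based on the OpenAPI 3.0 schema definition, so it builds on an open, interoperable standard. By including a response schema with your prompt, you instruct the model to follow the rules you defined, which yields predictable, structured results.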
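To make "elements, data types, and overall structure" concrete, here is a minimal sketch of a local conformance check, assuming only a small subset of schema keywords (type, enum, properties, required, items, nullable). The `conforms` helper is hypothetical and is not part of the Gemini or Vertex AI SDKs; it only illustrates the kind of rules a response schema expresses.

```python
from typing import Any

# Maps schema type names to the Python types json.loads() produces.
_JSON_TYPES = {
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
    "array": list,
    "object": dict,
}


def conforms(value: Any, schema: dict) -> bool:
    """Return True if value matches the small schema subset handled here.

    Hypothetical helper for illustration; not an SDK function.
    """
    if value is None:
        return bool(schema.get("nullable", False))
    type_name = str(schema.get("type", "")).lower()  # accept "STRING" or "string"
    expected = _JSON_TYPES.get(type_name)
    if expected is None:
        return False  # unknown or missing type: treat as non-conforming
    # bool is a subclass of int in Python, so reject it for non-boolean types.
    if isinstance(value, bool) and type_name != "boolean":
        return False
    if not isinstance(value, expected):
        return False
    if "enum" in schema and value not in schema["enum"]:
        return False
    if type_name == "array":
        item_schema = schema.get("items")
        if item_schema is not None:
            return all(conforms(item, item_schema) for item in value)
    if type_name == "object":
        if any(key not in value for key in schema.get("required", [])):
            return False
        properties = schema.get("properties", {})
        return all(
            conforms(value[key], sub)
            for key, sub in properties.items()
            if key in value
        )
    return True


course_schema = {
    "type": "object",
    "properties": {
        "course": {"type": "string", "enum": ["main", "dessert"]},
        "name": {"type": "string"},
    },
    "required": ["course", "name"],
}
print(conforms({"course": "main", "name": "Salmon"}, course_schema))
print(conforms({"course": "brunch", "name": "Eggs"}, course_schema))
```

The first call passes because both required fields are present and "main" is an allowed enum value; the second fails on the enum constraint.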

Google's secret sauce

Controlled Generation in Gemini adds minimal latency to existing API calls, including the first API call.

Gemini supports the enum type, and support for more types is planned.

Gemini does not need to store your data in order to enforce a schema.

Getting started

Controlled Generation is available in Google AI Studio and Vertex AI with both Gemini 1.5 Pro and Gemini 1.5 Flash.

Example: Building a meal-planning app with a JSON schema

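Suppose you are building an app that suggests recipes for different scenarios. The recipes must be generated in a structured format so that the app can easily ingest them and display them to users. The following example shows how Controlled Generation can be used to generate recipes for a multi-course meal.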

import vertexai
from vertexai.generative_models import GenerationConfig, GenerativeModel

# project_id is assumed to be defined elsewhere as your Google Cloud project ID.
vertexai.init(project=project_id, location="us-central1")
 
response_schema = {
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "course": {
                "type": "string",
                "enum": [
                    "appetizer",
                    "salad",
                    "soup",
                    "main",
                    "dessert",
                ],
            },
            "name": {
                "type": "string",
            },
            "ingredients": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "unit": {
                            "type": "string",
                            "enum": [
                                "count",
                                "cup",
                                "tablespoon",
                                "teaspoon",
                                "pound",
                                "ounce",
                            ],
                        },
                        "amount": {
                            "type": "number",
                        },
                        "name": {
                            "type": "string",
                        },
                    },
                    "required": ["name"],
                },
            },
            "steps": {
                "type": "array",
                "items": {
                    "type": "string",
                },
            },
        },
        "required": ["course", "name"],
    },
}
 
prompt = (
    "Some friends are in town and I want to host them for dinner.  "
    "Can you plan a three-course meal for me?  "
    "I'm not an experienced cook so I need simple recipes."
)
 
model = GenerativeModel("gemini-1.5-pro-001")
 
response = model.generate_content(
    prompt,
    generation_config=GenerationConfig(
        response_mime_type="application/json",
        response_schema=response_schema
    ),
)


The model's output conforms to the schema specified in the request and can be used directly in your application.

[
    {
        "course": "appetizer",
        "name": "Caprese Skewers",
        "ingredients": [
            {
                "name": "cherry tomatoes",
                "amount": 1,
                "unit": "cup"
            },
            {
                "name": "fresh mozzarella balls",
                "amount": 1,
                "unit": "cup"
            },
            {
                "name": "fresh basil leaves",
                "amount": 12,
                "unit": "count"
            },
            {
                "name": "olive oil",
                "amount": 2,
                "unit": "tablespoon"
            },
            {
                "name": "balsamic glaze",
                "amount": 2,
                "unit": "tablespoon"
            },
            {
                "name": "salt",
                "amount": 1,
                "unit": "teaspoon"
            },
            {
                "name": "black pepper",
                "amount": 1,
                "unit": "teaspoon"
            }
        ],
        "steps": [
            "Thread cherry tomatoes, mozzarella balls, and basil leaves onto skewers.",
            "Drizzle with olive oil and balsamic glaze.",
            "Season with salt and pepper to taste."
        ]
    },
    {
        "course": "main",
        "name": "One-Pan Lemon Herb Salmon",
        "ingredients": [
            {
                "name": "salmon fillets",
                "amount": 4,
                "unit": "count"
            },
            {
                "name": "asparagus",
                "amount": 1,
                "unit": "pound"
            },
            {
                "name": "cherry tomatoes",
                "amount": 1,
                "unit": "cup"
            },
            {
                "name": "lemon",
                "amount": 1,
                "unit": "count"
            },
            {
                "name": "dried oregano",
                "amount": 1,
                "unit": "teaspoon"
            },
            {
                "name": "dried thyme",
                "amount": 1,
                "unit": "teaspoon"
            },
            {
                "name": "salt",
                "amount": 1,
                "unit": "teaspoon"
            },
            {
                "name": "black pepper",
                "amount": 1,
                "unit": "teaspoon"
            },
            {
                "name": "olive oil",
                "amount": 2,
                "unit": "tablespoon"
            }
        ],
        "steps": [
            "Preheat oven to 400 degrees F (200 degrees C).",
            "Line a baking sheet with parchment paper.",
            "Place salmon fillets on one side of the baking sheet and spread asparagus and cherry tomatoes on the other side.",
            "Squeeze lemon juice over the salmon and vegetables.",
            "Sprinkle with oregano, thyme, salt, and pepper.",
            "Drizzle with olive oil.",
            "Bake for 15-20 minutes, or until salmon is cooked through and vegetables are tender."
        ]
    },
    {
        "course": "dessert",
        "name": "Fruit Salad with Honey Yogurt",
        "ingredients": [
            {
                "name": "strawberries",
                "amount": 1,
                "unit": "cup"
            },
            {
                "name": "blueberries",
                "amount": 1,
                "unit": "cup"
            },
            {
                "name": "raspberries",
                "amount": 1,
                "unit": "cup"
            },
            {
                "name": "greek yogurt",
                "amount": 1,
                "unit": "cup"
            },
            {
                "name": "honey",
                "amount": 2,
                "unit": "tablespoon"
            }
        ],
        "steps": [
            "In a large bowl, combine strawberries, blueberries, and raspberries.",
            "In a separate bowl, mix together greek yogurt and honey.",
            "Serve fruit salad with a dollop of honey yogurt."
        ]
    }
]

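Because the output follows the schema, consuming it takes little more than a JSON parse. The sketch below shows one way to load the response text into typed objects; the `Ingredient` and `Recipe` dataclasses and the `parse_meal_plan` function are hypothetical names chosen for illustration, and `sample` is an abbreviated stand-in for `response.text` from the example above.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Ingredient:
    name: str
    amount: Optional[float] = None  # only "name" is required by the schema
    unit: Optional[str] = None


@dataclass
class Recipe:
    course: str
    name: str
    ingredients: List[Ingredient] = field(default_factory=list)
    steps: List[str] = field(default_factory=list)


def parse_meal_plan(raw_json: str) -> List[Recipe]:
    """Turn the model's JSON text into a list of Recipe objects."""
    recipes = []
    for item in json.loads(raw_json):
        ingredients = [
            Ingredient(i["name"], i.get("amount"), i.get("unit"))
            for i in item.get("ingredients", [])  # optional in the schema
        ]
        recipes.append(
            Recipe(item["course"], item["name"], ingredients, item.get("steps", []))
        )
    return recipes


# Abbreviated stand-in for response.text (one course, one ingredient).
sample = (
    '[{"course": "dessert", "name": "Fruit Salad with Honey Yogurt", '
    '"ingredients": [{"name": "honey", "amount": 2, "unit": "tablespoon"}], '
    '"steps": ["Serve fruit salad with a dollop of honey yogurt."]}]'
)

for recipe in parse_meal_plan(sample):
    print(recipe.course, "-", recipe.name, f"({len(recipe.ingredients)} ingredients)")
```

Fields that the schema does not mark as required are read with `.get()`, so a response that omits them still parses.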

Classifying product condition with an enum schema

To constrain the model's output to a set of predefined values, set the response MIME type to "text/x.enum".

import vertexai
from vertexai.generative_models import GenerationConfig, GenerativeModel

# project_id is assumed to be defined elsewhere as your Google Cloud project ID.
vertexai.init(project=project_id, location="us-central1")
model = GenerativeModel("gemini-1.5-flash-001")

# The model must answer with exactly one of these values.
response_schema = {
    "type": "STRING",
    "enum": ["new in package", "like new", "gently used", "used", "damaged", "soiled"],
}

prompt = [
    "Item description: The item is a long winter coat that has many tears all around the seams and is falling apart. It has large questionable stains on it."
]

response = model.generate_content(
    prompt,
    generation_config=GenerationConfig(
        response_mime_type="text/x.enum", response_schema=response_schema
    ),
)
print(response.candidates[0])


The model's output contains a simple product classification: "damaged".

content {
  role: "model"
  parts {
    text: "damaged"
  }
}

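In application code it can be convenient to map the returned text onto a Python `Enum`, so that the rest of the program works with typed values rather than raw strings. The following is a minimal sketch under that assumption; the `Condition` class and `parse_condition` function are hypothetical and not part of any SDK. Deriving the schema's value list from the `Enum` keeps the two definitions in sync.

```python
from enum import Enum


class Condition(Enum):
    NEW_IN_PACKAGE = "new in package"
    LIKE_NEW = "like new"
    GENTLY_USED = "gently used"
    USED = "used"
    DAMAGED = "damaged"
    SOILED = "soiled"


# Build the schema from the Enum so the two cannot drift apart.
response_schema = {"type": "STRING", "enum": [c.value for c in Condition]}


def parse_condition(text: str) -> Condition:
    """Convert the model's text output into a Condition member.

    Raises ValueError if the text is not one of the allowed values.
    """
    return Condition(text.strip())


# "damaged" stands in for the text returned by the model in the example above.
print(parse_condition("damaged"))
```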

Limitations

Controlled Generation supports a subset of the OpenAPI 3.0 schema.

Output quality still depends on the model's reasoning and extraction capabilities. Controlled Generation can enforce the format of the output, but not the correctness of its content.

If the prompt lacks the information needed for a required field, the model may fill that field in based on its training data. Setting nullable to True on the field mitigates this limitation.
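As a sketch of the nullable mitigation described above, the snippet below marks selected properties of an object schema as nullable. The `with_nullable` helper and the `ingredient_schema` variant, in which all three fields are required, are hypothetical and exist only for illustration; the `nullable` key itself follows the OpenAPI 3.0 schema definition.

```python
import copy
from typing import List


def with_nullable(object_schema: dict, field_names: List[str]) -> dict:
    """Return a copy of an object schema with the given properties marked nullable.

    Hypothetical helper for illustration; the input schema is left unchanged.
    """
    updated = copy.deepcopy(object_schema)
    for name in field_names:
        updated["properties"][name]["nullable"] = True
    return updated


# Assumed variant of the ingredient schema in which every field is required.
ingredient_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "amount": {"type": "number"},
        "unit": {"type": "string"},
    },
    "required": ["name", "amount", "unit"],
}

# Let the model return null when the prompt gives no quantity or unit,
# instead of pushing it to produce a value the prompt does not support.
relaxed = with_nullable(ingredient_schema, ["amount", "unit"])
print(relaxed["properties"]["amount"])
```

The relaxed schema can then be passed as `response_schema` in the same way as the schemas shown earlier.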

Summary

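With Controlled Generation, you have a powerful tool for generating responses that adhere to a defined schema. You can apply it to most existing workflows to improve reliability and predictability. At Google, we are working to give developers easy-to-use API features for steering and controlling model behavior. Controlled Generation is only the beginning.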

To try this feature, see the Google AI Studio or Vertex AI documentation for details.
