Mastering Controlled Generation with Gemini 1.5: Schema Adherence for Developers

SEP 03, 2024
Lewis Liu, Group Product Manager, Gemini
Terry Koo, Research Scientist, Google DeepMind

Much of the current excitement surrounding generative AI focuses on the "what could be" — the potential experiments and breakthroughs with foundational models. But at Google, we have always been committed to providing a comprehensive AI ecosystem that helps you not only try the best-in-class models but also adopt them in practical applications.

As part of that journey, we introduced Controlled Generation for Gemini 1.5 Pro at Google I/O earlier this year, and we've been thrilled by the rapid adoption and positive feedback. Today, we're taking another step forward by introducing Controlled Generation for Gemini 1.5 Flash and adding "enum" support. This equips developers with a robust tool to reliably generate responses that adhere to a defined schema. Controlled generation is also enabled automatically when you use function calling in ANY mode on Gemini 1.5.


What is Controlled Generation and why is it important?

Think of controlled generation as providing a blueprint for the model's output. By defining a response schema, you dictate the precise format and structure of the AI's responses. Whether it's extracting entities in JSON format for seamless downstream processing or classifying news articles within your own taxonomy, controlled generation helps ensure consistency and reduces the need for time-consuming post-processing.

To fully integrate AI into software development, two things need to happen: a seamless handoff from data science and machine learning teams to application developers, and seamless integration of the model's output within existing systems. With controlled generation, you can:

  • Enable AI to produce readily usable, machine-readable data, reducing the need for cumbersome post-processing and parsing.

  • Generate outputs in formats like JSON, making your AI a first-class citizen in the API economy. It can seamlessly plug into existing workflows.

  • Inject a dose of predictability into AI outputs, so you can reliably anticipate the format and structure of the data your AI model produces.

In the words of Chris Curro, principal machine learning engineer at The Estée Lauder Companies — one of our earliest testers — "We're designing complex reasoning workflows on top of Gemini 1.5 to build consumer and employee experiences that would otherwise be impossible. The developer-friendly nature of controlled generation has allowed our team to move rapidly and drive business value."

The feature is built on controlled decoding, a recent advancement developed by Google. You can learn more about the underlying techniques in this paper.

In the Gemini API and the Vertex AI API, we introduce the concept of a "response schema". A response schema acts as a template, dictating the elements, data types, and overall structure of the AI's output. The schema is based on the OpenAPI 3.0 schema definition, so you always know you are building on an open and compatible standard. By including a response schema with your prompt, you instruct the model to adhere to your defined rules, resulting in predictable and structured results.
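To make the template idea concrete, here is a minimal sketch of a schema written as a Python dictionary, plus a small local checker. The schema (person_schema) and the helper (conforms) are hypothetical names introduced for illustration, and the checker covers only a handful of keywords (type, enum, properties, required, items). It is not the mechanism Gemini uses to enforce schemas; it only shows what "adhering to a schema" means for a parsed JSON value.

```python
import json

# Hypothetical schema for illustration, written in the OpenAPI 3.0 style
# that the response schema is based on.
person_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name"],
}

# Map schema type names to the Python types that json.loads produces.
_PY_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "array": list,
    "object": dict,
}


def conforms(value, schema):
    """Return True if value matches this small subset of schema keywords."""
    type_name = schema["type"].lower()
    # bool is a subclass of int in Python, so reject it for non-boolean types.
    if isinstance(value, bool) and type_name != "boolean":
        return False
    if not isinstance(value, _PY_TYPES[type_name]):
        return False
    if "enum" in schema and value not in schema["enum"]:
        return False
    if isinstance(value, dict):
        props = schema.get("properties", {})
        if any(key not in value for key in schema.get("required", [])):
            return False
        return all(conforms(v, props[k]) for k, v in value.items() if k in props)
    if isinstance(value, list) and "items" in schema:
        return all(conforms(item, schema["items"]) for item in value)
    return True


sample_output = '{"name": "Ada", "age": 36}'
print(conforms(json.loads(sample_output), person_schema))
```

A check like this can be useful in tests or logging, even when the model output is already constrained by the schema at generation time.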


Google secret sauce

  • Gemini’s controlled generation adds minimal latency to your existing API calls, even on the first API call

  • Gemini supports enum as a type, with more to come

  • Gemini's schema enforcement does not require storing any of your data


How to get started

The controlled generation feature is available on both Gemini 1.5 Pro and Gemini 1.5 Flash on Google AI Studio and Vertex AI.


Example: Building a meal planning app with JSON schema

Imagine you are building an app to suggest recipes for different scenarios. The recipes must be generated in a structured format so they can be easily ingested by the app and visually presented to the user. The following example illustrates how controlled generation can be used to generate a set of recipes for a multi-course meal.

import vertexai
from vertexai.generative_models import GenerationConfig, GenerativeModel

# project_id is assumed to hold your Google Cloud project ID.
vertexai.init(project=project_id, location="us-central1")

response_schema = {
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "course": {
                "type": "string",
                "enum": [
                    "appetizer",
                    "salad",
                    "soup",
                    "main",
                    "dessert",
                ],
            },
            "name": {
                "type": "string",
            },
            "ingredients": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "unit": {
                            "type": "string",
                            "enum": [
                                "count",
                                "cup",
                                "tablespoon",
                                "teaspoon",
                                "pound",
                                "ounce",
                            ],
                        },
                        "amount": {
                            "type": "number",
                        },
                        "name": {
                            "type": "string",
                        },
                    },
                    "required": ["name"],
                },
            },
            "steps": {
                "type": "array",
                "items": {
                    "type": "string",
                },
            },
        },
        "required": ["course", "name"],
    },
}

prompt = (
    "Some friends are in town and I want to host them for dinner.  "
    "Can you plan a three-course meal for me?  "
    "I'm not an experienced cook so I need simple recipes."
)

model = GenerativeModel("gemini-1.5-pro-001")

response = model.generate_content(
    prompt,
    generation_config=GenerationConfig(
        response_mime_type="application/json",
        response_schema=response_schema
    ),
)

The output of the model conforms to the schema specified in the request body and can be used directly by the application.

[
    {
        "course": "appetizer",
        "name": "Caprese Skewers",
        "ingredients": [
            {
                "name": "cherry tomatoes",
                "amount": 1,
                "unit": "cup"
            },
            {
                "name": "fresh mozzarella balls",
                "amount": 1,
                "unit": "cup"
            },
            {
                "name": "fresh basil leaves",
                "amount": 12,
                "unit": "count"
            },
            {
                "name": "olive oil",
                "amount": 2,
                "unit": "tablespoon"
            },
            {
                "name": "balsamic glaze",
                "amount": 2,
                "unit": "tablespoon"
            },
            {
                "name": "salt",
                "amount": 1,
                "unit": "teaspoon"
            },
            {
                "name": "black pepper",
                "amount": 1,
                "unit": "teaspoon"
            }
        ],
        "steps": [
            "Thread cherry tomatoes, mozzarella balls, and basil leaves onto skewers.",
            "Drizzle with olive oil and balsamic glaze.",
            "Season with salt and pepper to taste."
        ]
    },
    {
        "course": "main",
        "name": "One-Pan Lemon Herb Salmon",
        "ingredients": [
            {
                "name": "salmon fillets",
                "amount": 4,
                "unit": "count"
            },
            {
                "name": "asparagus",
                "amount": 1,
                "unit": "pound"
            },
            {
                "name": "cherry tomatoes",
                "amount": 1,
                "unit": "cup"
            },
            {
                "name": "lemon",
                "amount": 1,
                "unit": "count"
            },
            {
                "name": "dried oregano",
                "amount": 1,
                "unit": "teaspoon"
            },
            {
                "name": "dried thyme",
                "amount": 1,
                "unit": "teaspoon"
            },
            {
                "name": "salt",
                "amount": 1,
                "unit": "teaspoon"
            },
            {
                "name": "black pepper",
                "amount": 1,
                "unit": "teaspoon"
            },
            {
                "name": "olive oil",
                "amount": 2,
                "unit": "tablespoon"
            }
        ],
        "steps": [
            "Preheat oven to 400 degrees F (200 degrees C).",
            "Line a baking sheet with parchment paper.",
            "Place salmon fillets on one side of the baking sheet and spread asparagus and cherry tomatoes on the other side.",
            "Squeeze lemon juice over the salmon and vegetables.",
            "Sprinkle with oregano, thyme, salt, and pepper.",
            "Drizzle with olive oil.",
            "Bake for 15-20 minutes, or until salmon is cooked through and vegetables are tender."
        ]
    },
    {
        "course": "dessert",
        "name": "Fruit Salad with Honey Yogurt",
        "ingredients": [
            {
                "name": "strawberries",
                "amount": 1,
                "unit": "cup"
            },
            {
                "name": "blueberries",
                "amount": 1,
                "unit": "cup"
            },
            {
                "name": "raspberries",
                "amount": 1,
                "unit": "cup"
            },
            {
                "name": "greek yogurt",
                "amount": 1,
                "unit": "cup"
            },
            {
                "name": "honey",
                "amount": 2,
                "unit": "tablespoon"
            }
        ],
        "steps": [
            "In a large bowl, combine strawberries, blueberries, and raspberries.",
            "In a separate bowl, mix together greek yogurt and honey.",
            "Serve fruit salad with a dollop of honey yogurt."
        ]
    }
]
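Because the output follows the schema, the application can load it into typed objects with very little glue code. The sketch below is one possible way to do that with the standard library; the Ingredient and Recipe classes, the parse_recipes helper, and the abbreviated sample_text (standing in for the response text) are assumptions made for illustration and are not part of the Vertex AI SDK.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Ingredient:
    name: str                       # the only required ingredient field
    amount: Optional[float] = None  # optional in the schema
    unit: Optional[str] = None      # optional in the schema


@dataclass
class Recipe:
    course: str
    name: str
    ingredients: List[Ingredient] = field(default_factory=list)
    steps: List[str] = field(default_factory=list)


def parse_recipes(json_text: str) -> List[Recipe]:
    """Convert the model's JSON text into typed Recipe objects."""
    recipes = []
    for item in json.loads(json_text):
        ingredients = [Ingredient(**ing) for ing in item.get("ingredients", [])]
        recipes.append(
            Recipe(
                course=item["course"],
                name=item["name"],
                ingredients=ingredients,
                steps=item.get("steps", []),
            )
        )
    return recipes


# Abbreviated stand-in for the JSON text returned in the example above.
sample_text = """
[
  {"course": "appetizer", "name": "Caprese Skewers",
   "ingredients": [{"name": "olive oil", "amount": 2, "unit": "tablespoon"}],
   "steps": ["Drizzle with olive oil and balsamic glaze."]},
  {"course": "dessert", "name": "Fruit Salad with Honey Yogurt"}
]
"""

for recipe in parse_recipes(sample_text):
    print(f"{recipe.course}: {recipe.name} ({len(recipe.ingredients)} ingredients)")
```

Only "course" and "name" are required at the recipe level in the schema, so the sketch treats "ingredients" and "steps" as optional and falls back to empty lists.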

Classify product condition with Enum schema

To constrain the model output to a set of predefined values, set the response MIME type to “text/x.enum”.

import vertexai
from vertexai.generative_models import GenerationConfig, GenerativeModel

vertexai.init(project=project_id, location="us-central1")
model = GenerativeModel("gemini-1.5-flash-001")

response_schema = {
    "type": "STRING",
    "enum": ["new in package", "like new", "gently used", "used", "damaged", "soiled"],
}

prompt = [
    "Item description: The item is a long winter coat that has many tears all around the seams and is falling apart. It has large questionable stains on it."
]

response = model.generate_content(
    prompt,
    generation_config=GenerationConfig(
        response_mime_type="text/x.enum",
        response_schema=response_schema,
    ),
)
print(response.candidates[0])

The model output contains the simple classification of the product as “damaged”.

content {
  role: "model"
  parts {
    text: "damaged"
  }
}
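On the application side, a constrained label like this maps naturally onto a Python Enum. The following is a minimal sketch, assuming the label arrives as plain text; the Condition class, the to_condition helper, and the NOT_RESELLABLE rule are hypothetical names used only for illustration.

```python
from enum import Enum


class Condition(Enum):
    # Values mirror the "enum" list in the response schema above.
    NEW_IN_PACKAGE = "new in package"
    LIKE_NEW = "like new"
    GENTLY_USED = "gently used"
    USED = "used"
    DAMAGED = "damaged"
    SOILED = "soiled"


def to_condition(model_text: str) -> Condition:
    """Look up the Enum member whose value equals the model's text."""
    # Condition(value) raises ValueError for any label outside the enum.
    return Condition(model_text.strip())


# Hypothetical business rule: these conditions are blocked from resale.
NOT_RESELLABLE = {Condition.DAMAGED, Condition.SOILED}

condition = to_condition("damaged")  # stand-in for the model's text output
print(condition.name, condition in NOT_RESELLABLE)
```

Keeping the Enum values identical to the schema's "enum" list means a mismatch between the two surfaces immediately as a ValueError rather than as a silent misclassification.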

Limitations

  • Controlled generation supports a subset of the OpenAPI 3.0 schema.

  • The output content still depends on the model's ability to reason and extract. Controlled generation enforces the output format, not the accuracy of the content.

  • If the prompt has insufficient information for a required field, controlled generation may output a response based on the data it was trained on. Setting nullable to True on the field can mitigate this limitation.
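The last two limitations can be sketched together: a field marked nullable lets the model return null instead of inventing a value, and the application still needs its own checks on content. The schema (contact_schema) and the helper (review_contact) below are hypothetical examples, and the sample JSON strings stand in for model output rather than real responses.

```python
import json

# Hypothetical extraction schema: "email" is required but nullable, so the
# model can return null instead of inventing a value the prompt lacks.
contact_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "email": {"type": "string", "nullable": True},
    },
    "required": ["name", "email"],
}


def review_contact(json_text: str) -> list:
    """Return a list of content problems the schema alone cannot catch."""
    contact = json.loads(json_text)
    problems = []
    if contact["email"] is None:
        problems.append("email missing from source text")
    elif "@" not in contact["email"]:
        # A well-formed JSON string, but not a plausible email address.
        problems.append("email does not look valid")
    if not contact["name"].strip():
        problems.append("name is empty")
    return problems


print(review_contact('{"name": "Ada Lovelace", "email": null}'))
print(review_contact('{"name": "Ada Lovelace", "email": "ada at example"}'))
```

The schema guarantees that both keys are present with the right types; whether the values are correct remains an application-level concern.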


Summary

With controlled generation, you now have a robust tool to generate responses that adhere to a defined schema. You can apply it to many of your existing workflows to make them more reliable and predictable. We're committed to providing developers with easy-to-use API features to better steer and control model behavior. Controlled Generation is just the start.

To get started with this feature, see the Google AI Studio or Vertex AI documentation.