Batch Mode in the Gemini API: Process more for less

July 7, 2025
Lucia Loher, Product Manager
Vishal Dharmadhikari, Product Solutions Engineer

Gemini models are now available in Batch Mode


Today, we’re excited to introduce Batch Mode in the Gemini API, a new asynchronous endpoint designed specifically for high-throughput, non-latency-critical workloads. Batch Mode allows you to submit large jobs, offload the scheduling and processing to us, and retrieve your results within 24 hours, all at a 50% discount compared to our synchronous APIs.


Process more for less

Batch Mode is the perfect tool for any task where you have your data ready upfront and don’t need an immediate response. By separating these large jobs from your real-time traffic, you unlock three key benefits:

  • Cost savings: Batch jobs are priced at 50% less than the standard rate for a given model.

  • Higher throughput: Batch Mode has higher rate limits than our synchronous APIs.

  • Easy API calls: No need to manage complex client-side queuing or retry logic; results are returned within a 24-hour window.


A simple workflow for large jobs

We’ve designed the API to be simple and intuitive: you package all your requests into a single file, submit it, and retrieve your results once the job is complete (a short sketch of building that request file follows the examples below). Here are some ways developers are already leveraging Batch Mode today:

  • Bulk content generation and processing: Specializing in deep video understanding, Reforged Labs uses Gemini 2.5 Pro to analyze and label vast quantities of video ads monthly. Implementing Batch Mode has revolutionized their operations by significantly cutting costs, accelerating client deliverables, and enabling the massive scalability needed for meaningful market insights.
  • Model evaluations: Vals AI benchmarks foundation models on real-world use cases, including legal, finance, tax and healthcare. They’re using Batch Mode to submit large volumes of evaluation queries without being constrained by rate limits.
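
To make the input format concrete, here is a minimal sketch of building the request file in Python. The file name and prompts are illustrative; the key pattern is one JSON object per line, each carrying a key of your choosing so you can match results back to requests later.

import json

# Illustrative prompts; each becomes one standalone request in the file.
prompts = {
    "request_1": "Explain how AI works in a few words",
    "request_2": "Explain how quantum computing works in a few words",
}

# Write one JSON object per line (JSON Lines format).
with open("batch_requests.json", "w") as f:
    for key, text in prompts.items():
        request = {
            "key": key,
            "request": {"contents": [{"parts": [{"text": text}]}]},
        }
        f.write(json.dumps(request) + "\n")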

Get started in just a few lines of code

You can start using Batch Mode today with the Google GenAI Python SDK:

import time

from google import genai

# Assumes your API key is set in the GEMINI_API_KEY environment variable.
client = genai.Client()

# Create a file (batch_requests.json) in JSON Lines format, one request per line:
# {"key": "request_1", "request": {"contents": [{"parts": [{"text": "Explain how AI works in a few words"}]}]}}
# {"key": "request_2", "request": {"contents": [{"parts": [{"text": "Explain how quantum computing works in a few words"}]}]}}

uploaded_batch_requests = client.files.upload(file="batch_requests.json")

batch_job = client.batches.create(
    model="gemini-2.5-flash",
    src=uploaded_batch_requests.name,
    config={
        'display_name': "batch_job-1",
    },
)

print(f"Created batch job: {batch_job.name}")

# Poll until the job finishes; results are returned within a 24-hour window.
while batch_job.state.name not in ('JOB_STATE_SUCCEEDED',
                                   'JOB_STATE_FAILED',
                                   'JOB_STATE_CANCELLED'):
    time.sleep(60)
    batch_job = client.batches.get(name=batch_job.name)

if batch_job.state.name == 'JOB_STATE_SUCCEEDED':
    result_file_name = batch_job.dest.file_name
    file_content_bytes = client.files.download(file=result_file_name)
    file_content = file_content_bytes.decode('utf-8')

    for line in file_content.splitlines():
        print(line)
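
Each line of the results file is itself a JSON object that echoes the key from your request alongside the model's response, so you can join outputs back to inputs. As a rough sketch, extracting the generated text might look like the following; the response payload follows the standard GenerateContentResponse shape, so treat the exact field paths here as an assumption to verify against the docs.

import json

# Assumed result-line shape: {"key": ..., "response": {"candidates": [...]}}
for line in file_content.splitlines():
    result = json.loads(line)
    key = result.get("key")
    candidates = result.get("response", {}).get("candidates", [])
    if candidates:
        text = candidates[0]["content"]["parts"][0].get("text", "")
        print(f"{key}: {text}")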

To learn more, check out the official documentation and pricing pages.


We're rolling out Batch Mode in the Gemini API to all users today and tomorrow. This is just the start for batch processing, and we're actively working on expanding its capabilities. Stay tuned for more powerful and flexible options!