Today, we’re excited to introduce Batch Mode in the Gemini API: a new asynchronous endpoint designed specifically for high-throughput, non-latency-critical workloads. Batch Mode allows you to submit large jobs, offload the scheduling and processing, and retrieve your results within 24 hours, all at a 50% discount compared to our synchronous APIs.
Batch Mode is the perfect tool for any task where you have your data ready upfront and don’t need an immediate response. By separating these large jobs from your real-time traffic, you unlock three key benefits:

- Cost savings: batch jobs are priced at a 50% discount compared to the synchronous APIs.
- Higher throughput: with scheduling handled for you, you can submit far larger jobs than real-time traffic allows.
- Simple workflow: you offload the scheduling and processing entirely and collect your results within 24 hours.
We’ve designed the API to be simple and intuitive: you package all your requests into a single file, submit it, and retrieve your results once the job is complete. Developers are already putting Batch Mode to work on exactly these kinds of large-scale tasks.
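Each line of that file is a standalone JSON object pairing a caller-chosen key with a standard GenerateContent request body. If you’re generating requests programmatically, a minimal sketch using only the standard library looks like this (the prompts and file name here are just placeholders):

import json

# Hypothetical prompts; each one becomes a single request line in the batch file.
prompts = {
    "request_1": "Explain how AI works in a few words",
    "request_2": "Explain how quantum computing works in a few words",
}

# Write one standalone JSON object per line (JSONL): a caller-chosen key
# plus a standard GenerateContent request body.
with open("batch_requests.json", "w") as f:
    for key, text in prompts.items():
        line = {"key": key, "request": {"contents": [{"parts": [{"text": text}]}]}}
        f.write(json.dumps(line) + "\n")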
You can start using Batch Mode today with the Google GenAI Python SDK:
import time

from google import genai

client = genai.Client()  # Reads GEMINI_API_KEY from the environment

# batch_requests.json contains one JSON request per line (JSONL), e.g.:
# {"key": "request_1", "request": {"contents": [{"parts": [{"text": "Explain how AI works in a few words"}]}]}}
# {"key": "request_2", "request": {"contents": [{"parts": [{"text": "Explain how quantum computing works in a few words"}]}]}}
uploaded_batch_requests = client.files.upload(file="batch_requests.json")

batch_job = client.batches.create(
    model="gemini-2.5-flash",
    src=uploaded_batch_requests.name,
    config={
        'display_name': "batch_job-1",
    },
)
print(f"Created batch job: {batch_job.name}")

# Poll until the job reaches a terminal state; jobs complete within 24 hours.
while batch_job.state.name not in ('JOB_STATE_SUCCEEDED', 'JOB_STATE_FAILED', 'JOB_STATE_CANCELLED'):
    time.sleep(60)
    batch_job = client.batches.get(name=batch_job.name)

if batch_job.state.name == 'JOB_STATE_SUCCEEDED':
    result_file_name = batch_job.dest.file_name
    file_content_bytes = client.files.download(file=result_file_name)
    file_content = file_content_bytes.decode('utf-8')
    for line in file_content.splitlines():
        print(line)
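Printing the raw lines is enough to verify the job, but each line is itself a JSON object you can parse. As a rough sketch, assuming each result line echoes your original key alongside a GenerateContentResponse-shaped payload (check the documentation for the exact schema):

import json

# Hypothetical parsing of the downloaded results; the field names below assume
# each line pairs the request's "key" with a GenerateContentResponse-like dict.
for line in file_content.splitlines():
    result = json.loads(line)
    key = result.get("key")
    candidates = result.get("response", {}).get("candidates", [])
    if candidates:
        # Pull the first candidate's text, if present.
        text = candidates[0]["content"]["parts"][0]["text"]
        print(f"{key}: {text}")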
To learn more, check out the official documentation and pricing pages.
We're rolling out Batch Mode for the Gemini API to all users over the course of today and tomorrow. This is just the start for batch processing, and we're actively working on expanding its capabilities. Stay tuned for more powerful and flexible options!