Achieve real-time interaction: Build with the Live API

APRIL 23, 2025
Ivan Solovyev, Product Manager
Shrestha Basu Mallick, Group Product Manager, Gemini API

The Live API equips developers with the essential tools to craft applications and intelligent agents capable of processing streaming audio, video, and text with incredibly low latency. This speed is paramount for creating truly interactive experiences, opening doors for customer support solutions, educational platforms, and real-time monitoring services.


Recently we announced the preview launch of the Live API for Gemini models – a significant step forward in enabling developers to build robust and scalable real-time applications. Try the latest features now using the Gemini API in Google AI Studio and in Vertex AI.


What's new in the Live API

Since our experimental launch in December, we've been listening closely to your feedback and have incorporated new features and capabilities to make the Live API production-ready. Find full details in the Live API documentation:

Enhanced session management & reliability

  • Longer sessions via context compression: Enable extended interactions beyond previous time limits. Configure context window compression with a sliding window mechanism to automatically manage context length, preventing abrupt terminations due to context limits.

  • Session resumption: Keep sessions alive across temporary network disruptions. The Live API now supports server-side session state storage (for up to 24 hours) and provides handles (session_resumption) to reconnect and resume where you left off.

  • Graceful disconnect notification: Receive a GoAway server message indicating when a connection is about to close, allowing for graceful handling before termination.

  • Configurable turn coverage: Choose whether the Live API processes all audio and video input continuously or only captures it when the end-user is detected speaking.

  • Configurable media resolution: Optimize for quality or token usage by selecting the resolution for input media.


More control over interaction dynamics

  • Configurable voice activity detection (VAD): Choose sensitivity levels or disable automatic VAD entirely and use new client events (activityStart, activityEnd) for manual turn control.

  • Configurable interruption handling: Decide whether user input should interrupt the model's response.

  • Flexible session settings: Modify system instruction and other setup configurations at any time during the session.
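As an illustration, disabling automatic VAD and signaling turns manually might look like the sketch below. The `activityStart` and `activityEnd` client events are named above; the surrounding nesting and the `activityHandling` values are assumptions to check against the current reference.

```python
# Sketch: manual turn control with automatic VAD disabled. The activityStart /
# activityEnd client events come from the announcement; the nesting and enum
# values used here are assumptions to verify against the Live API docs.

def vad_config(disable_automatic: bool, allow_interruptions: bool) -> dict:
    """Build the realtime-input portion of a session config."""
    return {
        "realtimeInputConfig": {
            "automaticActivityDetection": {"disabled": disable_automatic},
            # Whether user speech should cut off an in-progress model response.
            "activityHandling": (
                "START_OF_ACTIVITY_INTERRUPTS" if allow_interruptions else "NO_INTERRUPTION"
            ),
        }
    }

def manual_turn(audio_chunks: list[bytes]) -> list[dict]:
    """With automatic VAD off, bracket the user's audio with explicit markers."""
    messages: list[dict] = [{"realtimeInput": {"activityStart": {}}}]
    messages += [{"realtimeInput": {"audio": chunk}} for chunk in audio_chunks]
    messages.append({"realtimeInput": {"activityEnd": {}}})
    return messages
```

Manual control like this is useful when the client already knows turn boundaries, for example from a push-to-talk button, and automatic detection would only add latency.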


Richer output & features

  • Expanded voice & language options: Choose from two new voices and 30 new languages for audio output. The output language is now configurable within speechConfig.

  • Text streaming: Receive text responses incrementally as they are generated, enabling faster display to the user.

  • Token usage reporting: Gain insights into usage with detailed token counts provided in the usageMetadata field of server messages, broken down by modality and prompt/response phases.
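For instance, the per-modality breakdown in `usageMetadata` can be tallied client-side to track a session's cost. A small sketch, where the exact message shape is an assumption based on the description above:

```python
# Sketch: summing token usage from server messages' usageMetadata field,
# broken down by modality. The exact message shape is an assumption based
# on the Live API description; verify field names against the reference.
from collections import defaultdict

def tally_usage(server_messages: list[dict]) -> dict[str, int]:
    """Aggregate response token counts per modality across server messages."""
    totals: defaultdict[str, int] = defaultdict(int)
    for message in server_messages:
        usage = message.get("usageMetadata", {})
        for detail in usage.get("responseTokensDetails", []):
            totals[detail["modality"]] += detail["tokenCount"]
    return dict(totals)

messages = [
    {"usageMetadata": {"responseTokensDetails": [
        {"modality": "AUDIO", "tokenCount": 120},
        {"modality": "TEXT", "tokenCount": 15},
    ]}},
    {"usageMetadata": {"responseTokensDetails": [
        {"modality": "AUDIO", "tokenCount": 80},
    ]}},
]
# tally_usage(messages) -> {"AUDIO": 200, "TEXT": 15}
```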


See the Live API in action: real-world applications

To inspire your next project, we're showcasing developers who are already leveraging the power of the Live API in their applications:


Daily.co

Daily integrates Live API support into the Pipecat Open Source SDKs for Web, Android, iOS and C++.

Using the Live API with Pipecat, Daily has created a voice-based word-guessing game – Word Wrangler. Test your description skills in this AI-powered twist on classic word games, and see how you can build one for yourself!

Live API - Word Wrangler

LiveKit

LiveKit integrates Live API support into LiveKit Agents. This framework for building voice AI agents provides a fully open-source platform for creating server-side agentic applications.

"Until the Live API, no other LLM offered a developer interface that could directly ingest streaming video."
Russell d’Sa, CEO

Check out their demo where they built an AI copilot that can browse the internet alongside you while sharing thoughts about what it can see in real-time.


Bubba.ai

Hey Bubba is an agentic, voice-first AI application specifically developed for truck drivers. Utilizing the Live API, it enables seamless, multi-language voice communication, allowing drivers to operate hands-free. Key functionalities include:

  • Searching for freight loads and providing details.

  • Initiating calls to brokers/shippers.

  • Negotiating freight rates based on market data.

  • Booking loads and verifying rate confirmations.

  • Finding and booking truck parking, including calling hotels to confirm availability.

  • Scheduling appointments with shippers and receivers.

The Live API powers both driver interaction (leveraging function calling and context caching for queries like future pickups) and Bubba's ability to interact during phone calls for negotiation and booking. This makes Hey Bubba a comprehensive AI tool for the largest and most diverse job sector in the USA.


Start building today

The Live API is ready to power your next real-time voice application. To get started, try it with the Gemini API in Google AI Studio or in Vertex AI, and explore the Live API documentation.

Happy building!