Human-to-human communication is naturally multimodal, involving a mix of spoken words, visual cues, and real-time adjustments. The Multimodal Live API for Gemini brings that same naturalness to human-computer interaction: conversations that feel interactive, where you can provide visual input and receive context-aware responses in real time, seamlessly blending text, audio, and video. The Multimodal Live API for Gemini 2.0 is available in Google AI Studio and through the Gemini API, and it lets you build applications that respond to the world as it happens, leveraging real-time data.
The Multimodal Live API is a stateful API that uses WebSockets for low-latency server-to-server communication. It supports tools such as function calling, code execution, and search grounding, and it lets you combine multiple tools in a single request, so one prompt can yield a comprehensive response without multiple round trips. This makes it possible to build more efficient and more complex AI interactions.
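As a rough sketch of how a session begins: the client opens a WebSocket connection and sends a single setup frame naming the model, the desired response modalities, and any tools to enable. The helper below only builds that setup payload as JSON; the exact field names and tool declarations follow the documented `BidiGenerateContent` setup message shape, but treat them as illustrative assumptions rather than a definitive implementation.

```python
import json

def build_setup_message(model: str, modalities: list[str], tools: list[dict]) -> str:
    """Build the initial 'setup' frame sent over the WebSocket.

    Field names follow the Live API's BidiGenerateContent setup message
    as documented; treat this as an illustrative sketch.
    """
    setup = {
        "setup": {
            "model": f"models/{model}",
            "generation_config": {"response_modalities": modalities},
            # Multiple tools can be combined in a single session.
            "tools": tools,
        }
    }
    return json.dumps(setup)

# Combine search grounding and code execution in one request.
tools = [
    {"google_search": {}},
    {"code_execution": {}},
]

message = build_setup_message("gemini-2.0-flash-exp", ["AUDIO"], tools)
print(message)
```

In a real client you would send this string as the first message after the WebSocket handshake, then stream audio, video, or text chunks and read the server's responses from the same connection.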
Key features of the Multimodal Live API include:
The Multimodal Live API enables a variety of real-time, interactive applications. Here are a few examples of use cases where this API can be effectively applied:
To help you explore this new functionality and kick-start your own projects, we've created several demo applications showcasing real-time streaming capabilities:
A starter web application for streaming mic, camera, or screen input, and a perfect base for your creativity:
Full code and a getting-started guide are available on GitHub: https://github.com/google-gemini/multimodal-live-api-web-console.
Chat with Gemini about the weather: select a location and have a Gemini-powered character explain the weather there. You can interrupt and ask a follow-up question at any time.
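A demo like this can be built on the API's function calling support: the model requests a tool call, the app fetches the data, and the result is fed back into the conversation. Below is a minimal sketch using the OpenAPI-style schema that Gemini function calling expects; the `get_weather` name, its parameters, and the stub handler are hypothetical, invented here for illustration.

```python
# Hypothetical function declaration for a weather lookup tool, using the
# OpenAPI-style schema that Gemini function calling expects.
weather_tool = {
    "function_declarations": [
        {
            "name": "get_weather",  # hypothetical tool name
            "description": "Return current weather conditions for a location.",
            "parameters": {
                "type": "OBJECT",
                "properties": {
                    "location": {
                        "type": "STRING",
                        "description": "City name, e.g. 'Zurich'",
                    },
                },
                "required": ["location"],
            },
        }
    ]
}

def get_weather(location: str) -> dict:
    """Stub handler: a real app would call an actual weather service here."""
    return {"location": location, "condition": "sunny", "temp_c": 21}

# When the model emits a tool call for 'get_weather', dispatch it locally
# and send the result back to the session as the tool response.
result = get_weather("Zurich")
print(result)
```

Because the Live API is bidirectional, the tool response can be streamed back mid-conversation, which is what makes interrupting and asking follow-up questions feel seamless.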
Ready to dive in? Experiment with Multimodal Live Streaming directly in Google AI Studio for a hands-on experience. Or, for full control, grab the detailed documentation and code samples to start building with the API today.
We've also partnered with Daily to provide a seamless integration via their Pipecat framework, enabling you to add real-time capabilities to your apps effortlessly. Daily is a video and audio API platform that makes it easy for developers to add real-time streaming to their websites and apps, and Pipecat is their open-source framework for building voice and multimodal AI applications. Check out Daily's integration guide to get started building.
We're excited to see your creations - share your feedback and the amazing applications you build with the new API!