As we move toward building more sophisticated AI agents, the limitations of the traditional request-response model—which inherently creates a stiff, turn-based interaction—become apparent. This paradigm is not naturally suited to high-concurrency, low-latency interactions, especially those involving continuous data streams like audio and video, or coordination among multiple agents.
This post outlines the case for a real-time bidirectional streaming architecture as the next step for multi-agentic systems. We will analyze the primary engineering hurdles this "turnless" model introduces—from state and session management to performant I/O—and detail how the Agent Development Kit (ADK) is designed to address them through a streaming-native approach.
For years, the development of AI agents has been centered on the request-response communication pattern. While foundational, this paradigm suffers from critical architectural limitations that prevent truly interactive and intelligent experiences:
By shifting from turn-based transactions to a persistent, bidirectional stream, we unlock a new class of agent capabilities that feel more like a collaborative partner than a simple tool:
While the benefits are transformative, building a robust, real-time bidirectional multi-agentic application is not trivial. Developers must solve a new class of complex engineering problems that don't exist in a request-response world.
The alternative is a "live" agent paradigm built on persistent, bidirectional streaming. This allows for asynchronous data flow in both directions, enabling an agent to ingest voice or video while simultaneously streaming its own output back.
To enable developers to build these new experiences, we engineered a bidi-streaming-native architecture in the open-source Agent Development Kit (ADK), which is grounded in the following core architectural choices:
To handle continuous, multimodal inputs (text, audio/video blobs), ADK introduces a crucial abstraction: the LiveRequestQueue. This asyncio-based queue allows client applications to seamlessly enqueue various data types as they arrive. The agent's asynchronous runner (run_live) consumes from this queue, enabling the model to process data in near real-time without waiting for a formal "turn end". The asynchronous runner also responds with real-time streams, in the form of events, as they become available.
class LiveRequestQueue:
    """Queue used to send LiveRequest in a live (bidirectional streaming) way."""

    def close(self):
        self._queue.put_nowait(LiveRequest(close=True))

    def send_content(self, content: types.Content):
        self._queue.put_nowait(LiveRequest(content=content))

    def send_realtime(self, blob: types.Blob):
        self._queue.put_nowait(LiveRequest(blob=blob))

    def send_activity_start(self):
        """Sends an activity start signal to mark the beginning of user input."""
        self._queue.put_nowait(LiveRequest(activity_start=types.ActivityStart()))

    def send_activity_end(self):
        """Sends an activity end signal to mark the end of user input."""
        self._queue.put_nowait(LiveRequest(activity_end=types.ActivityEnd()))

    def send(self, req: LiveRequest):
        self._queue.put_nowait(req)

    async def get(self) -> LiveRequest:
        return await self._queue.get()
# Agent runner consumes from live_request_queue and streams out events
async for event in agent.run_live(session=my_session, live_request_queue=queue):
    # process agent's streaming response
    pass
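The decoupling the LiveRequestQueue provides can be illustrated with a self-contained asyncio sketch. This is a simplified stand-in, not ADK's implementation: MiniLiveQueue and consume are hypothetical names, and the consumer here simply collects payloads instead of feeding a model. The point is that a producer can enqueue mixed payloads (text, binary blobs) at any time, while an independent consumer drains them without waiting for a turn boundary.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional


@dataclass
class LiveRequest:
    """A single item on the queue: text content, a binary blob, or a close signal."""
    content: Optional[str] = None
    blob: Optional[bytes] = None
    close: bool = False


class MiniLiveQueue:
    """Simplified stand-in for ADK's LiveRequestQueue."""

    def __init__(self):
        self._queue = asyncio.Queue()

    def send_content(self, content: str):
        self._queue.put_nowait(LiveRequest(content=content))

    def send_realtime(self, blob: bytes):
        self._queue.put_nowait(LiveRequest(blob=blob))

    def close(self):
        self._queue.put_nowait(LiveRequest(close=True))

    async def get(self) -> LiveRequest:
        return await self._queue.get()


async def consume(queue: MiniLiveQueue) -> list:
    """Drain the queue until a close signal arrives; collect each payload."""
    received = []
    while True:
        req = await queue.get()
        if req.close:
            break
        received.append(req.content if req.content is not None else req.blob)
    return received


async def main():
    q = MiniLiveQueue()
    q.send_content("hello")        # text arrives first
    q.send_realtime(b"\x00\x01")   # then an audio chunk
    q.close()
    return await consume(q)


out = asyncio.run(main())
print(out)  # ['hello', b'\x00\x01']
```

Because enqueueing is non-blocking (put_nowait) and consumption is awaited, the client can keep pushing audio chunks while the agent is mid-response.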
Streaming interactions in multi-agent systems demand robust context management, which ADK handles through sessions. ADK sessions persist throughout the live interaction, holding not just conversation history but also tool calls, tool responses, and other system signals.
A key challenge is segmenting continuous streams (like audio) into discrete events for logging and state management. ADK's approach involves:
This stateful session becomes the "briefing packet" for multi-agent collaboration. When a handoff occurs (e.g., from a triage agent to a specialist), the entire session context is transferred, allowing the next agent to pick up seamlessly without requiring the user to repeat information. This enables complex, multi-step workflows to feel like a single, intelligent conversation.
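The "briefing packet" handoff can be sketched with a minimal, self-contained example. All names here (Session, triage_agent, shipping_specialist) and the sample order data are hypothetical, illustrating the pattern rather than ADK's actual session API: the triage agent accumulates context, and the specialist reads it instead of re-asking the user.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """Holds everything a successor agent needs to continue the conversation."""
    history: list = field(default_factory=list)
    tool_calls: list = field(default_factory=list)
    state: dict = field(default_factory=dict)


def triage_agent(session: Session) -> str:
    """Records the user's request and routes to a specialist by topic."""
    session.history.append(("user", "My order #123 never arrived"))
    session.state["topic"] = "shipping"
    return "billing_specialist" if session.state["topic"] == "billing" else "shipping_specialist"


def shipping_specialist(session: Session) -> str:
    """Reads the accumulated context instead of asking the user to repeat it."""
    last_user_msg = session.history[-1][1]
    return f"Looking into: {last_user_msg}"


session = Session()
next_agent = triage_agent(session)      # routes to the shipping specialist
reply = shipping_specialist(session)    # specialist already knows the order details
print(next_agent, "->", reply)
```

In a live system the transferred session would also carry in-flight tool calls and stream-segmentation events, so the successor agent resumes mid-interaction rather than restarting it.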
In a real-world bidi-streaming agentic application, a single run_live() call is insufficient. Developers need hooks into the agent's behavior for customization. ADK implements callbacks:
These callbacks enable dynamic control, such as logging tool statuses, real-time content moderation, or even injecting new information into the agents.
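The hook pattern can be sketched as follows. This is an illustrative wiring, not ADK's exact callback API: run_tool, before_tool, and after_tool are hypothetical names. The before hook logs the invocation, and the after hook both logs and rewrites the result, mirroring the moderation use case above.

```python
import asyncio

log = []


async def lookup_weather(city: str) -> str:
    """Stand-in tool; a real one would call an external service."""
    return f"Sunny in {city}"


def before_tool(name: str, args: dict):
    """Fires before the tool runs, e.g. for logging tool status."""
    log.append(f"before:{name}:{args['city']}")


def after_tool(name: str, result: str) -> str:
    """Fires after the tool runs; may rewrite the result (e.g. moderation)."""
    log.append(f"after:{name}:{result}")
    return result.replace("Sunny", "Clear")


async def run_tool(tool, **args):
    """Wraps a tool call with before/after hooks."""
    before_tool(tool.__name__, args)
    result = await tool(**args)
    return after_tool(tool.__name__, result)


result = asyncio.run(run_tool(lookup_weather, city="Tokyo"))
print(result)  # Clear in Tokyo
```

Because the after hook returns a (possibly modified) result, the same mechanism supports content moderation and injecting new information before the agent sees the tool's output.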
Traditional tools follow a request-response model and cannot interact in real time with the I/O streams produced by models. ADK enables "streaming tools" – tools defined as asynchronous generators (AsyncGenerator). These tools can:
# Conceptual example of a streaming tool (fetch_price is a placeholder)
import asyncio
from typing import AsyncGenerator

async def monitor_stock_price(symbol: str, alert_price: float) -> AsyncGenerator[str, None]:
    while True:
        current_price = await fetch_price(symbol)
        if current_price >= alert_price:
            yield f"Alert: {symbol} reached {current_price}"
            break
        yield f"Current price: {current_price}, waiting..."
        await asyncio.sleep(60)
This allows agents to perform tasks like real-time data analysis, continuous monitoring, or processing large media streams, providing feedback in the background while the interaction with the user continues uninterrupted.
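To make the conceptual tool above runnable, the sketch below stubs fetch_price with a canned price feed and drains the generator to completion; PRICES and the zero-second sleep are assumptions for the sketch, not part of the tool's contract. It shows how each yield reaches the consumer as a discrete update before the tool finishes.

```python
import asyncio
from typing import AsyncGenerator

# Stub price feed for the sketch; a real tool would call a market-data API.
PRICES = iter([98.0, 99.5, 101.2])


async def fetch_price(symbol: str) -> float:
    return next(PRICES)


async def monitor_stock_price(symbol: str, alert_price: float) -> AsyncGenerator[str, None]:
    while True:
        current_price = await fetch_price(symbol)
        if current_price >= alert_price:
            yield f"Alert: {symbol} reached {current_price}"
            break
        yield f"Current price: {current_price}, waiting..."
        await asyncio.sleep(0)  # no real delay in the sketch


async def main():
    # Each yielded update arrives incrementally, not as one final result.
    return [update async for update in monitor_stock_price("ACME", 100.0)]


updates = asyncio.run(main())
print(updates)
```

In ADK the runner, rather than a list comprehension, would consume these updates and surface them to the model and user as events while the conversation continues.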
This architecture is a starting point for deep exploration and research. To further improve real-time, interactive AI, we are focusing on several key frontiers. Performance is paramount: we are committed to improving startup and agent-transfer times to make multi-agent interactions feel instantaneous and seamless. We also aim to give developers even deeper control over the agent's lifecycle by introducing richer callback types, such as before-model-callback and after-model-callback.