Beyond Request-Response: Architecting Real-time Bidirectional Streaming Multi-agent Systems

October 30, 2025
Hangfei Lin, Tech Lead

As we move toward building more sophisticated AI agents, the limitations of the traditional request-response model—which inherently creates a stiff, turn-based interaction—become apparent. This paradigm is not naturally suited for high-concurrency, low-latency interactions, especially those involving continuous data streams like audio and video, or coordination among multiple agents.

This post outlines the case for a real-time bidirectional streaming architecture as the next step for multi-agent systems. We will analyze the primary engineering hurdles this "turnless" model introduces—from state and session management to performant I/O—and detail how the Agent Development Kit (ADK) is designed to address them through a streaming-native approach.

The architectural limits of request-response-based agents

For years, the development of AI agents has been centered on the request-response communication pattern. While foundational, this paradigm suffers from critical architectural limitations that prevent truly interactive and intelligent experiences:

  • Perceived latency: The agent must wait for the user's entire input before it can begin processing, creating an unnatural, turn-based delay that breaks the flow of conversation.
  • Disjointed tool integration: In the request-response model, invoking tools often disrupts the flow of interaction. While the tool execution itself can be asynchronous, the results are typically not seamlessly integrated back into the ongoing conversation. The user might receive an acknowledgment, but then must wait for a separate update or initiate a new request to see the outcome, making the experience feel segmented and less interactive.
  • Clumsy multimodality: Processing simultaneous streams like audio and video requires complex, brittle logic to stitch together separate inputs into what should be a single, unified experience.

The vision: the real-time bidi-streaming agent paradigm

By shifting from turn-based transactions to a persistent, bidirectional stream, we unlock a new class of agent capabilities that feel more like a collaborative partner than a simple tool:

  • True concurrency and interruptibility: In a streaming architecture, the agent can process information and act while the user is still providing input. This enables non-blocking interactions and crucial features like natural interruptibility (or "barge-in"), where the agent can instantly stop its current action to address a new user input.
  • Proactive assistance with streaming tools: Tools are no longer limited to a single request-and-response cycle. They can be redefined as persistent, background processes that stream information back to the user or agent over time.
  • Unified multimodal processing: A streaming architecture natively processes continuous, parallel streams (such as simultaneous audio and video) as a single, unified context. This architectural approach is what unlocks true environmental and situational awareness, allowing the agent to react to its surroundings in real-time without manual synchronization.

Engineering challenges for real-time bidi-streaming multi-agent systems

While the benefits are transformative, building a robust, real-time bidirectional multi-agentic application is not trivial. Developers must solve a new class of complex engineering problems that don't exist in a request-response world.

  • Context management in a turnless world: the most fundamental challenge is that the concept of a "turn" disappears. In a continuous stream, developers must design new mechanisms to segment the stream into logical events for debugging, analysis, and resuming conversations. They must also work out how a continuous stream of context is packaged and transferred to another agent when there is no clear "end of turn" signal to trigger the handoff.
  • The concurrency and performance problem: a streaming agent is a highly concurrent system that must process multiple asynchronous I/O streams with low latency. The architecture must gracefully handle simultaneous user inputs (e.g., voice and text), the LLM's streaming output (e.g., text and tool calls), and data from multiple long-running background tools that are also streaming results. This inherent concurrency becomes exponentially more complex in a multi-agent system.
  • Developer experience and extensibility: The underlying complexity of a streaming system must be hidden behind simple, powerful abstractions. A successful framework needs to provide an intuitive developer experience for common tasks. For example, developers need a simple way to define tools that can yield multiple results to users or models over time. Another example is that the system must be extensible, offering hooks and callbacks to allow developers to inject custom logic at critical points in the agent's lifecycle (e.g., before or after a tool is called).

The bidirectional streaming paradigm: an architectural deep dive with ADK

The alternative is a "live" agent paradigm built on persistent, bidirectional streaming. This allows data to flow asynchronously in both directions, enabling an agent to process incoming voice or video while simultaneously producing output.


To enable developers to build these new experiences, we engineered a bidi-streaming-native architecture in the open-source Agent Development Kit (ADK), grounded in these core architectural principles:

1. Asynchronous real-time I/O management

To handle continuous, multimodal inputs (text, audio/video blobs), ADK introduces a crucial abstraction: the LiveRequestQueue. This asyncio-based queue allows client applications to seamlessly enqueue various data types as they arrive. The agent's asynchronous runner (run_live) consumes from this queue, enabling the model to process data in near real-time without waiting for a formal "turn end". The runner also responds with real-time streams, in the form of events, whenever they become available.

class LiveRequestQueue:
  """Queue used to send LiveRequest in a live (bidirectional streaming) way."""

  def __init__(self):
    self._queue = asyncio.Queue()

  def close(self):
    self._queue.put_nowait(LiveRequest(close=True))

  def send_content(self, content: types.Content):
    self._queue.put_nowait(LiveRequest(content=content))

  def send_realtime(self, blob: types.Blob):
    self._queue.put_nowait(LiveRequest(blob=blob))

  def send_activity_start(self):
    """Sends an activity start signal to mark the beginning of user input."""
    self._queue.put_nowait(LiveRequest(activity_start=types.ActivityStart()))

  def send_activity_end(self):
    """Sends an activity end signal to mark the end of user input."""
    self._queue.put_nowait(LiveRequest(activity_end=types.ActivityEnd()))

  def send(self, req: LiveRequest):
    self._queue.put_nowait(req)

  async def get(self) -> LiveRequest:
    return await self._queue.get()

# The runner consumes from the live request queue and streams out events
async for event in runner.run_live(session=my_session, live_request_queue=queue):
    # process agent's streaming response
    pass

2. Stateful, transferable sessions for multi-agent systems

Streaming interactions in multi-agent systems demand robust context management, which is handled by ADK sessions. ADK sessions persist throughout the live interaction, holding not just conversation history but also tool calls, tool responses, and various other system signals.

A key challenge is segmenting continuous streams (like audio) into discrete events for logging and state management. ADK's approach involves:

  • Signal-based event division: Using cues like interruptions, explicit "complete" signals, or agent transfers to delineate events.
  • Efficient media storage: Storing larger media blobs in object storage (like Google Cloud Storage) and referencing them within the session events stored in a transactional database.
  • Transcription: Generating text transcriptions from audio/video streams, captured as separate, timestamped events.

This stateful session becomes the "briefing packet" for multi-agent collaboration. When a handoff occurs (e.g., from a triage agent to a specialist), the entire session context is transferred, allowing the next agent to pick up seamlessly without requiring the user to repeat information. This enables complex, multi-step workflows to feel like a single, intelligent conversation.

3. Event-driven callbacks for real-time customization

In a real-world bidi-streaming agentic application, a single run_live() call is insufficient. Developers need hooks into the agent's behavior to customize it. ADK provides callbacks:

  • before_tool_callback: Inject custom logic before the tool gets executed.
  • after_tool_callback: Inject custom logic after the tool gets executed.

These callbacks enable dynamic control, such as logging tool statuses, real-time content moderation, or even injecting new information into the agents.
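The control flow these two hooks provide can be sketched framework-agnostically. The wrapper below is an illustration of the idea, not ADK's internal implementation:

```python
from typing import Any, Callable, Optional

def run_tool_with_callbacks(
    tool: Callable[..., Any],
    args: dict[str, Any],
    before_tool_callback: Optional[Callable[[str, dict], Optional[dict]]] = None,
    after_tool_callback: Optional[Callable[[str, Any], Any]] = None,
) -> Any:
    # The before-hook can inspect or rewrite arguments (e.g. content moderation).
    if before_tool_callback is not None:
        modified_args = before_tool_callback(tool.__name__, args)
        if modified_args is not None:
            args = modified_args
    result = tool(**args)
    # The after-hook can log the result or replace it before the model sees it.
    if after_tool_callback is not None:
        override = after_tool_callback(tool.__name__, result)
        if override is not None:
            result = override
    return result

# Usage: log tool status around a simple (hypothetical) tool.
log: list[str] = []

def get_weather(city: str) -> str:
    return f"Sunny in {city}"

result = run_tool_with_callbacks(
    get_weather,
    {"city": "Paris"},
    before_tool_callback=lambda name, a: log.append(f"before:{name}"),
    after_tool_callback=lambda name, r: log.append(f"after:{name}"),
)
```

Returning None from either hook leaves the arguments or result untouched, so purely observational callbacks (logging, metrics) compose cleanly with transforming ones.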

4. Streaming-native tools

Traditional tools follow a request-response model and can’t interact with I/O streams produced by models in real-time. ADK enables "streaming tools" – tools defined as asynchronous generators (AsyncGenerator). These tools can:

  • Accept standard inputs and yield multiple results over time.
  • Optionally accept the LiveRequestQueue to process user input streams directly.
  • Provide intermediate updates to the user/model while long-running tasks execute in the background.

# Conceptual Example of a Streaming Tool
import asyncio
from typing import AsyncGenerator

async def monitor_stock_price(symbol: str, alert_price: float) -> AsyncGenerator[str, None]:
    while True:
        current_price = await fetch_price(symbol)  # fetch_price: an application-provided async helper
        if current_price >= alert_price:
            yield f"Alert: {symbol} reached {current_price}"
            break
        yield f"Current price: {current_price}, waiting..."
        await asyncio.sleep(60)

This allows agents to perform tasks like real-time data analysis, continuous monitoring, or processing large media streams, providing feedback in the background throughout the uninterrupted interaction with users.
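A streaming tool can also consume the live input stream directly (the second capability listed above). In the sketch below, a plain asyncio.Queue stands in for ADK's LiveRequestQueue and the transcription is faked; the names are illustrative, and only the shape of the pattern is the point:

```python
import asyncio
from typing import AsyncGenerator, Optional

async def transcribe_stream(
    input_queue: asyncio.Queue,  # stand-in for ADK's LiveRequestQueue
) -> AsyncGenerator[str, None]:
    """Toy streaming tool: consumes raw audio chunks, yields partial transcripts."""
    while True:
        chunk: Optional[bytes] = await input_queue.get()
        if chunk is None:  # close signal
            break
        # A real tool would run speech-to-text here; we just report chunk sizes.
        yield f"transcribed {len(chunk)} bytes"

async def main() -> list[str]:
    queue: asyncio.Queue = asyncio.Queue()
    for blob in (b"abc", b"de"):
        queue.put_nowait(blob)
    queue.put_nowait(None)  # end of input
    return [partial async for partial in transcribe_stream(queue)]

partials = asyncio.run(main())
```

Because the tool is an async generator reading from the same queue abstraction the runner uses, its partial results can be surfaced to the user or model as they are produced rather than after the tool completes.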

The road ahead: challenges and future research

This architecture is a starting point for deep exploration and research. To further improve real-time, interactive AI, we are focusing on several key frontiers. Performance is paramount: we are committed to improving startup and agent-transfer times to make multi-agent interactions feel instantaneous and seamless. In addition, we aim to give developers even deeper control over the agent's lifecycle by introducing richer callback types, such as before-model-callback and after-model-callback.