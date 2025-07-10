Building sophisticated AI applications with Large Language Models (LLMs), especially those handling multimodal input and requiring real-time responsiveness, often feels like assembling a complex puzzle: you're stitching together diverse data processing steps, asynchronous API calls, and custom logic. As complexity grows, this can lead to brittle, hard-to-maintain code.

Today, we're introducing GenAI Processors, a new open-source Python library from Google DeepMind designed to bring structure and simplicity to these challenges. GenAI Processors provides an abstraction layer, defining a consistent Processor interface for everything from input handling and pre-processing to model calls and output processing.

At its core, GenAI Processors treats all input and output as asynchronous streams of ProcessorParts (i.e., two-way, or bidirectional, streaming). Think of these as standardized data parts (e.g., a chunk of audio, a text transcription, an image frame) flowing through your pipeline along with associated metadata. This stream-based API allows for seamless chaining and composition of different operations, from low-level data manipulation to high-level model calls.
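To make the idea of parts flowing through a pipeline concrete, here is a minimal sketch that uses only the Python standard library. It is not the library's implementation: `Part`, `source`, `uppercase`, and `collect` are hypothetical names chosen for illustration, and the real ProcessorPart carries more than this stand-in does.

```python
import asyncio
from dataclasses import dataclass, field
from typing import AsyncIterator


@dataclass
class Part:
    """Hypothetical stand-in for a ProcessorPart: a payload plus metadata."""
    data: str
    mimetype: str = "text/plain"
    metadata: dict = field(default_factory=dict)


async def source(texts: list[str]) -> AsyncIterator[Part]:
    """Emits one Part per input string, yielding control between parts."""
    for i, text in enumerate(texts):
        await asyncio.sleep(0)  # stand-in for waiting on I/O
        yield Part(text, metadata={"index": i})


async def uppercase(parts: AsyncIterator[Part]) -> AsyncIterator[Part]:
    """A stream-to-stream step: consumes parts and emits transformed parts."""
    async for part in parts:
        yield Part(part.data.upper(), part.mimetype, dict(part.metadata))


async def collect(parts: AsyncIterator[Part]) -> list[Part]:
    """Drains a stream into a list so the result can be inspected."""
    return [part async for part in parts]


# Chaining is plain function composition over async iterators.
result = asyncio.run(collect(uppercase(source(["hello", "world"]))))
print([p.data for p in result])
```

Because each step takes a stream and returns a stream, steps can be nested in any order without knowing about each other, and the metadata travels with each part.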

The GenAI Processors library is designed to optimize the concurrent execution of a Processor. Any part in this example of execution flow can be generated concurrently once all its ancestors in the graph have been computed, e.g. `c'12` can be generated concurrently with `a'1`. The flow preserves the ordering of the output stream with respect to the input stream and is executed to minimize Time To First Token (preferring `a12` to `d12` whenever possible). This concurrency optimization happens under the hood: applying a Processor to an input stream automatically triggers concurrent execution whenever possible.
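The combination described above, concurrent work with output kept in input order, can be sketched with plain asyncio. This is an illustration of the general technique under simplifying assumptions (a linear stream rather than a graph), not the library's scheduler; `ordered_concurrent_map`, `numbers`, and `slow_then_fast` are hypothetical names.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable


async def numbers(n: int) -> AsyncIterator[int]:
    for i in range(n):
        yield i


async def ordered_concurrent_map(
    stream: AsyncIterator[int],
    fn: Callable[[int], Awaitable[str]],
) -> AsyncIterator[str]:
    """Starts fn on each item as it arrives; yields results in input order."""
    pending: asyncio.Queue = asyncio.Queue()

    async def schedule() -> None:
        async for item in stream:
            # create_task starts the work right away, so items overlap.
            await pending.put(asyncio.create_task(fn(item)))
        await pending.put(None)  # sentinel: input exhausted

    scheduler = asyncio.create_task(schedule())
    while (task := await pending.get()) is not None:
        yield await task  # awaiting in queue order preserves input order
    await scheduler


finished: list[int] = []  # records the order in which work completes


async def slow_then_fast(i: int) -> str:
    # Earlier items sleep longer, so they tend to complete last.
    await asyncio.sleep(0.03 * (3 - i))
    finished.append(i)
    return f"item-{i}"


async def main() -> list[str]:
    return [r async for r in ordered_concurrent_map(numbers(3), slow_then_fast)]


outputs = asyncio.run(main())
print(outputs)   # results, in input order
print(finished)  # completion order, which may differ from input order
```

The queue holds tasks rather than results, which is what decouples the order in which work finishes from the order in which it is emitted.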

For example, you can easily build a "Live Agent" capable of processing audio and video streams in real time using the Gemini Live API with just a few lines of code. In the following example, notice how input sources and processing steps are combined using the `+` operator, creating a clear data flow (full code on GitHub):

    from genai_processors import streams
    from genai_processors.core import audio_io, live_model, video

    # Input processor: combines camera streams and audio streams
    input_processor = video.VideoIn() + audio_io.PyAudioIn(...)

    # Output processor: plays the audio parts. Handles interruptions and pauses
    # audio output when the user is speaking.
    play_output = audio_io.PyAudioOut(...)

    # Gemini Live API processor
    live_processor = live_model.LiveProcessor(...)

    # Compose the agent: mic+camera -> Gemini Live API -> play audio
    live_agent = input_processor + live_processor + play_output

    async for part in live_agent(streams.endless_stream()):
      # Process the output parts (e.g., print transcription, model output, metadata)
      print(part)
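For readers curious how an operator like `+` can express a data flow, the sketch below shows one way to overload it so that `a + b` feeds the output stream of `a` into `b`. It is a standard-library illustration only: `Step` and the helper functions are hypothetical and do not reflect how GenAI Processors implements its operator.

```python
import asyncio
from typing import AsyncIterator, Callable

Stream = AsyncIterator[str]


class Step:
    """Hypothetical stand-in for a processor: wraps a stream-to-stream function."""

    def __init__(self, fn: Callable[[Stream], Stream]):
        self._fn = fn

    def __call__(self, stream: Stream) -> Stream:
        return self._fn(stream)

    def __add__(self, other: "Step") -> "Step":
        # a + b means: feed a's output stream into b.
        return Step(lambda stream: other(self(stream)))


async def _strip(stream: Stream) -> Stream:
    async for text in stream:
        yield text.strip()


async def _shout(stream: Stream) -> Stream:
    async for text in stream:
        yield text.upper() + "!"


async def words() -> Stream:
    for text in ["  hello ", " world  "]:
        yield text


async def run() -> list[str]:
    pipeline = Step(_strip) + Step(_shout)  # strip first, then shout
    return [text async for text in pipeline(words())]


chained = asyncio.run(run())
print(chained)
```

Since the composed object is itself a `Step`, a pipeline can be reused as a building block inside a larger pipeline.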

You can also build your own Live agent on top of a standard text-based LLM, using the bidirectional streaming capability of the GenAI Processors library and the Google Speech API (full code on GitHub):

    from genai_processors import streams
    from genai_processors.core import (
        audio_io,
        genai_model,
        rate_limit_audio,
        realtime,
        speech_to_text,
        text_to_speech,
    )

    # Input processor: gets input from audio in (mic) and transcribes into text
    input_processor = audio_io.PyAudioIn(...) + speech_to_text.SpeechToText(...)
    play_output = audio_io.PyAudioOut(...)

    # Main model that will be used to generate the response.
    genai_processor = genai_model.GenaiModel(...)

    # TTS processor that will be used to convert the text response to audio. Note
    # the rate limit audio processor that will be used to stream back small audio
    # chunks to the client at the same rate as how they are played back.
    tts = text_to_speech.TextToSpeech(...) + rate_limit_audio.RateLimitAudio(...)

    # Creates an agent as:
    # mic -> speech to text -> text conversation -> text to speech -> play audio
    live_agent = (
        input_processor
        + realtime.LiveModelProcessor(turn_processor=genai_processor + tts)
        + play_output
    )

    async for part in live_agent(streams.endless_stream()):
      ...

We anticipate a growing need for proactive LLM applications where responsiveness is critical. Even for non-streaming use cases, processing data as soon as it is available can significantly reduce latency and time to first token (TTFT), which is essential for building a good user experience. While many LLM APIs prioritize synchronous, simplified interfaces, GenAI Processors leverages native Python features to offer a way to write responsive applications without making code more complex. The Trip Planner and Research Agent examples demonstrate how turn-based agents can use the concurrency feature of GenAI Processors to increase responsiveness.
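The latency benefit of processing data as soon as it is available can be seen in a small standard-library sketch. It logs events instead of measuring wall-clock time, and compares a consumer that handles each item on arrival with one that waits for the whole input. The names `produce`, `streaming`, and `batched` are hypothetical and unrelated to the library's API.

```python
import asyncio
from typing import AsyncIterator


async def produce(n: int, log: list[str]) -> AsyncIterator[int]:
    """Yields n items, logging when each one becomes available."""
    for i in range(n):
        await asyncio.sleep(0)  # stand-in for waiting on a slow source
        log.append(f"in:{i}")
        yield i


async def streaming(n: int) -> list[str]:
    """Emits an output for each item as soon as that item arrives."""
    log: list[str] = []
    async for i in produce(n, log):
        log.append(f"out:{i}")
    return log


async def batched(n: int) -> list[str]:
    """Waits for the whole input before emitting any output."""
    log: list[str] = []
    items = [i async for i in produce(n, log)]
    for i in items:
        log.append(f"out:{i}")
    return log


streaming_log = asyncio.run(streaming(3))
batched_log = asyncio.run(batched(3))
print(streaming_log)
print(batched_log)
```

In the streaming log the first output appears right after the first input, while in the batched log it appears only after the last input, which is the gap that a stream-based design aims to remove.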

Core design principles

At the heart of GenAI Processors is the concept of a Processor: a fundamental building block that encapsulates a specific unit of work. It takes a stream of inputs, performs an operation, and outputs a stream of results. This simple, consistent API is a cornerstone of the library's power and flexibility.

Here's a look at the core design decisions and their benefits for developers:

- Modular design: Break down complex workflows into self-contained Processor units. This ensures code reusability, testability, and significantly simplifies maintaining intricate pipelines.
- Asynchronous & concurrent: Fully leverages Python's asyncio for efficient handling of I/O-bound and compute-bound tasks. This enables responsive applications without manual threading or complex concurrency management.
- Integrated with Gemini API: Dedicated processors like GenaiModel (for turn-based interaction) and LiveProcessor (for real-time streaming) simplify interaction with the Gemini API, including the complexities of the Live API. This reduces boilerplate and accelerates integration.
- Extensible: Easily create custom processors by inheriting from base classes or using decorators. Integrate your own data processing logic, external APIs, or specialized operations seamlessly into your pipelines.
- Unified multimodal handling: The ProcessorPart wrapper provides a consistent interface for handling diverse data types (text, images, audio, JSON, etc.) within the pipeline.
- Stream manipulation utilities: Built-in utilities for splitting, concatenating, and merging asynchronous streams. This provides fine-grained control over data flow within complex pipelines.
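As an illustration of what merging asynchronous streams involves, here is a minimal standard-library sketch. It is not the library's built-in utility: `merge` and `ticker` are hypothetical names, and the sketch omits the cancellation and error handling a production version would need.

```python
import asyncio
from typing import AsyncIterator

_DONE = object()  # sentinel marking the end of one input stream


async def merge(*streams: AsyncIterator[str]) -> AsyncIterator[str]:
    """Yields items from all streams in the order they arrive."""
    queue: asyncio.Queue = asyncio.Queue()

    async def drain(stream: AsyncIterator[str]) -> None:
        async for item in stream:
            await queue.put(item)
        await queue.put(_DONE)

    # One background task per input stream, all feeding a shared queue.
    tasks = [asyncio.create_task(drain(s)) for s in streams]
    remaining = len(tasks)
    while remaining:
        item = await queue.get()
        if item is _DONE:
            remaining -= 1
        else:
            yield item


async def ticker(label: str, delay: float, count: int) -> AsyncIterator[str]:
    for i in range(count):
        await asyncio.sleep(delay)
        yield f"{label}{i}"


async def demo() -> list[str]:
    return [x async for x in merge(ticker("a", 0.01, 3), ticker("b", 0.015, 2))]


merged = asyncio.run(demo())
print(merged)
```

The interleaving between the two tickers depends on timing, but items from the same input always keep their relative order.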

Getting started

Getting started with GenAI Processors is straightforward. You can install it with pip:

    pip install genai-processors