The world of AI is moving at an exciting pace, and embeddings are at the core of many modern applications like semantic search and Retrieval-Augmented Generation (RAG). Today, we're excited to discuss how you can leverage EmbeddingGemma, Google's new, highly efficient 308M-parameter open embedding model. While its small size makes it a good fit for on-device applications, the same efficiency opens up new possibilities in the cloud, especially for customization through fine-tuning. We'll show you how to use EmbeddingGemma with Google Cloud's Dataflow and vector databases like AlloyDB to build a scalable, real-time knowledge ingestion pipeline.



The power of embeddings and Dataflow

Embeddings are numerical vector representations of data that capture the underlying relationships between words and concepts. They are the cornerstone of applications that need to understand information on a deeper, conceptual level, from searching for documents that are semantically similar to a query to providing relevant context for Large Language Models (LLMs) in RAG systems.
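
To make "semantically similar" concrete, here is a minimal sketch of how similarity between embeddings is commonly measured with cosine similarity. The three-dimensional vectors and the sample texts are hand-written assumptions for illustration only; they are not EmbeddingGemma outputs, and real embeddings have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy "embeddings", written by hand for illustration.
toy_embeddings = {
    "How do I reset my password?": [0.9, 0.1, 0.0],
    "Steps to recover account access": [0.8, 0.3, 0.1],
    "Quarterly revenue report": [0.0, 0.2, 0.9],
}


def rank_by_similarity(query_vec, embeddings):
    """Return the texts ordered from most to least similar to the query vector."""
    return sorted(
        embeddings,
        key=lambda text: cosine_similarity(query_vec, embeddings[text]),
        reverse=True,
    )


# Hypothetical embedding of a query about account login problems.
query_vec = [0.85, 0.2, 0.05]
ranking = rank_by_similarity(query_vec, toy_embeddings)
```

In this toy setup the two account-related texts score far higher against the query than the revenue report does, which is the behavior a semantic search or RAG retrieval step relies on.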

To power these applications, you need a robust knowledge ingestion pipeline that can process unstructured data, convert it into embeddings, and load it into a specialized vector database. This is where Dataflow can help by encapsulating these steps into a single managed pipeline.

Using a small, highly efficient open model like EmbeddingGemma at the core of your pipeline makes the entire process self-contained, which can simplify management by eliminating the need for external network calls to other services for the embedding step. Because it's an open model, it can be hosted entirely within Dataflow. This provides the confidence to securely process large-scale, private datasets.

Beyond these operational benefits, EmbeddingGemma is also fine-tunable, allowing you to customize it for your specific data embedding needs; you can find a fine-tuning example here. Quality is just as important as scalability, and EmbeddingGemma delivers on that front as well. It is the highest-ranking text-only multilingual embedding model under 500M parameters on the Massive Text Embedding Benchmark (MTEB) Multilingual leaderboard.

Dataflow is a fully managed, autoscaling platform for unified batch and streaming data processing. By running a model like EmbeddingGemma directly inside a Dataflow pipeline, you gain several advantages:

Efficiency from data locality: Processing happens on the Dataflow workers, eliminating the need for remote procedure calls (RPCs) to a separate inference service and avoiding quota limits and the difficulty of autoscaling multiple systems in step. Your whole workflow can be bundled into a single set of workers, reducing your resource footprint.

Unified system: A single system handles autoscaling, observation, and monitoring, reducing your operational overhead.

Scalability and simplicity: Dataflow automatically scales your pipeline up or down based on demand, and Apache Beam's transforms reduce boilerplate code.



Building the ingestion pipeline with Dataflow ML

A typical knowledge ingestion pipeline consists of four phases: reading from a data source, preprocessing the data, generating embeddings, and writing to a vector database.
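
As a rough illustration of how the four phases fit together, the following plain-Python sketch chains them end to end. It deliberately uses no Apache Beam or EmbeddingGemma APIs: the function names, the in-memory source, the dictionary standing in for a vector database, and the hash-based placeholder embedding are all assumptions made for this example. In an actual Beam pipeline, each phase would typically become its own transform.

```python
import hashlib
import math


def read_source():
    """Phase 1: read raw records (an in-memory list stands in for a real source)."""
    yield {"id": "doc-1", "text": "  Dataflow runs   Apache Beam pipelines. "}
    yield {"id": "doc-2", "text": "EmbeddingGemma is a 308M parameter model."}


def preprocess(record):
    """Phase 2: collapse runs of whitespace so the embedding step sees clean text."""
    return {"id": record["id"], "text": " ".join(record["text"].split())}


def embed(record, dim=8):
    """Phase 3: placeholder embedding, NOT a semantic model.

    Hashes each token into one of `dim` buckets and normalizes to unit length.
    A real pipeline would call the embedding model here instead.
    """
    vec = [0.0] * dim
    for token in record["text"].lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vec[digest[0] % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return {**record, "embedding": [v / norm for v in vec]}


def write_to_store(records, store):
    """Phase 4: upsert rows keyed by id (a dict stands in for a vector database)."""
    for record in records:
        store[record["id"]] = record
    return store


def run_pipeline():
    """Chain the four phases: read, preprocess, embed, write."""
    rows = (embed(preprocess(record)) for record in read_source())
    return write_to_store(rows, {})


vector_store = run_pipeline()
```

Keeping each phase as a small, independent function mirrors how the steps can be swapped out individually, for example replacing the placeholder `embed` with a real model call without touching the read or write steps.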