Gemini Embedding: Powering RAG and context engineering

JULY 30, 2025

Vishal Dharmadhikari Developer Experience

Janie Zhang

Since announcing the general availability of our Gemini Embedding text model, we've seen developers rapidly adopt it to build advanced AI applications. Beyond traditional use cases like classification, semantic search, and retrieval-augmented generation (RAG), many are now using a technique called context engineering to provide AI agents with complete operational context. Embeddings are crucial here, as they efficiently identify and integrate vital information—like documents, conversation history, and tool definitions—directly into a model's working memory.

The following examples showcase how organizations across industries are already leveraging the Gemini Embedding model to power sophisticated systems.

Gemini Embedding in action

Unleashing capabilities in global content intelligence

Box, an intelligent content management platform, is integrating Gemini Embedding to enable a critical use case: answering questions and extracting insights from complex documents. During their evaluations, gemini-embedding-001 found the correct answer over 81% of the time, exhibiting a 3.6% increase in recall compared to other embedding models. Beyond this performance boost, our model's built-in multilingual support is a promising advancement for their global users, enabling Box AI to unlock insights from content across different languages and regions.

Improving accuracy in financial data analysis

Financial technology company re:cap uses embeddings to classify high volumes of B2B bank transactions. They measured the impact of gemini-embedding-001 by benchmarking against previous Google models (text-embedding-004 and text-embedding-005) on a dataset of 21,500 transactions, finding an increase in F1 score by 1.9% and 1.45% respectively. The F1 score, which balances a model's precision and recall, is crucial for classification tasks. This demonstrates how a capable model like Gemini Embedding directly drives significant performance gains, helping re:cap deliver sharper liquidity insights to its customers.

Achieving semantic precision in legal discovery

Everlaw, a platform providing verifiable RAG to help legal professionals analyze large volumes of discovery documents, requires precise semantic matching across millions of specialized texts. Through internal benchmarks, Everlaw found gemini-embedding-001 to be the best, achieving 87% accuracy in surfacing relevant answers from 1.4 million documents filled with industry-specific and complex legal terms, surpassing Voyage (84%) and OpenAI (73%) models. Furthermore, Gemini Embedding's Matryoshka property enables Everlaw to use compact representations, focusing essential information in fewer dimensions. This leads to minimal performance loss, reduced storage costs, and more efficient retrieval and search.

Leveling up codebase search for developers

Roo Code, an open-source AI coding assistant, uses the Gemini Embedding model to power its codebase indexing and semantic search. Developers using Roo Code need a search that helps understand intent, not just syntax, as the assistant interacts across multiple files like a human teammate. By pairing gemini-embedding-001 with Tree-sitter for logical code splitting, Roo Code delivers highly relevant results, even for imprecise queries. After initial testing, they found Gemini Embedding significantly improved their LLM-driven code search, making it more flexible, accurate, and aligned with developer workflows.

Delivering personalized mental wellness support

Mindlid's AI wellness companion leverages gemini-embedding-001 to understand conversational history, enabling context-aware and meaningful insights that adapt in real time to users. They documented impressive performance: consistent sub-second latency (median: 420ms) and a measurable 82% top-3 recall rate, a 4% recall lift over OpenAI's text-embedding-3-small. This shows how Gemini Embedding improves the relevance and speed of their AI's support by delivering the most pertinent information.

Enhancing context and efficiency of AI assistants

Interaction Co. is building Poke, an AI email assistant that automates tasks and extracts information from Gmail. Poke uses Gemini Embedding for two key functions: retrieving user "memories" and identifying relevant emails for enhanced context. By integrating gemini-embedding-001, Poke's language model retrieves data with greater speed and precision. They've reported a significant 90.4% reduction in the average time to embed 100 emails compared to Voyage-2, completing the task in just 21.45 seconds.

The foundation for future agents

As AI systems become more autonomous, their effectiveness will be determined by the quality of the context we provide them. High-performance embedding models like gemini-embedding-001 are a fundamental component for building the next generation of agents that can reason, retrieve information, and act on our behalf.

To get started with embeddings, visit the Gemini API documentation.

_{Performance metrics were provided by developers and not independently confirmed by Google.}

posted in: