Apigee Operator for Kubernetes and GKE Inference Gateway integration for Auth and AI/LLM policies

September 24, 2025

No AI/Agents without APIs!

Many users interact with generative AI daily without realizing the crucial role that underlying APIs play in making it accessible. APIs unlock the power of generative AI by making models available to both automated agents and human users. Complex business processes, used both internally and externally, are built by connecting multiple APIs in an agentic workflow.

GKE Inference Gateway

The Google Kubernetes Engine Inference Gateway (GKE IG) is an extension to the GKE Gateway that provides optimized routing and load balancing for serving generative Artificial Intelligence (AI) workloads. It simplifies the deployment, management, and observability of AI inference workloads. The GKE IG offers:

  • Optimized load balancing for inference. GKE IG distributes requests to optimize AI model serving using metrics from model servers.
  • Dynamic LoRA fine-tuned model serving. GKE IG supports serving dynamic LoRA (Low-Rank Adaptation) fine-tuned models on a common accelerator, reducing the number of GPUs and TPUs required to serve models through multiplexing.
  • Optimized autoscaling for inference. The GKE Horizontal Pod Autoscaler (HPA) uses model server metrics to autoscale.
  • Model-aware routing. The Gateway routes inference requests based on model names defined in OpenAI API specifications within your GKE cluster.
  • Model-specific serving Criticality. The GKE IG lets you specify the serving Criticality of AI models to prioritize latency-sensitive requests over latency-tolerant batch inference jobs.
  • Integrated AI safety. GKE IG integrates with Google Cloud Model Armor to apply AI safety checks to model prompts and responses.

  • Inference observability. GKE IG provides observability metrics for inference requests, such as request rate, latency, errors, and saturation.
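
To make a couple of these features concrete, the sketch below wires a Gateway to a pool of model-server Pods through an HTTPRoute and declares a model name with its serving criticality, following the Gateway API Inference Extension used by GKE IG. It is a minimal sketch: resource names, the gateway class, and API versions are illustrative and may differ across GKE releases.

```yaml
# Gateway using a GKE gateway class (class name is illustrative).
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: inference-gateway
spec:
  gatewayClassName: gke-l7-regional-external-managed
  listeners:
  - name: http
    protocol: HTTP
    port: 80
---
# Route inference traffic to an InferencePool instead of a plain Service.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: llm-route
spec:
  parentRefs:
  - name: inference-gateway
  rules:
  - backendRefs:
    - group: inference.networking.x-k8s.io
      kind: InferencePool
      name: vllm-llama3-8b
---
# Pool of model-server Pods; the referenced endpoint-picker extension
# load balances using inference metrics reported by the model servers.
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferencePool
metadata:
  name: vllm-llama3-8b
spec:
  selector:
    app: vllm-llama3-8b
  targetPortNumber: 8000
  extensionRef:
    name: vllm-llama3-8b-epp
---
# Model-aware routing and serving criticality for a (possibly LoRA) model name.
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferenceModel
metadata:
  name: product-review-model
spec:
  modelName: product-review
  criticality: Critical
  poolRef:
    name: vllm-llama3-8b
  targetModels:
  - name: product-review-lora-v1
    weight: 100
```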

Leveraging the GCPTrafficExtension

Problem

Enterprise customers using the GKE Inference Gateway want to secure and optimize their agentic/AI workloads. They want to publish and monetize their agentic APIs while also having access to the high-quality API governance features that Apigee offers as part of their agentic API commercialization strategy.

Solution

GKE Inference Gateway recently added the GCPTrafficExtension resource, enabling the GKE Gateway to make a “sideways” call to a PDP (policy decision point) through the service extension (or ext-proc) mechanism. The Apigee Operator for Kubernetes uses the service extension mechanism to enforce Apigee policies on API traffic flowing through the GKE Inference Gateway. The integration between Apigee and GKE Inference Gateway offers the benefits of Apigee policies and Apigee governance features to GKE Inference Gateway users.

As shown in the following diagram, the GKE IG and Apigee APIM Operator work together as follows:

  • Provision Apigee: The GKE IG admin provisions an Apigee instance on Google Cloud.
  • Install the APIM Operator: The admin installs the APIM operator within their GKE cluster and connects it to the newly provisioned Apigee instance.
  • Create an ApigeeBackendService: An ApigeeBackendService resource is created. This resource acts as a proxy for the Apigee dataplane.
  • Apply the Traffic Extension: The ApigeeBackendService is then referenced as the backendRef within a GCPTrafficExtension.
  • Enforce Policies: The GCPTrafficExtension is applied to the GKE Inference Gateway, which allows Apigee to enforce policies on the API traffic flowing through the gateway.
[Diagram: Apigee APIM Operator integration with the GKE Inference Gateway]
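
A minimal sketch of the ApigeeBackendService and GCPTrafficExtension resources described above might look like the following. The GCPTrafficExtension fields follow the GKE service extensions API, while the ApigeeBackendService spec and the way it is referenced as a backendRef are assumptions for illustration rather than the operator's exact schema.

```yaml
# Proxy for the Apigee dataplane, managed by the Apigee APIM Operator.
# The spec fields shown here are assumptions for illustration.
apiVersion: apim.googleapis.com/v1
kind: ApigeeBackendService
metadata:
  name: apigee-backend
  namespace: apim
spec:
  apigeeEnv: my-apigee-env   # illustrative: the provisioned Apigee environment
---
# Service extension that makes the "sideways" ext-proc call to Apigee
# for traffic flowing through the inference gateway.
apiVersion: networking.gke.io/v1
kind: GCPTrafficExtension
metadata:
  name: apigee-traffic-extension
spec:
  targetRefs:
  - group: gateway.networking.k8s.io
    kind: Gateway
    name: inference-gateway
  extensionChains:
  - name: apigee-chain
    matchCondition:
      celExpressions:
      - celMatcher: request.path.startsWith("/")
    extensions:
    - name: apigee-ext-proc
      supportedEvents:
      - RequestHeaders
      - ResponseHeaders
      timeout: 1s
      failOpen: false
      backendRef:
        group: apim.googleapis.com
        kind: ApigeeBackendService
        name: apigee-backend
        port: 443
```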

Apigee Operator for Kubernetes for API management

Apigee provides a comprehensive API management layer for traditional transactional APIs and Large Language Models (LLMs) across Google Cloud, other public clouds, and on-premises infrastructure. The platform offers a powerful policy engine, full API lifecycle management, and advanced AI/ML-powered analytics. Apigee is recognized as a Leader for API management in the Gartner Magic Quadrant, serving large enterprises with complex API needs.

Through this new integration with GKE Inference Gateway, GKE users can now leverage Apigee's full suite of features to manage, govern, and monetize their AI workloads through APIs. This includes the ability for API producers to package APIs into API Products, which are then made available to developers through a self-service process on developer portals. Users also gain access to Apigee's value-added services, such as advanced API security and detailed API analytics.

The Apigee policies available to users of this integration are:

  • API Key
  • Quota
  • Rate limiting
  • Google tokens (for requests that access Google services)
  • Key value store policy
  • OpenAPI Spec validation
  • Spike Arrest policy
  • JavaScript policy
  • Response Cache
  • External service callout
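
For example, quota enforcement (and the rate limiting it implies) is typically expressed by packaging operations into an API product and attaching a quota to an operation set. The sketch below uses the APIProduct and APIOperationSet kinds from the Apigee APIM Operator; field names and values are illustrative assumptions and may not match the current schema exactly.

```yaml
# Package the inference API as an Apigee API product (fields illustrative).
apiVersion: apim.googleapis.com/v1
kind: APIProduct
metadata:
  name: llm-api-product
  namespace: apim
spec:
  approvalType: auto
  displayName: llm-api-product
  description: Inference endpoints exposed through the GKE Inference Gateway.
  attributes:
  - name: access
    value: private
---
# Attach a quota of 10 requests per minute to the product's operations.
apiVersion: apim.googleapis.com/v1
kind: APIOperationSet
metadata:
  name: llm-operations
  namespace: apim
spec:
  apiProductRefs:
  - name: llm-api-product
    kind: APIProduct
    group: apim.googleapis.com
    namespace: apim
  quota:
    limit: 10
    interval: 1
    timeUnit: minute
  restOperations:
  - name: Completions
    path: /v1/completions
    methods:
    - POST
```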

The APIM Operator used in this integration also supports what is called “admin template rules,” enabling organization administrators to enforce certain policies in their organization. For example, the organization admin can require that certain policies be applied to all APIs, or specify a list of policies that can't be used with the organization's APIs.
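
Conceptually, an admin template rule pairs a list of policies that every API must apply with a list of policies that are not allowed. The resource kind and fields below are a hypothetical sketch to illustrate that idea, not the operator's actual schema.

```yaml
# Hypothetical admin template rule: all APIs in the organization must apply
# Spike Arrest and API key verification, and may not use the JavaScript policy.
# Kind and field names are illustrative only.
apiVersion: apim.googleapis.com/v1
kind: ApimTemplateRule
metadata:
  name: org-wide-guardrails
  namespace: apim
spec:
  requiredList:
  - SpikeArrest
  - VerifyAPIKey
  deniedList:
  - Javascript
```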

In the future, the integration will also offer the following policies:

  • Model Armor
  • Semantic caching
  • Token counting and enforcement
  • Model routing based on prompts

No AI without APIs - Reprise

To summarize, GKE Inference Gateway users now have access to best-in-class API management and security for their APIs through Apigee. With Apigee's full-featured API management platform at your disposal, you can focus on your core mission: running your inference engine on GKE to take advantage of the best-in-class AI infrastructure available in public clouds.