Apigee Operator for Kubernetes and GKE Inference Gateway integration for Auth and AI/LLM policies

SEPT. 26, 2025
Sanjay Pujare, Software Engineer
Jennifer Bennett, Technical Writing Manager

No AI/Agents without APIs!

Many users interact with generative AI daily without realizing the crucial role that underlying APIs play in making these powerful capabilities accessible. APIs unlock the power of generative AI by making models available to both automated agents and human users. Complex business processes, both internal and external, are built by connecting multiple APIs in agentic workflows.

GKE Inference Gateway

The Google Kubernetes Engine (GKE) Inference Gateway is an extension to the GKE Gateway that provides optimized routing and load balancing for serving generative Artificial Intelligence (AI) workloads. It simplifies the deployment, management, and observability of AI inference workloads. The GKE Inference Gateway offers:

  • Optimized load balancing for inference: GKE Inference Gateway distributes requests to optimize AI model serving using metrics from model servers.
  • Dynamic LoRA fine-tuned model serving: GKE Inference Gateway supports serving dynamic LoRA (Low-Rank Adaptation) fine-tuned models on a common accelerator, reducing the number of GPUs and TPUs required to serve models through multiplexing.
  • Optimized autoscaling for inference: The GKE Horizontal Pod Autoscaler (HPA) uses model server metrics to autoscale.
  • Model-aware routing: GKE Inference Gateway routes inference requests to models in your GKE cluster based on the model name in the OpenAI API-compatible request body (see the example manifests after this list).
  • Model-specific serving criticality: GKE Inference Gateway lets you specify the serving Criticality of AI models to prioritize latency-sensitive requests over latency-tolerant batch inference jobs.
  • Integrated AI safety: GKE Inference Gateway integrates with Google Cloud Model Armor to apply AI safety checks to model prompts and responses.
  • Inference observability: GKE Inference Gateway provides observability metrics for inference requests, such as request rate, latency, errors, and saturation.
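
Model-aware routing and serving criticality are configured through the Gateway API inference extension resources that GKE Inference Gateway consumes. The sketch below is a minimal, illustrative example: the InferencePool selects the model server Pods, and the InferenceModel maps a model name from the request body to that pool with a Criticality. The resource names, labels, and model name are placeholders, and the apiVersion and field names can vary between releases, so check the GKE Inference Gateway documentation for the schema that matches your cluster.

```yaml
# Illustrative only: names and labels are placeholders; verify the
# apiVersion and fields against your GKE Inference Gateway release.
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferencePool
metadata:
  name: llm-pool
spec:
  # Selects the model server Pods (for example, a vLLM deployment).
  selector:
    app: llm-server
  targetPortNumber: 8000
  extensionRef:
    # Endpoint picker that scores backends using model server metrics.
    name: llm-pool-endpoint-picker
---
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferenceModel
metadata:
  name: chat-model
spec:
  # Matches the "model" field in OpenAI API-compatible request bodies.
  modelName: chat-model
  # Critical requests are prioritized over Standard and Sheddable ones.
  criticality: Critical
  poolRef:
    name: llm-pool
```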

Leveraging the GCPTrafficExtension

The challenge

Most enterprise customers using the GKE Inference Gateway want to secure and optimize their agentic and AI workloads. They also want to publish and monetize their agentic APIs while taking advantage of the high-quality API governance features that Apigee offers as part of their commercialization strategy.

The solution

GKE Inference Gateway solves this challenge through the introduction of the GCPTrafficExtension resource, enabling the GKE Gateway to make a “sideways” call to a policy decision point (PDP) through the service extension (or ext-proc) mechanism.

The Apigee Operator for Kubernetes leverages this service extension mechanism to enforce Apigee policies on API traffic flowing through the GKE Inference Gateway. This seamless integration provides GKE Inference Gateway users with the benefits of Apigee's API governance.

The GKE Inference Gateway and the Apigee Operator for Kubernetes work together through the following steps:

  • Provision Apigee: The GKE Inference Gateway administrator provisions an Apigee instance on Google Cloud.
  • Install the Apigee Operator for Kubernetes: The administrator installs the Apigee Operator for Kubernetes within their GKE cluster and connects it to the newly provisioned Apigee instance.
  • Create an ApigeeBackendService: An ApigeeBackendService resource is created. This resource acts as a proxy for the Apigee dataplane.
  • Apply the Traffic Extension: The ApigeeBackendService is then referenced as the backendRef within a GCPTrafficExtension.
  • Enforce Policies: The GCPTrafficExtension is applied to the GKE Inference Gateway, allowing Apigee to enforce policies on the API traffic flowing through the gateway. The diagram and example manifest below illustrate this flow.

Diagram: Apigee Operator for Kubernetes and GKE Inference Gateway integration
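
A minimal, illustrative GCPTrafficExtension for this setup is sketched below. It attaches to the Inference Gateway and points the extension chain at the ApigeeBackendService created in the previous step. The resource, gateway, and backend service names are placeholders, and the exact apiVersion, match conditions, and supported events depend on the policies you enable, so follow the Apigee Operator for Kubernetes documentation for your environment.

```yaml
# Illustrative only: names are placeholders; confirm the fields against the
# Apigee Operator for Kubernetes and GKE service extensions documentation.
apiVersion: networking.gke.io/v1
kind: GCPTrafficExtension
metadata:
  name: apigee-policy-extension
spec:
  targetRefs:
  # Attach the extension to the GKE Inference Gateway.
  - group: gateway.networking.k8s.io
    kind: Gateway
    name: inference-gateway
  extensionChains:
  - name: apigee-chain
    matchCondition:
      celExpressions:
      # Send all requests to Apigee for policy enforcement.
      - celMatcher: request.path.startsWith("/")
    extensions:
    - name: apigee-ext-proc
      # Events forwarded to Apigee over the ext-proc channel.
      supportedEvents:
      - RequestHeaders
      - ResponseHeaders
      # The ApigeeBackendService acts as the proxy for the Apigee dataplane.
      backendRef:
        group: apim.googleapis.com
        kind: ApigeeBackendService
        name: my-apigee-backend-service
```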

Apigee Operator for Kubernetes: API management for LLMs

Apigee provides a comprehensive API management layer for traditional transactional APIs and Large Language Models (LLMs) across Google Cloud, other public clouds, and on-premises infrastructure. The platform offers a powerful policy engine, full API lifecycle management, and advanced AI/ML-powered analytics. Apigee is recognized as a Leader for API management in the Gartner Magic Quadrant, serving large enterprises with complex API needs.

Through this new integration with GKE Inference Gateway, GKE users can leverage Apigee’s full suite of features to manage, govern, and monetize their AI workloads through APIs. This includes the ability for API producers to package APIs into API Products available to developers through self-service developer portals. Users also gain access to Apigee's value-added services, such as API security and detailed API analytics.

With the integration, GKE users can access Apigee policies governing:

  • API keys
  • Quotas
  • Rate limiting
  • Google access tokens
  • Key-value stores
  • OpenAPI spec validation
  • Traffic spikes
  • Custom JavaScript
  • Response caching
  • External service callouts
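
As a rough sketch of how these policies can be driven from Kubernetes, the example below defines an API product and attaches a quota to one of its operations using the Apigee Operator's custom resources. Treat it as illustrative: the resource names and paths are placeholders, and the apiVersion, kinds, and field names are assumptions based on the operator's API product and operation set resources, so consult the Apigee Operator for Kubernetes reference for the exact schema.

```yaml
# Illustrative only: names are placeholders and field names are assumptions;
# see the Apigee Operator for Kubernetes reference for the exact schema.
apiVersion: apim.googleapis.com/v1
kind: APIProduct
metadata:
  name: inference-product
  namespace: apim
spec:
  displayName: inference-product
  approvalType: auto
  description: Inference APIs exposed through the GKE Inference Gateway.
---
apiVersion: apim.googleapis.com/v1
kind: APIOperationSet
metadata:
  name: chat-operations
  namespace: apim
spec:
  apiProductRefs:
  - name: inference-product
    kind: APIProduct
    group: apim.googleapis.com
    namespace: apim
  # Quota enforced per consumer: 100 requests per minute.
  quota:
    limit: 100
    interval: 1
    timeUnit: minute
  restOperations:
  - name: chat-completions
    path: /v1/chat/completions
    methods:
    - POST
```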

The Apigee Operator for Kubernetes used in this integration also supports admin template rules, letting organization administrators enforce policy rules across their organization. For example, an organization admin can require that certain policies be applied to all APIs, or specify a list of policies that can't be used with the organization's APIs.

Future plans include support for Apigee AI policies governing:

  • Model Armor security
  • Semantic caching
  • Token counting and enforcement
  • Prompt-based model routing

No AI without APIs - Reprise

By leveraging Apigee's best-in-class API management and security capabilities through the GKE Inference Gateway, enterprises can now unify their AI serving and API governance layers. With Apigee's full-featured API management platform at your disposal, you can focus on your core mission: running your inference engine on GKE to take advantage of the best AI infrastructure available in public clouds.