Many users interact with generative AI daily without realizing the crucial role that underlying APIs play in making it accessible. APIs unlock the power of generative AI by making models available to both automated agents and human users. Complex business processes, used both internally and externally, are built by connecting multiple APIs in agentic workflows.
The Google Kubernetes Engine Inference Gateway (GKE IG) is an extension to the GKE Gateway that provides optimized routing and load balancing for serving generative AI workloads. It simplifies the deployment, management, and observability of AI inference workloads. The GKE IG offers:
- Support for the OpenAI API specifications within your GKE cluster.
- Criticality: The GKE IG lets you specify the serving Criticality of AI models to prioritize latency-sensitive requests over latency-tolerant batch inference jobs (see the sketch after this list).
- Inference observability: GKE IG provides observability metrics for inference requests, such as request rate, latency, errors, and saturation.
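To make the Criticality feature concrete, here is a minimal sketch of an InferenceModel resource from the Gateway API Inference Extension that GKE IG builds on. The model and pool names are illustrative placeholders, and the sketch assumes the v1alpha2 schema; check the current API version before use.

```yaml
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferenceModel
metadata:
  name: chat-model            # illustrative name
spec:
  modelName: chat-model       # model name clients pass in OpenAI-style requests
  criticality: Critical       # prioritized over Standard and Sheddable traffic
  poolRef:
    name: llm-pool            # illustrative InferencePool of model-server Pods
```

Under load, requests for Critical models are served first, while Sheddable traffic is the first to be dropped.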
Enterprise customers using the GKE Inference Gateway want to secure and optimize their agentic and AI workloads. They want to publish and monetize their agentic APIs while also gaining access to the high-quality API governance features Apigee offers as part of their API commercialization strategy.
GKE Inference Gateway recently added the GCPTrafficExtension resource, which enables the GKE Gateway to make a “sideways” call to a policy decision point (PDP) through the service extension (ext-proc) mechanism. The Apigee Operator for Kubernetes uses this mechanism to enforce Apigee policies on API traffic flowing through the GKE Inference Gateway, bringing the benefits of Apigee's policies and governance features to GKE Inference Gateway users.
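For orientation, a GCPTrafficExtension attaches that ext-proc callout to a Gateway roughly as follows. This is a minimal sketch with illustrative names (the Gateway, extension, and backend Service are placeholders), and the exact schema should be verified against the GKE service extensions documentation.

```yaml
apiVersion: networking.gke.io/v1
kind: GCPTrafficExtension
metadata:
  name: apim-traffic-extension                      # illustrative name
spec:
  targetRefs:
  - group: gateway.networking.k8s.io
    kind: Gateway
    name: inference-gateway                         # illustrative: your GKE Inference Gateway
  extensionChains:
  - name: apim-chain
    matchCondition:
      celExpressions:
      - celMatcher: request.path.startsWith("/")    # match all request paths
    extensions:
    - name: apim-pdp                                # illustrative extension name
      authority: apim-pdp.example.com               # illustrative authority header
      failOpen: false                               # block traffic if the PDP is unreachable
      timeout: 1s
      supportedEvents:
      - RequestHeaders                              # events forwarded to the PDP over ext-proc
      backendRef:
        group: ""
        kind: Service
        name: apim-extension-svc                    # illustrative Service fronting the PDP
        port: 443
```

In this integration, the Apigee Operator for Kubernetes takes care of this wiring for you, pointing the callout at Apigee's policy decision point.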
The following diagram shows how the GKE IG and the Apigee APIM Operator work together:
Apigee provides a comprehensive API management layer for traditional transactional APIs and Large Language Models (LLMs) across Google Cloud, other public clouds, and on-premises infrastructure. This platform offers a powerful policy engine, full API lifecycle management, and advanced AI/ML-powered analytics. Apigee is recognized as a Leader for API management in the Gartner Magic Quadrant, serving large enterprises with complex API needs.
Through this new integration with GKE Inference Gateway, GKE users can now leverage Apigee's full suite of features to manage, govern, and monetize their AI workloads through APIs. This includes the ability for API producers to package APIs into API Products, which are then made available to developers via a self-service process through developer portals. Users also gain access to Apigee's value-added services, such as advanced API security and detailed API analytics.
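As a sketch of what API packaging can look like with the APIM Operator's Kubernetes resources, an API producer might define an API Product like the one below. Resource names are illustrative, and the field names follow the Apigee Operator documentation as of this writing, so verify them against the current schema.

```yaml
apiVersion: apim.googleapis.com/v1
kind: APIProduct
metadata:
  name: inference-product            # illustrative name
  namespace: apim
spec:
  displayName: inference-product
  approvalType: auto                 # developers get keys without manual approval
  description: Chat inference APIs packaged for developer-portal self-service
  enforcementRefs:                   # bind the product to the policy enforced on the Gateway
  - name: apim-extension-policy      # illustrative APIMExtensionPolicy name
    kind: APIMExtensionPolicy
    group: apim.googleapis.com
    namespace: apim
  attributes:
  - name: access
    value: private
```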
The Apigee policies available to users of this integration include:
The APIM Operator used in this integration also supports what are called “admin template rules,” which let organization administrators enforce certain policies across their organization. For example, an organization admin can require that certain policies be applied to all APIs, or specify a list of policies that can't be used with the organization's APIs (see the hypothetical sketch below).
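Purely as a hypothetical illustration of the concept (the kind and field names below are invented for this sketch and do not reflect the Operator's actual schema, which is defined in the Apigee documentation), an org-wide admin rule might express:

```yaml
# Hypothetical sketch: kind and fields are invented to illustrate
# admin template rules, not the real APIM Operator schema.
apiVersion: apim.googleapis.com/v1
kind: AdminTemplateRule              # hypothetical kind
metadata:
  name: org-wide-rules
  namespace: apim
spec:
  required:                          # policies every API in the org must apply
  - verify-api-key
  - spike-arrest
  denied:                            # policies no API in the org may use
  - raw-response-passthrough
```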
In the future, the integration will also offer the following policies:
To summarize, GKE Inference Gateway users now have access to best-in-class API management and security for their APIs through Apigee. With Apigee's full-featured API management platform at your disposal, you can focus on your core mission: running your inference engine on GKE to take advantage of the best AI infrastructure available in public clouds.