Many users interact with generative AI daily without realizing the crucial role that underlying APIs play in making these capabilities accessible. APIs unlock the power of generative AI by making models available to both automated agents and human users. Complex business processes, both internal and external, are built by connecting multiple APIs in agentic workflows.
The Google Kubernetes Engine (GKE) Inference Gateway is an extension to the GKE Gateway that provides optimized routing and load balancing for serving generative Artificial Intelligence (AI) workloads. It simplifies the deployment, management, and observability of AI inference workloads. The GKE Inference Gateway offers:
Criticality: The GKE Inference Gateway lets you specify the serving criticality of AI models to prioritize latency-sensitive requests over latency-tolerant batch inference jobs.

Most enterprise customers using the GKE Inference Gateway want to secure and optimize their agentic and AI workloads. They also want to publish and monetize their agentic APIs while using the high-quality API governance features offered by Apigee as part of their agentic API commercialization strategy.
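To make serving criticality concrete, here is a toy Python scheduler. It is a minimal sketch under assumptions, not how the gateway actually implements prioritization: each request carries one of two hypothetical tiers, `latency-sensitive` or `batch`, and latency-sensitive requests are always dequeued first, with arrival order breaking ties inside a tier. The class and tier names are illustrative only.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical tiers: a lower rank is served first.
TIER_RANK = {"latency-sensitive": 0, "batch": 1}


@dataclass(order=True)
class _Entry:
    rank: int
    seq: int  # arrival order; breaks ties within a tier (FIFO)
    request_id: str = field(compare=False)


class CriticalityQueue:
    """Toy scheduler that serves latency-sensitive requests before batch jobs."""

    def __init__(self) -> None:
        self._heap: List[_Entry] = []
        self._arrivals = itertools.count()

    def submit(self, request_id: str, tier: str) -> None:
        # Raises KeyError for an unknown tier name.
        entry = _Entry(TIER_RANK[tier], next(self._arrivals), request_id)
        heapq.heappush(self._heap, entry)

    def next_request(self) -> Optional[str]:
        # Pop the highest-priority, earliest-arriving request, if any.
        return heapq.heappop(self._heap).request_id if self._heap else None


if __name__ == "__main__":
    q = CriticalityQueue()
    q.submit("batch-1", "batch")
    q.submit("chat-1", "latency-sensitive")
    q.submit("batch-2", "batch")
    print([q.next_request() for _ in range(3)])  # ['chat-1', 'batch-1', 'batch-2']
```

Keeping an arrival counter in each heap entry preserves FIFO order within a tier, so batch jobs are delayed by latency-sensitive traffic but never reordered among themselves.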
To bring API governance to inference traffic, the GKE Inference Gateway introduces the GCPTrafficExtension resource, which lets the GKE Gateway make a “sideways” call to a policy decision point (PDP) through the service extension (or ext-proc) mechanism.
The Apigee Operator for Kubernetes leverages this service extension mechanism to enforce Apigee policies on API traffic flowing through the GKE Inference Gateway. This seamless integration provides GKE Inference Gateway users with the benefits of Apigee's API governance.
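The sideways call can be pictured with the following Python sketch. It is a simplified model under stated assumptions, not the ext-proc protocol or Apigee's implementation: the real gateway exchanges request data with an external processor over gRPC, whereas here the PDP is a plain function. The PDP only checks a hypothetical `x-api-key` header and, when it allows a request, asks the gateway to add a hypothetical `x-policy-checked` header before forwarding the request to the model backend.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set, Tuple

Headers = Dict[str, str]


@dataclass
class Decision:
    """Verdict returned by the toy policy decision point (PDP)."""
    allow: bool
    status: int = 200
    headers_to_add: Headers = field(default_factory=dict)


def make_api_key_pdp(valid_keys: Set[str]) -> Callable[[Headers], Decision]:
    """Builds a PDP that admits only requests carrying a known API key."""
    def decide(headers: Headers) -> Decision:
        if headers.get("x-api-key") not in valid_keys:
            return Decision(allow=False, status=401)
        return Decision(allow=True, headers_to_add={"x-policy-checked": "true"})
    return decide


def gateway_handle(
    headers: Headers,
    pdp: Callable[[Headers], Decision],
    model_backend: Callable[[Headers], str],
) -> Tuple[int, str]:
    """Consults the PDP sideways, then either rejects or routes the request."""
    decision = pdp(headers)  # the "sideways" call happens before routing
    if not decision.allow:
        # Rejected requests never reach the inference backend.
        return decision.status, "request rejected by policy"
    # Apply the header mutations the PDP asked for, then forward upstream.
    forwarded = {**headers, **decision.headers_to_add}
    return 200, model_backend(forwarded)


if __name__ == "__main__":
    pdp = make_api_key_pdp({"demo-key"})
    backend = lambda h: "completion, checked=" + h.get("x-policy-checked", "no")
    print(gateway_handle({"x-api-key": "demo-key"}, pdp, backend))  # (200, 'completion, checked=true')
    print(gateway_handle({"x-api-key": "bad-key"}, pdp, backend))   # (401, 'request rejected by policy')
```

Because the decision is made before routing, a rejected request consumes no model-server capacity, which is one reason to place policy enforcement in the gateway's data path.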
The GKE Inference Gateway and the Apigee Operator for Kubernetes work together through the following steps:
Apigee provides a comprehensive API management layer for traditional transactional APIs and Large Language Models (LLMs) across Google Cloud, other public clouds, and on-premises infrastructure. This platform offers a powerful policy engine, full API lifecycle management, and advanced AI/ML-powered analytics. Apigee is recognized as a Leader for API management in the Gartner Magic Quadrant, serving large enterprises with complex API needs.
Through this new integration with the GKE Inference Gateway, GKE users can use Apigee's full suite of features to manage, govern, and monetize their AI workloads through APIs. This includes the ability for API producers to package APIs into API Products that developers can access through self-service developer portals. Users also gain access to Apigee's value-added services, such as API security and detailed API analytics.
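As a rough illustration of how API products gate access, the Python sketch below bundles operations into products and lets an app call an operation only if it is subscribed to a product containing that operation. This is a hypothetical, in-memory model for explanation only; it does not use Apigee's management API, and the class names, product names, and operation strings are invented for the example.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, Set


@dataclass(frozen=True)
class ApiProduct:
    """A named bundle of API operations (hypothetical model)."""
    name: str
    operations: FrozenSet[str]  # e.g. "POST /v1/chat"


class ProductCatalog:
    """Tracks which products each app key is subscribed to."""

    def __init__(self, products: Iterable[ApiProduct]) -> None:
        self._products: Dict[str, ApiProduct] = {p.name: p for p in products}
        self._subscriptions: Dict[str, Set[str]] = {}

    def subscribe(self, app_key: str, product_name: str) -> None:
        # Self-service signup: only existing products can be subscribed to.
        if product_name not in self._products:
            raise KeyError(f"unknown product: {product_name}")
        self._subscriptions.setdefault(app_key, set()).add(product_name)

    def is_authorized(self, app_key: str, operation: str) -> bool:
        # Allowed if any subscribed product includes the operation.
        return any(
            operation in self._products[name].operations
            for name in self._subscriptions.get(app_key, set())
        )


if __name__ == "__main__":
    catalog = ProductCatalog([
        ApiProduct("chat-basic", frozenset({"POST /v1/chat"})),
        ApiProduct("batch-premium", frozenset({"POST /v1/batch"})),
    ])
    catalog.subscribe("app-123", "chat-basic")
    print(catalog.is_authorized("app-123", "POST /v1/chat"))   # True
    print(catalog.is_authorized("app-123", "POST /v1/batch"))  # False
```

Modeling access at the product level rather than per endpoint is what makes tiered packaging possible: the same backend operation can appear in several products with different terms.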
With the integration, GKE users can access Apigee policies governing:
The Apigee Operator for Kubernetes used in this integration also supports admin template rules, letting organization administrators enforce policy rules across their organization. For example, an organization admin can require that certain policies be applied to all APIs, or specify a list of policies that can't be used with the organization's APIs.
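The two kinds of admin rules mentioned above, required policies and disallowed policies, amount to simple set checks. The Python sketch below is a hypothetical validator, not the operator's actual rule format or API; the policy names are Apigee policy types used purely as illustrative labels.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class TemplateRules:
    """Organization-wide rules (hypothetical shape)."""
    required: FrozenSet[str] = frozenset()
    forbidden: FrozenSet[str] = frozenset()


def find_violations(api_policies: Iterable[str], rules: TemplateRules) -> List[str]:
    """Returns human-readable violations; an empty list means the API complies."""
    attached = set(api_policies)
    problems = [f"missing required policy: {p}" for p in sorted(rules.required - attached)]
    problems += [f"forbidden policy used: {p}" for p in sorted(rules.forbidden & attached)]
    return problems


if __name__ == "__main__":
    rules = TemplateRules(
        required=frozenset({"VerifyAPIKey", "SpikeArrest"}),
        forbidden=frozenset({"JavaScript"}),
    )
    print(find_violations(["VerifyAPIKey", "JavaScript"], rules))
    # ['missing required policy: SpikeArrest', 'forbidden policy used: JavaScript']
    print(find_violations(["VerifyAPIKey", "SpikeArrest", "Quota"], rules))  # []
```

Reporting every violation instead of stopping at the first lets an API producer fix all problems in a single pass.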
Future plans include support for Apigee AI policies governing:
By using Apigee's best-in-class API management and security capabilities through the GKE Inference Gateway, enterprises can now unify their AI serving and API governance layers. With Apigee's full-featured API management platform at your disposal, you can focus on your core mission: running your inference engine on GKE to take advantage of leading AI infrastructure in the public cloud.