- Google Developers Blog

JULY 24, 2026 / AI

Run Ray on TPU, Part 2: Ray AI libraries

This second installment explores how Ray’s higher-level libraries—Serve, Data, and Train—abstract the complexities of running AI workloads on Google's TPU slices. Ray Serve uses a simple topology configuration to correctly gang-schedule large multi-host models, while Ray Data eliminates data-loading bottlenecks by feeding accelerators directly with native JAX batches. Finally, JaxTrainer streamlines distributed training across TPUs by automatically handling cross-slice coordination, checkpointing, and fault tolerance.

JULY 21, 2026

Scaling Agentic RL: High-Throughput Agentic Training with Tunix

Tunix is Google’s new JAX-native post-training library designed to eliminate TPU idling bottlenecks when training multi-turn, tool-using LLM reasoning agents. It maximizes hardware throughput by combining highly concurrent, asynchronous rollouts with a decoupled producer-consumer pipeline, ensuring the trainer is constantly fed even while agents wait on network I/O or environment steps. Additionally, Tunix provides plug-and-play abstractions and continuous macro-level profiling, allowing developers to easily integrate custom open-source environments and optimize complex distributed workflows without massive code rewrites.
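The decoupled producer-consumer idea described above can be illustrated with a small sketch using only Python's standard `asyncio` module. This is not Tunix's API: `rollout_worker`, `trainer`, and `run_pipeline` are hypothetical names, and the sleep call is a stand-in for network I/O or an environment step. The point is only that many concurrent rollouts can wait on I/O while the consumer keeps draining a shared queue.

```python
import asyncio
import random


async def rollout_worker(worker_id: int, episodes: int, queue: asyncio.Queue) -> None:
    """Hypothetical agent rollout: wait on simulated I/O, then enqueue a trajectory."""
    for episode in range(episodes):
        # Stand-in for a tool call or environment step (assumption, not real I/O).
        await asyncio.sleep(random.uniform(0.001, 0.005))
        await queue.put({"worker": worker_id, "episode": episode})


async def trainer(queue: asyncio.Queue, batch_size: int, total: int) -> list[list[dict]]:
    """Consumer: pull trajectories as they arrive and group them into batches."""
    batches: list[list[dict]] = []
    current: list[dict] = []
    for _ in range(total):
        current.append(await queue.get())
        if len(current) == batch_size:
            batches.append(current)  # a real trainer would run an update step here
            current = []
    if current:
        batches.append(current)
    return batches


async def run_pipeline(num_workers: int = 8, episodes: int = 4,
                       batch_size: int = 8) -> list[list[dict]]:
    # A bounded queue applies backpressure if rollouts outpace the trainer.
    queue: asyncio.Queue = asyncio.Queue(maxsize=2 * batch_size)
    total = num_workers * episodes
    producers = [asyncio.create_task(rollout_worker(i, episodes, queue))
                 for i in range(num_workers)]
    batches = await trainer(queue, batch_size, total)
    await asyncio.gather(*producers)
    return batches


if __name__ == "__main__":
    result = asyncio.run(run_pipeline())
    print(f"trained on {len(result)} batches")
```

Because the workers interleave their waits, the trainer receives a steady stream of trajectories rather than stalling on the slowest rollout.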

JULY 20, 2026 / AI

Run Ray on TPU, Part 1: The foundations

Ray 2.55 introduces official, first-class support for Google Cloud TPUs, enabling developers to run distributed Python workloads on Google's accelerators using the familiar Ray task-and-actor APIs. To handle the strict networking requirement of keeping multi-host TPU "slices" together over their Inter-Chip Interconnect (ICI), the KubeRay Operator on GKE automatically provisions and labels the underlying hardware layout. Ray Core utilizes these labels via its slice_placement_group() primitive to atomically reserve complete slices, allowing developers to deploy jobs through KubeRay, Ray Train, or Ray Serve simply by declaring a hardware topology (like "4x4") without writing custom placement code.
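To make the idea of declaring a topology and reserving a whole slice concrete, here is a minimal pure-Python sketch. It does not call Ray or KubeRay and does not reproduce the behavior of slice_placement_group(); the `slices` dictionary, its `topology` and `reserved` fields, and both function names are hypothetical. It only shows the two pieces of logic the summary implies: turning a string such as "4x4" into a chip count, and reserving a labeled slice all-or-nothing.

```python
from math import prod
from typing import Optional


def chips_in_topology(topology: str) -> int:
    """Parse a topology string such as "4x4" or "2x2x4" into a chip count."""
    dims = [int(d) for d in topology.lower().split("x")]
    if any(d <= 0 for d in dims):
        raise ValueError(f"invalid topology: {topology!r}")
    return prod(dims)


def reserve_slice(slices: dict[str, dict], topology: str) -> Optional[tuple[str, list[str]]]:
    """Reserve one complete, unreserved slice whose label matches the topology.

    Returns (slice_name, hosts) on success, or None without changing any
    state, so a caller never ends up holding part of a slice.
    """
    for name, info in slices.items():
        if info["topology"] == topology and not info["reserved"]:
            info["reserved"] = True
            return name, list(info["hosts"])
    return None


if __name__ == "__main__":
    # Hypothetical inventory; a real cluster would derive this from node labels.
    inventory = {
        "slice-a": {"topology": "2x2", "hosts": ["a0"], "reserved": False},
        "slice-b": {"topology": "4x4", "hosts": ["b0", "b1", "b2", "b3"], "reserved": False},
    }
    print(chips_in_topology("4x4"), reserve_slice(inventory, "4x4"))
```

The host counts in the example inventory are illustrative assumptions, not a statement about how many chips any TPU generation attaches to a host.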

JULY 16, 2026 / AI

Building scalable AI agents with modular prompt transpilation

To resolve the scaling bottlenecks and runtime errors caused by monolithic system prompts, engineering teams should treat prompts as build artifacts by modularizing instructions into reusable templates. By running these modular "skill files" through a transpiler, developers can enforce static validation, catch missing dependencies at build time, and integrate prompt generation directly into their CI/CD pipelines. This deterministic approach prevents code drift and ultimately establishes a safe framework where agents can propose updates to their own logic via standard pull requests.
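As a rough illustration of treating prompts as build artifacts, the sketch below resolves modular skill definitions in dependency order and fails at build time on a missing dependency, a cycle, or an undefined template variable. The skill format (a dictionary with `requires` and `body` keys), the `transpile` function, and `PromptBuildError` are all hypothetical; the post's actual transpiler and file format may differ.

```python
from string import Template


class PromptBuildError(Exception):
    """Raised when a prompt cannot be built deterministically."""


def transpile(skills: dict[str, dict], root: str, variables: dict[str, str]) -> str:
    """Emit one prompt for `root`, with every dependency placed before its user."""
    ordered: list[str] = []
    visiting: set[str] = set()

    def visit(name: str, path: tuple[str, ...]) -> None:
        if name in ordered:
            return
        if name not in skills:
            origin = " -> ".join(path) or "<root>"
            raise PromptBuildError(f"missing skill {name!r} required by {origin}")
        if name in visiting:
            raise PromptBuildError("dependency cycle: " + " -> ".join(path + (name,)))
        visiting.add(name)
        for dep in skills[name].get("requires", []):
            visit(dep, path + (name,))
        visiting.discard(name)
        ordered.append(name)

    visit(root, ())
    sections = []
    for name in ordered:
        try:
            # substitute() raises KeyError if a $variable has no value.
            sections.append(Template(skills[name]["body"]).substitute(variables))
        except KeyError as exc:
            raise PromptBuildError(
                f"skill {name!r} uses undefined variable {exc.args[0]!r}") from exc
    return "\n\n".join(sections)


if __name__ == "__main__":
    demo = {
        "tone": {"body": "Be concise."},
        "search": {"requires": ["tone"], "body": "Use $tool for lookups."},
    }
    print(transpile(demo, "search", {"tone": "", "tool": "web_search"}))
```

A CI job could run a function like this over every root skill and fail the build on `PromptBuildError`, which is the kind of static check the summary describes.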

JULY 16, 2026 / AI

Evolving Spec-Driven Development: Conductor Now Supports Antigravity

Conductor has evolved from a Gemini CLI extension into a portable plugin, bringing conversational Spec-Driven Development (SDD) to ecosystems like Antigravity CLI and Claude. Rather than relying on strict command sequences, developers can now chat naturally with their AI assistant while it dynamically manages persistent markdown artifacts (like spec.md and plan.md) in the background. This update eliminates workflow friction while ensuring your repository remains a version-controlled, single source of truth for your project's architecture and state across different AI tools.

JULY 16, 2026 / AI

Expanding Choice in Gemini Enterprise Agent Platform: Introducing Grounding with Parallel Web Search

Google Cloud has partnered with Parallel Web Systems to natively integrate Parallel's search infrastructure as a web grounding provider on the Gemini Enterprise Agent Platform. This integration enables developers to anchor their AI agents in verifiable, real-time web results, significantly improving factual accuracy for complex enterprise workflows. Additionally, the partnership offers expanded architectural flexibility, allowing users to programmatically extract, permanently cache, and process web data alongside other large language models.

JULY 14, 2026

Systems Engineering Playbook: Optimizing Qwen 3.5-397B MoE on Ironwood (TPU7x)

To serve the 397B-parameter Qwen 3.5 Mixture-of-Experts (MoE) model on Ironwood TPUs, engineers developed a modular JAX/Pallas optimization stack that achieved up to a 4.7x inference speedup for prefill-heavy workloads. The team bypassed severe hardware sharding constraints by deploying a hybrid Data Parallelism and Expert Parallelism (DP+EP) topology, paired with custom low-level communication fusions like a hierarchical reduce-scatter to optimize cross-device token routing. Finally, by executing hardware-aware custom kernels—such as Batched Ragged Page Attention and a fully-fused Gated DeltaNet (GDN) block—they successfully saturated HBM bandwidth and TensorCore MXUs to push system throughput near its theoretical roofline limits.

JULY 9, 2026 / Web

LiteRT.js, Google's high performance Web AI Inference

We're excited to introduce LiteRT.js, the newest member of the LiteRT family! LiteRT.js is our powerful solution for running machine learning models directly in the browser, extending Google's cross-platform edge AI runtime to the web. Built for JavaScript developers, LiteRT.js delivers state-of-the-art ML model inference performance on WebGPU and upcoming WebNN, with a fallback to WebAssembly for CPU. This post provides a quick tour of LiteRT.js and gives web developers everything they need to get started.

JULY 8, 2026 / Mobile

Bridging the Domain Gap: AI Race Coach built with Antigravity and Gemini

On May 23, 2026, fresh off the stage at Google I/O, our Google Developer Experts (GDEs) converged on...

JULY 6, 2026 / AI

We terminated a TPU mid-training and it recovered in seconds: Introduction to elastic training with MaxText

Distributed AI training is notoriously fragile because losing a single machine typically crashes the entire multi-node job, forcing a time-consuming restart of the full workload and its infrastructure. To address this, Google’s JAX ecosystem uses elastic training via Pathways, which converts a hardware failure into a catchable Python exception so the running process can survive. When an unplanned failure occurs, the system automatically replaces only the broken worker, restores the last viable checkpoint from Cloud Storage, and resumes training in place, keeping total downtime under two minutes without ever restarting the main controller process.
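The recovery pattern described above (catch the failure as an exception, restore the last checkpoint, keep the controller process running) can be sketched as a small simulation. Nothing here uses Pathways or MaxText: `WorkerLost`, `train_step`, and `elastic_train` are hypothetical names, the checkpoint is an in-memory dictionary standing in for Cloud Storage, and the failure is injected deterministically.

```python
class WorkerLost(RuntimeError):
    """Hypothetical exception standing in for a hardware failure surfaced to Python."""


def train_step(state: dict, fail_at: set[int]) -> dict:
    """One simulated step; raises WorkerLost once at each step listed in fail_at."""
    step = state["step"]
    if step in fail_at:
        fail_at.discard(step)  # assume the replacement worker succeeds on retry
        raise WorkerLost(f"worker lost at step {step}")
    return {"step": step + 1, "loss": state["loss"] * 0.9}


def elastic_train(total_steps: int, checkpoint_every: int,
                  fail_at: set[int]) -> tuple[dict, int]:
    """Run to total_steps inside one process, recovering from WorkerLost in place."""
    state = {"step": 0, "loss": 1.0}
    checkpoint = dict(state)  # stand-in for a checkpoint in remote storage
    recoveries = 0
    while state["step"] < total_steps:
        try:
            state = train_step(state, fail_at)
            if state["step"] % checkpoint_every == 0:
                checkpoint = dict(state)
        except WorkerLost:
            # The controller loop survives: roll back and continue.
            recoveries += 1
            state = dict(checkpoint)
    return state, recoveries


if __name__ == "__main__":
    final_state, count = elastic_train(total_steps=10, checkpoint_every=5, fail_at={7})
    print(final_state["step"], count)
```

The cost of a failure in this model is the work done since the last checkpoint, which is why checkpoint frequency trades storage traffic against recomputation.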
