Posts by Tianshu Bao

2 results

  • MAY 28, 2026 / AI

    How the community trained Gemma to "Think" with Tunix and TPUs

    The Google Tunix Hackathon on Kaggle challenged developers to turn small, non-reasoning base models into general reasoning engines using Kaggle TPUs and a limited compute budget. The winning teams did so with multi-stage post-training pipelines that combined Supervised Fine-Tuning (SFT) with alignment techniques such as GRPO and SimPO. The results showed that the community can train capable, structured reasoning models using accessible, open-source resources.

  • SEPT. 30, 2025 / AI

    Introducing Tunix: A JAX-Native Library for LLM Post-Training

    Tunix is a new JAX-native, open-source library for LLM post-training. It offers comprehensive tools for aligning models at scale, including SFT, preference tuning (DPO), advanced RL methods (PPO, GRPO, GSPO), and knowledge distillation. Designed for TPUs and seamless JAX integration, Tunix emphasizes developer control and shows a 12% relative improvement in pass@1 accuracy on GSM8K.
