The next chapter of the Gemini era for developers

December 11, 2024
Shrestha Basu Mallick, Group Product Manager, Gemini API
Kathy Korevec, Director of Product, Google Labs

We're giving developers the power to build the future of AI with cutting-edge models, intelligent tools to write code faster, and seamless integration across platforms and devices. Since last December when we launched Gemini 1.0, millions of developers have used Google AI Studio and Vertex AI to build with Gemini across 109 languages.

Today, we are announcing Gemini 2.0 Flash Experimental to enable even more immersive and interactive applications, as well as new coding agents that will enhance workflows by taking action on behalf of the developer.


Build with Gemini 2.0 Flash

Building on the success of Gemini 1.5 Flash, 2.0 Flash is twice as fast as 1.5 Pro while achieving stronger performance, adds new multimodal outputs, and comes with native tool use. We're also introducing a Multimodal Live API for building dynamic applications with real-time audio and video streaming.

Starting today, developers can test and explore Gemini 2.0 Flash via the Gemini API in Google AI Studio and Vertex AI during its experimental phase, with general availability coming early next year.
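
To make the access path above concrete, here is a minimal sketch of calling the experimental model through the `google-genai` Python SDK. The model id and SDK surface are assumptions based on the experimental phase and may change at general availability; the network call only runs if an API key is present.

```python
# Minimal sketch: calling Gemini 2.0 Flash via the google-genai Python SDK.
# The model id "gemini-2.0-flash-exp" reflects the experimental phase and
# may change once the model is generally available.
import os

MODEL = "gemini-2.0-flash-exp"

def build_request(prompt: str) -> dict:
    """Assemble the arguments we would pass to generate_content."""
    return {"model": MODEL, "contents": prompt}

if __name__ == "__main__" and os.environ.get("GEMINI_API_KEY"):
    from google import genai  # pip install google-genai
    client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])
    response = client.models.generate_content(
        **build_request("Summarize what Gemini 2.0 Flash adds, in one line.")
    )
    print(response.text)
```

The same request shape works from Google AI Studio's "Get code" export, which is a convenient way to cross-check parameter names against the current SDK.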

With Gemini 2.0 Flash, developers have access to:

1. Better performance

Gemini 2.0 Flash is more powerful than 1.5 Pro while still delivering the speed and efficiency developers expect from Flash. It also features improved multimodal, text, code, video, spatial understanding, and reasoning performance on key benchmarks. Improved spatial understanding enables more accurate bounding box generation for small objects in cluttered images, along with better object identification and captioning. Learn more in the spatial understanding video or read the Gemini API docs.
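
To use those bounding boxes in an application, you typically map the model's normalized coordinates back onto your image. The sketch below assumes the convention described in the Gemini API docs, where boxes come back as `[ymin, xmin, ymax, xmax]` scaled to a 0-1000 grid; verify against the docs for your SDK version.

```python
# Sketch: convert a Gemini-style normalized bounding box to pixel coords.
# Assumes the documented [ymin, xmin, ymax, xmax] order on a 0-1000 grid.

def to_pixels(box, width, height):
    """Map a [ymin, xmin, ymax, xmax] box in 0-1000 space onto an image."""
    ymin, xmin, ymax, xmax = box
    return (
        int(xmin / 1000 * width),   # left
        int(ymin / 1000 * height),  # top
        int(xmax / 1000 * width),   # right
        int(ymax / 1000 * height),  # bottom
    )

# A box covering the central quarter of a 640x480 image:
print(to_pixels([250, 250, 750, 750], 640, 480))  # (160, 120, 480, 360)
```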


2. New output modalities

Developers will be able to use Gemini 2.0 Flash to generate integrated responses that can include text, audio, and images, all through a single API call. These new output modalities are available to early testers, with a wider rollout expected next year. SynthID invisible watermarks will be enabled for all image and audio outputs, helping address concerns about misinformation and misattribution.

  • Multilingual native audio output: Gemini 2.0 Flash features native text-to-speech audio output that provides developers fine-grained control over not just what the model says, but how it says it, with a choice of 8 high-quality voices and a range of languages and accents. Hear native audio output in action or read more in the developer docs.

  • Native image output: Gemini 2.0 Flash now natively generates images and supports conversational, multi-turn editing, so you can build on previous outputs and refine them. It can output interleaved text and images, making it useful in multimodal content such as recipes. See more in the native image output video.
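
The output modalities and voice above are selected per request. The sketch below shows what such a request configuration might look like; the field names (`response_modalities`, `speech_config`, `voice_name`) and the voice "Kore" are assumptions drawn from the developer docs for the experimental API and may differ in the shipped SDK.

```python
# Sketch: request config selecting output modalities and a prebuilt voice.
# Field names are assumptions based on the experimental developer docs.

def make_config(modalities, voice=None):
    """Build a generation config requesting specific output modalities."""
    config = {"response_modalities": list(modalities)}
    if voice is not None:
        # one of the 8 prebuilt voices for native audio output
        config["speech_config"] = {"voice_name": voice}
    return config

print(make_config(["TEXT", "AUDIO"], voice="Kore"))
```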


3. Native tool use

Gemini 2.0 has been trained to use tools, a foundational capability for building agentic experiences. It can natively call tools like Google Search and code execution, in addition to custom third-party functions via function calling. Using Google Search natively as a tool leads to more factual and comprehensive answers and increases traffic to publishers. Multiple searches can run in parallel, improving information retrieval by finding relevant facts from multiple sources simultaneously and combining them for accuracy. Learn more in the native tool use video or start building from a notebook.
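
For custom third-party functions, you describe each function to the model and then route the calls it emits back to your own code. The declaration shape below (name, description, JSON-schema parameters) follows the Gemini API's function-calling docs; the weather function itself is hypothetical.

```python
# Sketch: declaring a custom function for native function calling and
# dispatching the model's call back to local code. `get_weather` is a
# hypothetical example function, not part of any API.

get_weather = {
    "name": "get_weather",
    "description": "Look up the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name"},
        },
        "required": ["city"],
    },
}

def dispatch(call, registry):
    """Route a model-emitted function call to the matching local function."""
    return registry[call["name"]](**call["args"])

result = dispatch(
    {"name": "get_weather", "args": {"city": "Paris"}},
    {"get_weather": lambda city: f"Sunny in {city}"},
)
print(result)  # Sunny in Paris
```

In a real loop you would pass the declaration in the request's tools list, execute the call the model returns, and send the result back as the next turn.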


4. Multimodal Live API

Developers can now build real-time, multimodal applications with audio and video-streaming inputs from cameras or screens. Natural conversational patterns like interruptions and voice activity detection are supported. The API supports the integration of multiple tools together to accomplish complex use cases with a single API call. See more in the multimodal live streaming video, try the web console, or starter code (Python).
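
Opening a Live API session starts with a setup message naming the model, the response modalities, and any tools the session may call. The sketch below is modeled on the Python starter code; the exact field names are assumptions and may change while the API is experimental.

```python
# Sketch: the kind of setup payload sent when opening a Multimodal Live API
# session (the API itself is WebSocket-based). Field names are assumptions
# modeled on the starter code and may change during the experimental phase.

def live_session_setup(model, modalities, tools=()):
    """Build the initial session configuration for a live connection."""
    return {
        "model": model,
        "generation_config": {"response_modalities": list(modalities)},
        # multiple tools can be combined in one session, e.g. search,
        # code execution, and custom function declarations
        "tools": list(tools),
    }

setup = live_session_setup(
    "gemini-2.0-flash-exp",
    ["AUDIO"],
    tools=[{"google_search": {}}, {"code_execution": {}}],
)
print(setup["model"])
```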


We’re thrilled to see startups making impressive progress with Gemini 2.0 Flash, prototyping new experiences like tldraw's visual playground, Viggle's virtual character creation and audio narration, Toonsutra's contextual multilingual translation, and Rooms' real-time audio.

To jumpstart building, we’ve released three starter app experiences in Google AI Studio along with open source code for spatial understanding, video analysis and Google Maps exploration so you can begin building with Gemini 2.0 Flash.


Enabling the evolution of AI code assistance

As AI code assistance rapidly evolves from simple code searches to AI-powered assistants embedded in developer workflows, we want to share the latest advancement that will use Gemini 2.0: coding agents that can execute tasks on your behalf.

In our latest research, 2.0 Flash equipped with code execution tools achieved 51.8% on SWE-bench Verified, which tests agent performance on real-world software engineering tasks. The cutting-edge inference speed of 2.0 Flash allowed the agent to sample hundreds of potential solutions, selecting the best based on existing unit tests and Gemini's own judgment. We're in the process of turning this research into new developer products.
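
The sample-and-select pattern described above can be sketched generically: generate many candidate patches, score each against the existing test suite, and keep the best. Here `run_tests` is a stand-in for actually executing a project's tests against a candidate; the toy scorer is purely illustrative.

```python
# Sketch of the best-of-N selection pattern: sample candidate solutions,
# score each against existing unit tests, keep the highest scorer.
# `run_tests` stands in for running a real project test suite.

def select_best(candidates, run_tests):
    """Return the candidate with the highest test score (ties -> first seen)."""
    best, best_score = None, -1
    for candidate in candidates:
        score = run_tests(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best

# Toy scorer: pretend longer patches pass more tests.
print(select_best(["a", "abc", "ab"], run_tests=len))  # abc
```

In the research setting, a final selection step can also apply the model's own judgment to break ties among candidates that pass the same tests.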


Meet Jules, your AI-powered code agent

Imagine your team has just finished a bug bash, and now you’re staring down a long list of bugs. Starting today, you can offload Python and JavaScript coding tasks to Jules, an experimental AI-powered code agent that will use Gemini 2.0. Working asynchronously and integrated with your GitHub workflow, Jules handles bug fixes and other time-consuming tasks while you focus on what you actually want to build. Jules creates comprehensive, multi-step plans to address issues, efficiently modifies multiple files, and even prepares pull requests to land fixes directly back into GitHub.

It’s early, but from our internal experience using Jules, it’s giving developers:

  • More productivity. Assign issues and coding tasks to Jules for asynchronous coding efficiency.

  • Progress tracking. Stay informed and prioritize tasks that require your attention with real-time updates.

  • Full developer control. Review the plans Jules creates along the way, and provide feedback or request adjustments as you see fit. Easily review and, if appropriate, merge the code Jules writes into your project.

We’re making Jules available for a select group of trusted testers today, and we’ll make it available for other interested developers in early 2025. Sign up to get updates about Jules on labs.google.com/jules.


Colab's data science agent will create notebooks for you

At I/O this year, we launched an experimental Data Science Agent on labs.google/code that allows anyone to upload a dataset and get insights within minutes, all grounded in a working Colab notebook. We were thrilled to receive such positive feedback from the developer community and see the impact. For example, with the help of the Data Science Agent, a scientist at Lawrence Berkeley National Laboratory working on a global tropical wetland methane emissions project estimated that their analysis and processing time was reduced from one week to five minutes.

Colab has started to integrate these same agentic capabilities, using Gemini 2.0. Simply describe your analysis goals in plain language, and watch your notebook take shape automatically, helping accelerate your ability to conduct research and data analysis. Developers can get early access to this new feature by joining the trusted tester program before it rolls out more widely to Colab users in the first half of 2025.

Developers are building the future

Our Gemini 2.0 models can empower you to build more capable AI apps faster and more easily, so you can focus on great experiences for your users. We'll be bringing Gemini 2.0 to our platforms like Android Studio, Chrome DevTools and Firebase in the coming months. Developers can sign up to use Gemini 2.0 Flash in Gemini Code Assist, for enhanced coding assistance capabilities in popular IDEs such as Visual Studio Code, IntelliJ, PyCharm and more. Visit ai.google.dev to get started and follow Google AI for Developers for future updates.