Veo 3 Fast and new image-to-video capabilities

JULY 31, 2025
Alisa Fortin, Product Manager
Seth Odoom, Product Manager

Building on the recent launch of Veo 3, we’re now introducing Veo 3 Fast, a model optimized for speed and price that lets developers iterate faster while still producing high-quality output.

We’re also bringing image-to-video capabilities to both Veo 3 and Veo 3 Fast, making it possible to transform still images into video clips that stay consistent with the source image. Both models, including their image-to-video capabilities, are available in paid preview via the Gemini API.

Video: see prompt in footnote (1)

A faster, more efficient model

Veo 3 Fast is a faster, more cost-effective version of Veo 3. It lets developers create high-quality videos with sound while optimizing for speed and for business use cases. Veo 3 Fast supports both text-to-video and image-to-video.

It is the ideal choice for:

  • Programmatic advertising: Powering backend services that automatically generate ad creatives.

  • Rapid prototyping: Enabling quick A/B testing of different creative concepts.

  • Content creation at scale: Building apps that need to quickly produce social media content.


Veo 3 Fast is priced at $0.40 / second with audio.
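
To put the per-second price in context, here is a small back-of-the-envelope sketch in Python that compares Veo 3 Fast at $0.40 / second with Veo 3 at $0.75 / second (the Veo 3 price is stated later in this post). The dictionary keys, the function name estimate_cost, the 8-second clip length, and the batch of 10 clips are illustrative assumptions, not values from the API.

```python
from decimal import Decimal

# Per-second prices with audio, as quoted in this post.
# The keys are illustrative labels, not official API model IDs.
PRICE_PER_SECOND = {
    "veo-3": Decimal("0.75"),
    "veo-3-fast": Decimal("0.40"),
}


def estimate_cost(model: str, clip_seconds: int, num_clips: int = 1) -> Decimal:
    """Return the estimated cost in USD for a batch of generated clips."""
    if clip_seconds < 0 or num_clips < 0:
        raise ValueError("clip_seconds and num_clips must be non-negative")
    return PRICE_PER_SECOND[model] * clip_seconds * num_clips


if __name__ == "__main__":
    # Hypothetical A/B test: 10 ad variants, each assumed to be 8 seconds long.
    print(estimate_cost("veo-3-fast", clip_seconds=8, num_clips=10))  # 32.00
    print(estimate_cost("veo-3", clip_seconds=8, num_clips=10))       # 60.00
```

Under these assumptions, the same batch costs $32.00 on Veo 3 Fast and $60.00 on Veo 3, which is the kind of difference that matters when iterating on many creative variants.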


New image-to-video capabilities

Developers can now use Veo 3 and Veo 3 Fast to generate high-quality video content (with sound) from input images. This new capability creates dynamic video sequences that stay consistent with the input image, which the model uses as the first frame. Just provide an image alongside your text prompt to guide the model toward your desired motion, narrative, and audio. Outputs generated with image-to-video are priced the same as text-to-video outputs.

Video: see prompt in footnote (2)

Image-to-video is designed to give you more creative control and flexibility:

  • High-quality video generation: Create fluid, cinematic-quality videos from a single image, maintaining stylistic consistency and detail – all with audio.

  • Precise prompting: Combine image inputs with descriptive text prompts to direct the action, style, and evolution of your video content.

  • Seamless API integration: Access this powerful new feature through the same intuitive Gemini API, making it easy to integrate into your existing workflows and applications.
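
As a rough sketch of what that integration might look like, the Python snippet below uses the google-genai SDK to submit an image together with a text prompt and then wait for the result. The model IDs, the file names, the polling interval, and the helper name image_to_video are assumptions made for illustration and do not come from this post; check the Gemini API documentation for the current identifiers. The prompt is the one from footnote (2).

```python
import time

# Assumed preview model IDs; confirm the current names in the Gemini API docs.
VEO_3 = "veo-3.0-generate-preview"
VEO_3_FAST = "veo-3.0-fast-generate-preview"


def image_to_video(client, image, prompt, model=VEO_3_FAST, poll_seconds=10.0):
    """Start an image-to-video job and block until the operation is done.

    `client` is expected to behave like a google-genai `genai.Client`.
    """
    operation = client.models.generate_videos(
        model=model,
        prompt=prompt,
        image=image,  # used by the model as the starting frame
    )
    # Video generation is a long-running operation, so poll until it finishes.
    while not operation.done:
        time.sleep(poll_seconds)
        operation = client.operations.get(operation)
    return operation


def main():
    # Imported here so the helper above can be exercised without the SDK.
    from google import genai
    from google.genai import types

    client = genai.Client()  # reads the API key from the environment

    # "tote_bag.png" is a hypothetical input file.
    with open("tote_bag.png", "rb") as f:
        image = types.Image(image_bytes=f.read(), mime_type="image/png")

    operation = image_to_video(
        client,
        image,
        prompt=(
            "The mountain logo on the tote bag subtly animates. The sun in the "
            "logo rises along the mountain peak, and tiny birds fly out from it. "
            "Audio: A gentle whoosh and a soft bird chirp sound effect."
        ),
    )

    # Download and save the first generated clip.
    video = operation.response.generated_videos[0]
    client.files.download(file=video.video)
    video.video.save("tote_bag.mp4")
```

Calling main() requires the google-genai package and a valid API key with access to the paid preview. Passing model=VEO_3 instead of the default switches the same call to Veo 3.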

Veo 3 pricing remains $0.75 / second with audio.

At OpusClip, we use Veo 3 to enhance our customers' video editing experience and generate B-roll videos through its image-to-video capability. Veo 3 takes a static image as the first frame and brings it to life by generating smooth, cinematic motion. This helps creators get engaging video content with minimal effort.


Start building the future of video today

We're incredibly excited to see what developers will create with Veo 3, Veo 3 Fast, and image-to-video capabilities via the Gemini API.

Explore the Gemini API documentation for video generation or the Veo cookbook and start building today!



Prompts

1: The sneaker on the billboard suddenly springs to life, its laces tying themselves. It leaps off the screen, landing on the rooftop below with a soft thud, and sprints out of frame. Audio: The sound of tying laces, a digital whoosh, a soft landing sound.

2: The mountain logo on the tote bag subtly animates. The sun in the logo rises along the mountain peak, and tiny birds fly out from it. Audio: A gentle whoosh and a soft bird chirp sound effect.