New launches
Veo: Text-to-Video Generator
Veo is Google’s most capable model for generating high-definition videos.
It turns text prompts into 1080p videos that can run longer than a minute.
Veo combines an advanced understanding of natural language and visual semantics to accurately capture a user’s creative vision. It understands cinematic terms like “timelapse” or “aerial shots of a landscape.”
According to Google, Veo offers an unprecedented level of creative control and keeps footage consistent, so that people, animals, and objects move realistically throughout shots.
Currently, Veo is available only to select creators as a private preview within VideoFX, but Google plans to release it more widely in the future.
Veo directly competes with OpenAI’s Sora, another text-to-video generator.
Imagen 3: Text-to-Image Generator
Imagen 3 is Google’s highest-quality text-to-image model. It creates photorealistic images from input text.
The Imagen family traces back to the original Imagen model, which pairs a large transformer language model (T5) for understanding text with diffusion models for high-fidelity image generation.
That original model reported a then state-of-the-art FID score of 7.27 on the COCO dataset (lower is better) without ever training on COCO. Human raters found its samples to be on par with the COCO data itself in image-text alignment.
To assess text-to-image models in more depth, Google introduced DrawBench, a comprehensive benchmark, alongside the original Imagen model.
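For readers unfamiliar with FID (Fréchet Inception Distance), the score cited above compares the statistics of features extracted from real images with those from generated images; a lower score means the two distributions are closer. The sketch below is a simplified illustration, not Google's evaluation code. It assumes the feature vectors have already been extracted (real FID uses Inception-v3 activations), and it treats each feature dimension as independent (a diagonal covariance), whereas the full metric uses complete covariance matrices. The function names and the toy data are hypothetical.

```python
import math
import statistics


def gaussian_stats(features):
    """Return per-dimension mean and sample variance for a list of feature vectors."""
    columns = list(zip(*features))
    means = [statistics.fmean(col) for col in columns]
    variances = [statistics.variance(col) for col in columns]
    return means, variances


def frechet_distance_diagonal(real_features, generated_features):
    """Frechet distance between two Gaussians, ASSUMING diagonal covariances.

    Full FID computes ||mu_r - mu_g||^2 + Tr(C_r + C_g - 2 (C_r C_g)^(1/2)).
    With diagonal covariances the trace term reduces to a per-dimension sum.
    """
    mu_r, var_r = gaussian_stats(real_features)
    mu_g, var_g = gaussian_stats(generated_features)
    mean_term = sum((a - b) ** 2 for a, b in zip(mu_r, mu_g))
    cov_term = sum(
        vr + vg - 2.0 * math.sqrt(vr * vg) for vr, vg in zip(var_r, var_g)
    )
    return mean_term + cov_term


# Toy, made-up 2-D "features"; real evaluations use thousands of
# high-dimensional Inception-v3 feature vectors.
real = [[0.0, 1.0], [2.0, 3.0]]
generated = [[1.0, 1.0], [3.0, 3.0]]
print(frechet_distance_diagonal(real, generated))
```

Identical feature sets give a distance of zero, and the distance grows as the means or variances of the two sets drift apart, which is why a lower FID is read as higher fidelity.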
These tools represent significant advancements in generative AI, and they’re designed to empower creators and enhance the creative process.