Google I/O | New Launches

New launches

Veo: Text-to-Video Generator

Veo is Google’s most capable model for generating high-definition videos.

It takes text-based prompts and turns them into 1080p resolution videos that can be longer than a minute.

Veo draws on an advanced understanding of natural language and visual semantics to capture a user’s creative vision accurately. It also understands cinematic terms such as “timelapse” or “aerial shots of a landscape.”

Google says Veo provides an unprecedented level of creative control and that people, animals, and objects move realistically throughout its shots.

Currently, Veo is available only for select creators as a private preview within VideoFX, but Google plans to release it more widely in the future.

Veo directly competes with OpenAI’s Sora, another text-to-video generator.

Imagen 3

Imagen 3 is Google’s highest-quality text-to-image model. It creates photorealistic images from text prompts.

The Imagen approach, introduced with the original Imagen model in 2022, builds on the power of large transformer language models (such as T5) for understanding text and combines it with diffusion models for high-fidelity image generation.

The original Imagen model achieved a then state-of-the-art FID score of 7.27 on the COCO dataset without ever training on COCO, and human raters judged its samples to be on par with the COCO data itself in image-text alignment.
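FID (Fréchet Inception Distance) measures how far the distribution of generated images is from the distribution of real images, using feature vectors from a pretrained Inception network; lower scores are better. For readers unfamiliar with the metric, its standard definition is given below, where $\mu_r, \Sigma_r$ and $\mu_g, \Sigma_g$ are the mean and covariance of the features for real and generated images, respectively:

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\left( \Sigma_r + \Sigma_g - 2 \left( \Sigma_r \Sigma_g \right)^{1/2} \right)
```

An FID of 0 would mean the two feature distributions have identical means and covariances.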

Google also introduced DrawBench, a comprehensive benchmark for assessing text-to-image models, in the original Imagen paper.

These tools represent significant advancements in generative AI, and they’re designed to empower creators and enhance the creative process.