Google introduces Gemini Omni for conversational video generation and editing

Google introduced Gemini Omni, a new multimodal generative model family that starts with video creation and editing from text, image, audio, and video inputs.

The first model, Gemini Omni Flash, is rolling out through the Gemini app, Google Flow, YouTube Shorts Remix, and the YouTube Create app. Google says it is available to Google AI Plus, Pro, and Ultra subscribers globally through Gemini and Flow, and at no cost to users 18 and older in YouTube Shorts Remix and YouTube Create.

Omni is designed for conversational video editing. Google says users can upload or reference existing media, then modify a scene through natural-language instructions, such as changing environments, adding objects or characters, and adjusting camera angles. Edits can be refined across multiple turns while the scene context is preserved.

Google says Omni combines Gemini’s reasoning and world knowledge with generative media models. The company says the model has improved understanding of physical concepts such as gravity, kinetic energy, and fluid dynamics, and can use cultural, historical, and scientific context when generating scenes.

The initial rollout focuses on video outputs. Google says future Omni models will support additional output modalities, including image and audio. The company also says videos created with Omni include SynthID digital watermarking and can be verified through the Gemini app, Gemini in Chrome, and Search.

In Google Flow, the Omni integration targets creative workflows; in YouTube Shorts Remix, it targets user-generated video edits.

Source: https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-omni/
