Google has answered Meta’s text-to-video AI, “Make-A-Video,” with one of its own: Imagen Video. Researchers at Google Brain, the company’s AI lab, built the system, which creates video clips from text prompts.
Imagen Video is the second text-to-video AI in quick succession. It arrives six months after OpenAI’s text-to-image generator DALL-E 2 and just a week after Meta announced Make-A-Video.
Imagen Video can generate clips up to 5.3 seconds long at 1,280×768 resolution and 24 frames per second. Given a text description, the model first produces a 16-frame video at 3 fps and 24×48-pixel resolution. The system then upscales that clip and “predicts” additional frames, yielding a roughly 720p video at 24 frames per second.
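The frame-rate figures in that description are consistent with one another, and a few lines of arithmetic show why. The sketch below uses only the numbers quoted in the article (16 frames, 3 fps, 24 fps); the derived total of 128 output frames and the 8x temporal factor are our own calculation, not figures stated in the article. It checks the arithmetic only and does not model Google’s actual upscaling or frame-prediction networks; the function names are hypothetical.

```python
from fractions import Fraction

# Figures taken from the article's description of Imagen Video.
BASE_FRAMES = 16   # frames in the initial low-resolution clip
BASE_FPS = 3       # frame rate of the initial clip
OUTPUT_FPS = 24    # frame rate of the final clip


def clip_duration(frames: int, fps: int) -> Fraction:
    """Return the exact duration, in seconds, of `frames` frames played at `fps`."""
    return Fraction(frames, fps)


def frames_for_duration(duration: Fraction, fps: int) -> int:
    """Return how many frames fill `duration` seconds at `fps`; reject fractional counts."""
    total = duration * fps
    if total.denominator != 1:
        raise ValueError(f"{duration} s at {fps} fps is not a whole number of frames")
    return total.numerator


duration = clip_duration(BASE_FRAMES, BASE_FPS)            # 16/3 s
output_frames = frames_for_duration(duration, OUTPUT_FPS)  # 128 frames
temporal_factor = output_frames // BASE_FRAMES             # 8x more frames

print(f"duration: {float(duration):.2f} s")
print(f"frames at {OUTPUT_FPS} fps: {output_frames}")
print(f"temporal upsampling factor: {temporal_factor}x")
```

Sixteen frames at 3 fps last 16/3 ≈ 5.33 seconds, which matches the 5.3-second maximum; filling that same span at 24 fps requires 128 frames, so the system must generate seven new frames for every original one. Using `Fraction` keeps the duration exact, so the whole-frame check never trips over floating-point rounding.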
Google says Imagen Video has a “high degree of controllability” and world knowledge.
“We find Imagen Video not only capable of generating videos of high fidelity, but also having a high degree of controllability and world knowledge, including the ability to generate diverse videos and text animations in various artistic styles and with 3D object understanding,” said Google researchers.
Imagen Video was trained on an “internal dataset” of 14 million videos and 60 million still images, along with another 400 million images from the open LAION-400M dataset.
The Imagen Video team plans to join forces with the researchers behind Phenaki, another Google text-to-video AI that can turn detailed text prompts into videos more than two minutes long, though at lower quality.
The shared demos include videos generated from prompts such as “Coffee pouring into a cup,” “Wooden figurine surfing on a surfboard in space,” and “Balloon full of water exploding in extreme slow motion.”