This week, OpenAI shared a series of videos it says were produced with a new tool called Sora. Like Dall-e, OpenAI’s image-generation software, Sora can respond to prompts written in plain language.
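For a concrete sense of what "prompts written in plain language" looks like in practice, here is a minimal sketch using OpenAI's published Python SDK for Dall-e image generation, the interface Sora is being compared to. Sora itself has no public API as of this writing, so nothing below is specific to Sora, and the prompt text is invented for illustration.

```python
# A minimal sketch of prompt-driven generation via OpenAI's existing
# Images API (Dall-e 3). Sora has no public API as of this writing,
# so this shows the still-image workflow Sora is being compared to;
# the prompt text is invented for illustration.
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

response = client.images.generate(
    model="dall-e-3",
    prompt="Movie-trailer still: an astronaut in a knitted helmet "
           "walking across a salt desert at golden hour",
    size="1024x1024",
    n=1,  # Dall-e 3 generates one image per request
)

print(response.data[0].url)  # temporary URL of the generated image
```

The point of comparison is the interface, not the output: a single natural-language string goes in, finished media comes out, with no intermediate storyboarding or editing step.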
Here’s a trailer for a movie that doesn’t exist:
Most of the major AI firms have been working on text-to-video and video-editing tools for years, and a few — mostly smaller start-ups — have released software that people can use right now. The latest version of Runway, for example, probably represented the state of the art up until yesterday, and is capable of producing short clips from text prompts:
Sora — which, again, is not yet available to the public, so keep in mind we’re looking at media released by OpenAI — seems like a major jump forward in terms of realism, prompt interpretation, video length, and versatility.
As with a lot of AI-generated media, it isn’t hard to find surreal flaws:
But what’s notable about these videos isn’t just how they look at first glance — given the current state of AI image generation, it wasn’t hard to imagine that AI videos would soon follow — but how they move, and how objects within them seem to interact with the world around them. AI image generators have tended to be pretty good at rendering visual approximations of their subjects, but they lack context to such an extent that their outputs can become absurd: physically impossible architecture; hands with way too many fingers; bicycles that don’t make mechanical sense. Video generators extend these shortcomings into motion, producing impressionistic videos that look as though they were animated by someone without basic spatial awareness or an intuitive sense of physics. So far, they’ve been adept at rendering environments and scenes but poor at representing motion, object permanence, and mechanical systems. OpenAI claims it’s figured out a way to begin to deal with this:
Sora is able to generate complex scenes with multiple characters, specific types of motion, and accurate details of the subject and background. The model understands not only what the user has asked for in the prompt, but also how those things exist in the physical world.
OpenAI provides a bunch of examples of what this means in its research post on Sora. Some of the videos suggest a rudimentary sense of space and physics: in some cases, clothing moves and wrinkles with the person wearing it; in others, objects and figures cast shadows. Many, like the floating-chair video above or a GoPro-style clip of a mountain biker going down a trail — a situation in which many bodies and objects are interacting in very specific and strange ways — are far less convincing, and all remain uncanny in motion (slow motion seems to be an important trick for conveying realism in these demos, which makes sense).
OpenAI also demonstrated a few types of video editing — which has some fairly obvious commercial applications right out of the gate:
This is, again, a selective demo. We’re told a bit about how Sora works. We don’t know much about how (or on what) it was trained, or where that training data came from. We don’t know what it might look like as a final product, or how much it might cost to use. Nor do we have a great sense of what people will want from such tools: More than a year after AI-generated static imagery became available to anyone who wants it, its role in the world remains largely undetermined (advertisers love it, of course, but so do revenge pornographers). OpenAI says it’s “red-teaming” the tech right now, suggesting that, like Dall-e’s image-generation tools, Sora will have a number of boundaries to prevent misuse — whatever that might mean — and to minimize bad PR. Some folks are worried about what this means for, say, the film industry, while others imagine that passable text-to-video content could be disastrous in the context of, say, an election in an already degraded information environment.
AI influencer and professor Ethan Mollick makes a good point, though, about how quickly these things become unremarkable:
In terms of model output, OpenAI remains stubbornly ahead of other companies working on this stuff. But while these videos are a controlled demo of a single company’s product, it’s reasonable to assume that before long, other companies, open-source projects, and even governments will be able to achieve similar results, with their own priorities in mind.