The Generation Generation

A little over two years after OpenAI released ChatGPT upon the world, and about four years since Dall-E, the company’s toolset now — “finally” — makes it possible to generate video. Sora, as it is called, is not the first generative video tool to be released to the public; there are already offerings from Hotshot, Luma, Runway, and Tencent. OpenAI’s is the highest-profile so far, though: the one many people will use, and the products of which we will likely all be exposed to.

A generator of video is naturally best seen demonstrated in that format, and I think Marques Brownlee’s preview is a good place to start. The results are, as I wrote in February when Sora was first shown, undeniably impressive. No matter how complicated my views about generative A.I. — and I will get there — it is bewildering that a computer can, in a matter of seconds, transform noise into a convincing ten-second clip depicting whatever was typed into a text box. It can transform still images into video, too.

It is hard to see this as anything other than extraordinary. Enough has been written by now about “any sufficiently advanced technology [being] indistinguishable from magic” to bore, but this truly captures it in a Penn & Teller kind of way: knowing how it works only makes it somehow more incredible. Feed computers on a vast scale video which has been labelled — partly by people, and partly by automated means which are reliant on this exact same training process — and it can average that into entirely new video that often appears plausible.1 I am basing my assessment on the results generated by others because Sora requires a paid OpenAI account, and because there is currently a waiting list.

There are, of course, limitations of both technology and policy. Sora has problems with physics, the placement of objects in space, and consistency between and within shots. Sora does not generate audio, even though OpenAI has the capability. Prompts in text and images are checked for copyright violations, public figures’ likenesses, criminal usage, and so forth. But there is no meaningful restrictions on the video itself. This is not how things must be; this is a design decision.

I keep thinking about the differences between A.I. features and A.I. products. I use very few A.I. products; an open-ended image generator, for example, is technically interesting but not very useful to me. Unlike a crop of Substack writers, I do not think pretending to have commissioned art lends me any credibility. But I now use A.I. features on a regular basis, in part because so many things are now “A.I. features” in name and by seemingly no other quality. Generative Remove in Adobe Lightroom Classic, for example, has become a terrific part of my creative workflow. There are edits I sometimes want to make which, if not for this feature, would require vastly more time which, depending on the job, I may not have. It is an image generator just like Dall-E or Stable Diffusion, but it is limited by design.

Adobe is not taking a principled stance; Photoshop contains a text-based image generator which, I think, does not benefit from being so open-ended. It would, for me, be improved if its functionality were integrated into more specific tools; for example, the crop tool could also allow generative reframing.

Sora, like ChatGPT and Dall-E, is an A.I. product. But I would find its capabilities more useful and compelling if they were a feature within a broader video editing environment. Its existence implies a set of tools which could benefit a video editor’s workflow. For example, the object removal and tracking features in Premiere Pro feel more useful to me than its ability to generate b-roll, which just seems like a crappy excuse to avoid buying stock footage or paying for a second unit.

Limiting generative A.I. in this manner would also make its products more grounded in reality and less likely to be abused. It would also mean withholding capabilities. Clearly, there are some people who see a demonstration of the power of generative A.I. as a worthwhile endeavour unto itself. As a science experiment, I get it, but I do not think these open-ended tools should be publicly available. Alas, that is not the future venture capitalists, and shareholders, and — I guess — the creators of these products have decided is best for us.

We are now living in a world of slop, and we have been for some time. It began as infinite reams of text-based slop intended to be surfaced in search results. It became image-based slop which paired perfectly with Facebook’s pivot to TikTok-like recommendations. Image slop and audio slop came together to produce image slideshow slop dumped into the pipelines of Instagram Reels, TikTok, YouTube Shorts. Brace yourselves for a torrent of video slop about pyramids and the Bermuda triangle and pyramids. None of these were made using Sora, as far as I know; at least some were generated by Hailuo from Minimax. I had to dig a little bit for these examples, but not too much, and it is only going to get worse.

Much has been written about how all this generative stuff has the capability of manipulating reality — and rightfully so. It lends credence to lies, and its mere existence can cause unwarranted doubt. But there is another problem: all of this makes our world a little bit worse because it is cheap to produce in volume. We are on the receiving end of a bullshit industry, and the toolmakers see no reason to slow it down. Every big platform — including the web itself — is full of this stuff, and it is worse for all of us. Cynicism aside, I cannot imagine the leadership at Google or Meta actually enjoys using their own products as they wade through generated garbage.

This is hitting each of us in similar ways. If you use a computer that is connected to the internet, you are likely running into A.I.-generated stuff all the time, perhaps without being fully aware of it. The recipe you followed, the repair guide you found, the code you copy-and-pasted, and the images in the video you watched? Any of them could have been generated in a data farm somewhere. I do not think that is inherently bad, though it is an uncertain feeling.

I am part of the millennial generation. I grew up at a time in which we were told we were experiencing something brand new in world history. The internet allowed anyone to publish anything, and it was impossible to verify this new flood of information. We were taught to think critically and be cautious, since we never knew who created anything. Now we have a different problem: we are unsure what created anything.


  1. Without thinking about why it is the case, it is interesting how generative A.I. has no problem creating realistic-seeming text as text, but it struggles when it is an image containing text. But with a little knowledge about how these things work, that makes sense. ↥︎