Generative AI is a spam nightmare
In the past few months, a lot of digital ink has been spilled over AI models - and the image it collectively paints is pretty depressing.
So I want to start off by saying how frustrating that is.
Generative AI models - that is, systems that take a prompt and create “novel” content, like a picture or prose - are undeniably cool on the surface. There’s no denying the visceral excitement of throwing a few words at a website and seeing it come back with a pseudo-photorealistic image (in the case of Stable Diffusion) or a seemingly coherent conversation (in the case of ChatGPT) in just moments.
There are real reasons why these toys became so instantly captivating that even glacial corporate behemoths like Microsoft began incorporating them into major products in mere months. And even on the smaller, more personal side of things - I’ve been moved by seeing some of my friends, especially trans friends, use these tools to generate “selfies” that seemed to finally show the world how they wanted to be seen. That’s not nothing.
So why is the overall AI scene right now so depressing, instead of just being thrilling?
You can probably guess many of the reasons, from the CEO and founder of Stability AI (which owns Stable Diffusion) all-but-boasting about being unconcerned with whether he was entitled to train his models on the copyrighted content of innumerable artists, to the fact that OpenAI (which owns ChatGPT) outsourced the work of stripping toxic data (e.g. graphic and sexual violence) to Kenyan workers who were paid less than $2 an hour for enduring traumatizing work.
There are a myriad of serious ethical reasons to be uneasy about the landscape being cultivated by Stability AI and OpenAI specifically. But I won’t be talking about those.
What I want to do, instead, is talk about the conversations around these tools, and what we might expect to see when they are adopted in more areas of our lives.
As the ethical complaints about the specific tools and companies that are currently synonymous with “AI” keep mounting, some people are drawn to defend the concept of generative AI models itself - as though there is an active battle going on that will decide whether this technology should ever exist in any form. And not everyone is approaching the topic from a discursive battleground where lines have already been drawn in the sand; lots of people are coming at this from genuinely curious and well-intentioned places.
So it’s important to push past the superficial thrill of seeing ChatGPT generate a rap version of the Upanishads or whatever, and take a look at what the current tech is actually good at - and maybe make some guesses about where we might see it go.
And while people’s eyes are lighting up with the idea of being able to generate an entire finished movie after just a couple days of throwing prompts at a model, the more likely reality in the short-to-mid term involves a lot more spam.
Ted Chiang has done a truly impressive job exploring and explaining the actual utility of AI chatbots like ChatGPT in his recent piece “ChatGPT Is a Blurry JPEG of the Web”. In any “conversation” with ChatGPT where you’re trying to get information, ChatGPT is functionally just a worse search engine. A lot of the impressiveness comes from the novelty alone: We all grew up with search engines, after all, while having “conversations” with seemingly all-knowing computers is still the stuff of sci-fi media. (Maybe another decade or so after widespread adoption of “smart home” devices that take voice commands, even that novelty will have worn off.)
Outside of using these chatbots to get information, you can of course try to use them to “write” stuff. Indeed, people are already selling books consisting entirely of AI-generated stories, which seems amazing until you actually crack one open and see that existing models aren’t capable of making anything better than an only-somewhat-coherent Reddit post.
On the visual side, things aren’t ultimately that much better.
Production studio and VFX shop Corridor Digital recently released a heady video, baitingly titled “Did We Just Change Animation Forever?”, in which they attempt to use Stable Diffusion to “transform” live-action footage into animated footage. Even with Corridor’s considerable expertise and nontrivial resources, their “demo reel” for this technique is footage where lighting and facial features (like a character’s eye color or hair) change from second to second within a single shot.
Is this the worst animation ever? No, obviously not. Is it good-studio-quality? No, just as obviously. The fact that the output here is so unimpressive hints at the underlying reality: this isn’t anything new.
What Corridor Digital did - taking live-action footage and drawing on top of it to create animated footage - is called rotoscoping.
And rotoscoping is over 100 years old.
People have been animating on top of live-action footage to make movies with settings and visual effects beyond their budget for as long as people have been making movies. This isn’t groundbreaking.
What’s new is how fast you can do it.
Because at the end of the day, speed is what’s really impressive about these models.
ChatGPT may be a worse search engine, but its conversational tone gives us the illusion that it has “done the research for us” - that it is doing a human task fast, for free, at the push of a button. So too with using Stable Diffusion to generate a picture of “Vladimir Putin as a Teletubby on trial in the style of SPIDER-MAN: INTO THE SPIDER-VERSE”.
Corridor Digital use a common buzzphrase in their video - they talk about this technology “democratizing” animation. What they are trying to convey is that they believe these technologies will make it easier for people to make art (like animated videos).
(As an aside: A more accurate term might be “accessibility” rather than “democratization”, but I suspect that the difference in terminology might have to do with the political leanings of the demographics that are currently the loudest proponents of generative AI in the online spaces that set the language of the conversation. “Accessibility” is a talking point of equity-minded, left-leaning politics today; many vocal proponents of AI align themselves more with the right-wing, “anti-woke” politics that you see in, say, Elon Musk fandom.)
But regardless of the term used - “democratization” or “accessibility” - this claim isn’t actually true.
You don’t need Stable Diffusion or a generative adversarial network or anything to make an animated short, as the legions of UNDERTALE fan animations on YouTube testify. Anyone can draw, just like anyone can write, and anyone can act.
Of course, there’s an obvious rebuttal to what I’ve just said: Sure, anyone can draw - but most people suck at it, so it’s not really the same thing, is it?
That rebuttal gets to the heart of what people talking about “democratizing art” are actually saying: Not that everyone should be able to make art, but that everyone should be able to make art that is well-received - i.e., content that is popular.
What they are promising is simply to make it easier for people who want attention to make content, fast.
And that’s not quite as lofty a mission statement, now, is it?
This is, of course, not going to be any more true in a post-Stable-Diffusion world than it is today. The reception of art (or any content, really) has more to do with the attention economy of online platforms than it has to do with the quality of the art itself. I assure you, there are countless incredible artists desperately trying to eke out a living right now. The real gatekeepers of visibility and attention aren’t people begging for commissions on Patreon - they’re the corporations building the algorithms that determine how many fucking ads you have to see before being allowed to see one post by a friend you actually follow.
Even if Stable Diffusion eventually becomes able to draw human hands accurately, what these companies have created are not tools to allow you and me to make feature-length movies in a single weekend (it took the full-time staff at Corridor Digital two months to make their 7-minute rotoscoping experiment) - they’re tools that allow other companies to flood us with even more junk content.
As good a job as Chiang did, I think an even pithier summary of this situation was made in - of all places - Shen’s recent comic series about art students.
We all know what’s going to happen when we make tools that allow people to cheaply make an enormous volume of passable content very quickly, right? Our search results pages and social media feeds are already cluttered with that garbage today.
Documentarian Dan Olson recently made a video about a type of grift endemic to the self-publishing industry, where people hire gig workers for pennies to churn out spam audiobooks built around attention-grabbing buzz topics like NFTs and fad diets, designed to sucker people into buying them. Using ChatGPT to write a book is just going to let people do that sort of shit much, much faster.
And that is, of course, exactly what we’re starting to see. Clarkesworld Magazine, a leading sci-fi/fantasy publication, had to stop accepting new submissions after being flooded with AI-generated spam entries submitted by exactly the sort of grifters Olson was talking about, trying to make a quick buck.
All of this, of course, is just the tip of the iceberg. The spam generated by individual hustlers is absolutely nothing compared to the scale of output that companies dedicated to this stuff can achieve. YouTube videos for young children, among other arenas, have infamously already been completely overtaken by this stuff. How much more productive do you imagine those content farms will be when it gets both cheaper and faster to make that junk?
Generative AI is sometimes talked about as though it is the inescapable byproduct of human innovation itself, as unstoppable as any other new technology.
But ChatGPT and Stable Diffusion aren’t raw elements that exist in the Earth simply waiting to be dug up. They are the deliberate and purposefully-built products of corporations with vast resources and material interests, and they could have been conceptualized, built, and rolled out in many different ways. They are just as much the product of specific goals and ideologies as an iPhone or a pair of Nikes. If we all drown in a spam flood of Biblical proportions, it will not be because the technology was inevitable and could only have been built in that one specific way - it will be because the individuals and organizations working towards that outcome weren’t stopped.
There are very real and very exciting uses for generative AI technology - even in the realm of art. In a vacuum, we could use this tech for really amazing things, things that are far more groundbreaking than “doing a century-old technique faster and with worse results”.
But we’re not in a vacuum, and the real world isn’t some abstract playground of possibilities - it’s already full of content farms and advertiser-driven media economies and corporations willing to traumatize entire workforces overseas just to be seen with a shiny new crappy version of what you’re already using.