Our Spreadsheet Overlords

Two years have passed since OpenAI released ChatGPT and the panic set in. Two years of above-the-fold headlines about “AI”—a subaltern specialty topic and the preserve of goofy sci-fi films for some 80 years prior—and two years of confusing, rank speculation about “artificial general intelligence” (AGI), a loosely defined idea of “human-level” yet machinic reasoning. Large Language Models, or LLMs, capture and generate what we have long taken to be an essentially human thing, language, shaking our historical sense of our own species to the core. But their abilities are matched by a lack of intelligence, and even a lack of the consistency we have long expected from computing machines.
As a new surge of AGI talk has taken over the airwaves in the third year of LLMs, a deeply revealing form of Actually Existing AI speaks against the hype: Elon Musk’s Department of Government Efficiency, a sloppy, violent-yet-banal attack on the codebase and massive personal data dragnet of the federal government. While we wait for AGI—and while we’re distracted by endless, ungrounded debates about it—the reality of modern AI is parading in plain sight in the form of the most boring constitutional crisis imaginable. Rather than machine intelligence, AI is an avant-garde form of digital bureaucracy, one that deepens our culture’s dependence on the spreadsheet.
The discourse is providing cover for this disastrous attack. Kevin Roose, a tech columnist for the New York Times, recently explained why he’s “feeling the AGI.” (Unfortunately, Roose’s reasons seem to boil down to, “I live in San Francisco.”) Similarly, Ezra Klein, of the paper’s Opinion pages, thinks the government knows AGI is coming. And the statistician Nate Silver suggests we have to “come to grips with AI.” The internet ethnographer and journalist Max Read has dubbed this surge of AI believers the “AI backlash backlash,” a reaction to the anti-tech skepticism we’ve seen over the past few years. The position, according to Read, is that AI “is quite powerful and useful, and even if you hate that, lots of money and resources are being expended on it, so it’s important to take it seriously rather than dismissing it out of hand.” That’s a far cry from the derisive characterization of LLMs like ChatGPT as “stochastic parrots” (which remix and repeat human language) or “fancy autocomplete.” These systems are far more capable—and more dangerous—than the skeptics make them out to be. Dispelling the myth of their intelligence does not excuse us from paying close attention to their power.
Rather than providing the much-vaunted innovation and efficiency associated with Silicon Valley, AI systems create more confusion than clarity. They are a coping mechanism for a global society that runs on digital data sets too vast to make sense of, too complex to disentangle manually. Feeding off a staggering amount of digitized data, they are a tool tailored to that data and its tabular format. When we think of AI, we should think less of Terminator 2 and more of the TV show Severance, in which office workers search for “bad numbers” on the strength of vibes alone.
An LLM is nothing more than a distilled matrix of values that represent words. The models we are all familiar with now—ChatGPT, Claude, Gemini, Grok—have many moving parts, but their core element is a large set of rows and columns that is the result of billions of dollars in training. The training data are on the order of 6 trillion to 10 trillion tokens (including words, letters, and other marks like “&,” “-ing,” and “3”)—orders of magnitude more text than humans have ever used for any purpose—and they only exist today because of the planetary sprawl of the internet. Using all this training data, you’ll be able to make a bot that responds to human questions, retrieves information, generates poetry and memos and anything else you like, and effectively feels like magic. You’ll have an AI model that feels like AGI.
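What that “matrix of values” amounts to can be made concrete with a toy sketch in Python. The tokens and numbers below are invented for illustration; a real model learns billions of such values rather than a handful, but the basic structure is the same: each token points to a row of numbers.

    # A toy illustration of a "matrix of values that represent words."
    # The numbers here are made up; real models learn billions of them.
    embedding_table = {
        "the":  [0.12, -0.40, 0.88],   # each token maps to a row of numbers
        "cat":  [0.95,  0.10, -0.22],
        "sat":  [-0.31, 0.77, 0.05],
        "&":    [0.02, -0.11, 0.63],   # punctuation and fragments are tokens too
        "-ing": [0.44,  0.29, -0.57],
    }

    def embed(tokens):
        """Look up the vector (row of the matrix) for each token."""
        return [embedding_table[t] for t in tokens]

    print(embed(["the", "cat", "sat"]))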
If—as happened between early 2023 and late 2024—people stop feeling that magic, you can then tweak your model. Instead of having it simply respond to prompts and queries, you can tell it to generate a batch of responses and then print off its “thoughts” as it chooses the best one. This new model could do fun things, like fill an Instacart order or book a vacation. And those things are what agents do, so—after a new round of training and a new round of VC funding—everyone will be feeling AGI again.
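The “generate a batch and pick the best one” pattern is simple enough to sketch. In the minimal example below, generate() and score() are hypothetical stand-ins for a model call and a selection step; the point is only the shape of the loop, not any particular company’s implementation.

    import random

    # Hypothetical stand-ins: a real system would call a language model here.
    def generate(prompt):
        """Pretend to sample one candidate answer from a model."""
        return f"candidate answer #{random.randint(1, 1000)} to: {prompt}"

    def score(answer):
        """Pretend to rate an answer; real systems use a learned scoring model."""
        return random.random()

    def best_of_n(prompt, n=8):
        """Generate n candidates, print the 'thoughts,' and keep the best one."""
        candidates = [generate(prompt) for _ in range(n)]
        for c in candidates:
            print("considering:", c)   # the printed "thoughts"
        return max(candidates, key=score)

    print(best_of_n("Plan a grocery order for the week"))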
Two tendencies, alike in error, reign over AI discourse today. The one, as Read observes, is that critics deride AI as a tool of capitalism and a con put on by tech oligarchs, failing to explain its power. The other, which I’m going to call “the performance fallacy,” confuses benchmarks for intelligence. Until we move past this pas de deux of shallow analysis, we will not be able to confront the very real problem of AI today.
The Performance Fallacy
In 1950, Alan Turing proposed a simple way to determine if a machine could think: Ask it some questions. If you couldn’t figure out if you were talking to a machine or not, you should concede that it is intelligent. This game became known as the “Turing Test,” and no one, to my knowledge, has ever been satisfied by it. Turing’s idea was that when we decide someone else is intelligent, it’s not that we know this, it’s that we assume it. I don’t ask to see how your brain works to determine if you’re intelligent; I just think of you as a human. The definition of intelligence that comes from this isn’t a definition at all—and that’s why AI has been permanently split between two ways of understanding what Turing meant.
The first way is according to the benchmark. Every new model that gets released today is tested on an endless series of performance thresholds with fancy acronym titles (ARC-AGI, a series of difficult puzzles, is a popular one these days). Each set of benchmark performances is compared to earlier attempts: A new model is said to score 87% where the previous best was 59%, even if no one can tell you what those percentages mean. If OpenAI’s o3 “reasoning” model scores 87% on ARC-AGI, does that mean it is 87% intelligent? Is “87% intelligent” a coherent idea? In the world of pure benchmark culture, such questions don’t matter and can’t really be asked. The system is optimizing for something that looks like what intelligent beings (humans) do, so there’s little reason for skepticism. The most extreme version of this benchmarking is arguably the Loebner Prize, a competition that ran for 30 years and awarded a large sum to the most convincing chatbot. Its benchmark for “intelligence” was taken from an offhanded comment of Turing’s: that a chatbot that fooled a human roughly a third of the time would count as intelligent.
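For what it is worth, a benchmark score of this kind is usually nothing more exotic than the share of test items a model answers correctly. A minimal sketch, with made-up tasks and a made-up stand-in for the model, shows how little the percentage itself tells you:

    # A toy "benchmark": a score like 87% is just the share of items passed.
    # The tasks and the model stand-in here are invented for illustration.
    tasks = [
        {"prompt": "2 + 2 = ?", "expected": "4"},
        {"prompt": "Capital of France?", "expected": "Paris"},
        {"prompt": "Next in 1, 1, 2, 3, 5, ?", "expected": "8"},
    ]

    def model_answer(prompt):
        """Hypothetical stand-in for querying a model."""
        return {"2 + 2 = ?": "4", "Capital of France?": "Paris"}.get(prompt, "?")

    passed = sum(model_answer(t["prompt"]) == t["expected"] for t in tasks)
    print(f"score: {passed / len(tasks):.0%}")   # prints "score: 67%"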
But it’s not clear that Turing really intended for this, or any other, benchmark to determine what intelligence was or who counted as intelligent. In “Computing Machinery and Intelligence,” he concocted several exchanges between himself and a fictional future computer, in which he asked the machine to do math problems, play chess, and compose a poem about the Forth Bridge in Scotland. These transcripts of an imaginary set of conversations—alongside ideas like a machine needing to “enjoy strawberries and cream”—show that Turing was thinking of intelligence holistically. This second way of framing intelligence is negative and, maybe surprisingly, not technical at all. Conversation was the un-benchmarkable threshold. And even though LLMs can’t prove that they can enjoy anything, they can certainly say that they can, and in language that scrambles the very idea of the Turing Test in its benchmark form altogether.
Benchmark culture adds to the vaudeville quality of tech today, with its demos, entertainer personalities, and gimmicks. All of the showmanship claims to be about performance. Your new iPhone is faster, better, stronger. Analytics makes everything from finance to sports better. The idea is that performance is part of a larger whole that adds up to intelligence, or something like it. But getting caught up in that question is what is distracting us from the actual effect of AI. And what if this idea of performance demonstrating substance itself is a fallacy?
Chess was an obsession of AI engineers from the father of information theory, Claude Shannon, down to the moment in 1997 when IBM’s Deep Blue beat the world champion Garry Kasparov. That machine performed what no human can: a chess game close enough to perfect. At that point, we were about four decades into the AI project. Did we decide Deep Blue was an intelligent compatriot for humans on Earth? Of course not. There is no one ability that makes up intelligence. Performance is a poor proxy for intelligence in the first place.
I think we should name the central problem with AGI—and to some extent, AI itself—“the performance fallacy.”
The performance fallacy is when we confuse optimization for intelligence itself. Instead of noticing it, AI engineers—and the media—tend to roll on to a new benchmark. Chess, after all, is highly rigid and logical! But the ancient Chinese game Go would surely require intelligence for elite play—right? When the neural net AlphaGo beat Lee Sedol, one of the world’s best Go players, in 2016, we got a new round of performance-fallacy speculation, hype, panic—but no one thinks AlphaGo is intelligent. Round and round we went, for the better part of a century.
The winter of 2023 felt different, even to those hardened to AI hype. Suddenly, we had a machine that did what Turing had, perhaps accidentally, really suggested would convince us it was intelligent. We had a machine that could produce language and write texts that responded to a human in dialogue. When Socrates decried writing, saying it would destroy the capacity for memory and leave humans cognitively impaired, it was specifically the inability of texts to speak back to their readers, to maintain a dialogue, that he counted against the then-new technology. In the LLMs, we suddenly had a competitor in an arena we had occupied alone forever: language.
Humans have defined themselves as the animal with language almost as much as they have by their intelligence—homo sapiens has to speak to be sapiens at all. Aristotle’s phrase for the human—the “animal with reason”—uses the term “logos,” which means both the spark of intelligence and the word. LLMs seem to take away our singularity in this respect. They are one of the most uncanny forms of technology in our history because they speak back to us. But what the discourse about AGI fails to recognize is that that speech is not necessarily intelligent. Even when an LLM passes the Turing Test, we cannot allow ourselves to believe in the fallacy. As the mathematician Benjamin Recht argues, LLM benchmarks largely fail to make any sense of these machines. This is because the evaluation of language drags us into the whole history of philosophy, into the metaphysics of who we are, and what intelligence is and how it is related to language at all.
It was relatively easy to resist the confusion until it turned out that AI models could capture and generate language. ChatGPT and other LLMs create the uncanny sense that the thing that separates humans from all other beings is no longer ours. It is all the more crucial to see, then, that what is taken for intelligence is instead an unimaginably large cultural document, a record of human beliefs, actions, and communications. The cognitive scientist Alison Gopnik has called LLMs a “cultural technology” for this reason, incapable of being creative in the way that human intelligence is. But much of what happens in the world, our history, and our politics, is cultural in this sense. And if we do not turn our gaze away from the ambition to “create an intelligence” and toward the real—and strange—cultural consequences of the digital bureaucracy, including the conditions that make AI as powerful as it is, we will remain mired in metaphysical subtleties that obscure the actual activities of AI systems. Both the commercial attempts to build machine intelligence and the scientific attempts to model and understand intelligence obscure the fact that AI is an extension—indeed, nearly a universalization—of digital bureaucracy.
AI Is Spreadsheet Culture in Hyperdrive
It’s notable that the new defenders of AGI—Roose, Klein, Silver—work in industries where AI stands to automate or radically change everyday work. Several major journalistic outlets, including The Atlantic, have cut deals with AI companies, and we have already seen examples of AI-generated articles being passed off as written by humans. And there’s no question that AI will automate many jobs in data analytics out of existence. Silver himself is probably safe—he’s a major book author now—but even those who remain will see major overhauls of workflows as the models we think of primarily as chatbots intervene in the parsing and production of data and its analysis. In fact, all office work, all mental labor, from secretarial and administrative work all the way up to the C-suite, stands to change. That is because work, just like every other aspect of our lives, has been datafied.
When we speak of “data,” we tend to imagine a huge spreadsheet filled with meaningless numbers, something hidden somewhere that requires sophisticated techniques and trained experts to deal with. But every spreadsheet is about something, from your budget to the list of minors and majors I keep for the department at my university, to the prices on the stock market, and so on. The rows and columns must have names; and often enough, we put plain old English right into the boxes, rather than numerical data. Data culture is all about how we go back and forth between what the numbers mean and what the processing of the numbers tells us. Translating between math and language is, in many ways, the basis of modern society.
Think of my list of majors and minors. I cannot use any of Excel’s mathematical functions on these names, email addresses, and expected graduation dates. I use the spreadsheet in this case as a storage unit. The information stored there can’t be understood or interpreted by the spreadsheet functions. (It can be divided by category, like “first names in Column A,” but its meaning cannot be transformed in an informative way.) Only a human can do that. Enter the LLM.
We can think of all the text on the internet as being placed into exactly such a storage unit when an LLM is trained. AI’s new ability—the one we are mistaking for intelligence—is a general capacity to translate between words and data, data and words. But where, in the past, you, a human, converted the results of any data processing back into real-world consequences, in language—“the Q4 returns show that we should think about pulling back distribution in the Northeast,” for example—now you have a tool that does that part of the work for you. LLMs literally supply the function of language that bureaucracy requires. We could call them a semantic spreadsheet, a tool that allows us to interact with data, and even create code, with nothing more than a prompt.
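A rough sketch of what this “semantic spreadsheet” round trip looks like in practice is below. Here ask_model() is a hypothetical placeholder for whatever LLM one might call, and the table is invented; the point is the translation back and forth between tabular data and plain language, not any specific product or API.

    import csv, io

    # A toy table: the kind of thing that normally lives in a spreadsheet.
    TABLE = """region,quarter,returns
    Northeast,Q4,-0.08
    Southeast,Q4,0.03
    Midwest,Q4,0.01
    """

    def ask_model(prompt):
        """Hypothetical stand-in for a call to an LLM."""
        return "(a model's plain-language summary of the numbers would go here)"

    rows = list(csv.DictReader(io.StringIO(TABLE.replace("    ", ""))))

    # Numbers -> words: hand the table to the model and ask for a memo.
    prompt = "Summarize these Q4 returns for a memo:\n" + "\n".join(map(str, rows))
    print(ask_model(prompt))

    # Words -> numbers: the same tool can be asked to emit a formula or code
    # ("flag every region with negative returns"); here the loop is closed by hand.
    flagged = [r["region"] for r in rows if float(r["returns"]) < 0]
    print("flagged regions:", flagged)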
It’s no accident that by far the largest data set the internet has produced is text. LLMs today are trained on something like 10 trillion tokens of text, a staggering amount of language. This may or may not perfectly capture what language is, but what it certainly does is make data and language a two-way street. We have long been able to use software like the spreadsheet to manipulate numbers. Suddenly it’s possible to convert the numbers into language and vice versa. That solves a problem that has dogged spreadsheets—and computers themselves—since their invention.
We should not take that ability for intelligence—not commit the performance fallacy once again—because it keeps us distracted as our actual data culture enters a genuinely new, and very worrisome, phase. We could call it spreadsheet culture in hyperdrive, a world in which all data can be translated into summary language and all language into optimized data with nothing more than a prompt. But where spreadsheets had limited functionality, LLMs act as universal translators in the same arena. They have many flaws, but this core capacity is a step-change in the mundane world of modern bureaucracy.
Bureaucracy has always been about managing the relay between data and words, numbers and memos, accounting and accounts. After the French Revolution, the amount of data gathered began to accelerate, necessitating devices that could help bureaucrats with the computing part of their jobs. The philosopher of science Ian Hacking places what he calls the “avalanche of printed numbers” in the 1820s and 1830s, when demographic and industry data took off exponentially. In a way, the LLM closes this loop, one created by an imbalance of too many numbers and not enough words, not enough understanding. Computers helped us to deal with the numbers but not to interpret them. LLMs give us at least the illusion—and it’s not certain we have any better or clearer way to deal with the sheer amount of semantic and numerical data in the world—of summarizing, understanding, and re-generating from that avalanche, which has since become a flood.
In other words, for the limited but vast world of digital data and the logistical operations we entrust to it, LLMs serve as a kind of off-brand universal language, one that seems magical in spite of lacking any spark of intelligence. They are the only system other than the human mind that has ever been able to carry out these conversions between data types—thus the fallacy. Numbers and words have always been the combined channel of modern bureaucracy, and there is potentially deep insight to glean from their new relationship in AI (this is the topic of my forthcoming book on LLMs). But such insight will have to come even as we also tend to the real political consequences of the willy-nilly deployment of this technology we have already allowed.
Control of Bureaucracy Is Deeper than Politics
Musk is now bringing his apocalyptic vision to federal databases that contain not just information about citizens and organizations but the power to start and stop payment to them as well. Musk is reportedly stepping back from the project, but that changes nothing about what the agency reveals. His DOGE initiative is in many ways the clearest demonstration of what AI actually has been to date: a bureaucratic wolf dressed in the sheep’s clothing of innovation and “intelligence.”
As DOGE has gained access to various federal departments, part of its stated goal is to introduce “AI agents” into the systems. As the political scientist Henry Farrell has written of this process, government databases do desperately need to be updated. DOGE is promising “innovation” while Musk crows that he is simply “deleting” whole agencies and sets of payments to his political enemies. This is only possible because of the digital infrastructure we already have, and where it does not exist, DOGE intends to create it.
In the 1980s, when machine learning was just being put on its scientific feet, some of its earliest pioneers speculated that large databases would be parsable using neural nets. These nets, they said in a toy example, might be able to independently predict which names in a set belong to the “Jets” and which to the “Sharks,” with reference to the gangs in the musical West Side Story. This lighthearted suggestion, which came with a chart, has turned into a nightmarish reality in a world where the analytics company Palantir doesn’t just coordinate names but also faces, gang membership, recidivism scores, and more. The ability to associate images, text, and data—and to derive predictions about human behavior from them—should be treated with extreme caution. Yet Palantir’s open intent to work with the national security apparatuses of a government that now deports immigrants—and, apparently, citizens—without regard for accuracy suggests that AI’s “intelligence” is a cover story for something far deeper and more dangerous.
Nowhere is this intent clearer than in the recent “hackathon” that Palantir and DOGE collaborated on, with the goal of creating an all-purpose application programming interface (API) for the IRS and its data. Responding to the White House’s demand that the government “eliminate information silos,” the two groups are attempting to create a template for a way to interact with all information about citizens in a single large database—a project that makes the George W. Bush–era “Total Information Awareness” program, already Orwellian, look quaint. The point of creating such an API is to release the true power of the LLM. The AI slop you see on your social media feed isn’t the point; the point, instead, is the ability to say, “Make me a list of all citizens who are Marxists, cut off their payments, and notify ICE of their whereabouts.”
If you take away all the speculative and mystical language about “intelligence” and “reasoning,” it’s easy to see that this type of task is what we would usually call bureaucracy. AI’s power, danger, and limits are all in this banal world of rows and columns. It’s easy to overlook this, because we have spent the last three decades making virtually the whole world into a giant spreadsheet. With everything from your personal daily heart-rate variations and financial trends to tics of speech and culture pre-formatted for an AI model, the power of this tool becomes immense. When the Turing Award–winning AI engineer Yann LeCun took to X to describe his somewhat cool expectations for the future of the technology, Musk replied, “Our digital god will be in the form of a csv file,” ironically fusing a common data format (comma-separated values) with the science-fiction notion of an all-powerful AI.
Musk came to Washington promising innovation, but he is delivering a new, unaccountable form of bureaucracy. We cannot afford to continue to believe that anything that “AI” touches will be more efficient, its progress trending toward “smarter” systems. (By this point, we should know that “smart” does not mean “intelligent”; it just means “shiny.”) A general audit is a bureaucratic procedure, one that current AI is too limited to perform well. But so long as we believe the hype, it will sound like a good idea to inject AI models into the databases that contain our health insurance, our Social Security payments, and our tax records.
Among those who thought Trump would cause a constitutional crisis, few predicted it would occur in a form so boring as a spreadsheet. But the truth is that whoever controls the bureaucracy has power beyond even what a democratic process like voting can authorize or undermine. The pretense of handing that control over to the machines provides cover for an all-too-human political act: slashing and burning the federal government using a tool that has unprecedented power to do so.
Rather than the dream of machine intelligence, we are witnessing the disastrous political consequences of a different dream, one of a total bureaucracy that operates in the all-but-invisible interstices of the software we have come to rely on. The way that such a system breaks basic elements of our social contract is a feature, not a bug. AI systems could be used as tools to democratic ends. But that would require that we understand them and treat them as scientific objects first and commercial products second. It will take a massive, principled effort to understand both the deep mathematics of these systems and the cultural-linguistic forms that result from them if we are to reverse the current course of AI.
Leif Weatherby is the director of the Digital Theory Lab at New York University. His book, Language Machines: Cultural AI and the End of Remainder Humanism, is out next month.