Wednesday, September 3, 2025

Rethinking A.I.

The Fever Dream of Imminent ‘Superintelligence’ Is Finally Breaking

GPT-5, OpenAI’s latest artificial intelligence system, was supposed to be a game-changer, the culmination of billions of dollars of investment and nearly three years of work. Sam Altman, the company’s chief executive, implied that GPT-5 could be tantamount to artificial general intelligence, or A.G.I. — A.I. that is as smart and as flexible as any human expert.

Instead, as I have written, the model fell short. Within hours of its release, critics found all kinds of baffling errors: It failed some simple math questions, couldn’t count reliably and sometimes provided absurd answers to old riddles. Like its predecessors, the A.I. model still hallucinates (though at a lower rate) and is plagued by questions about its reliability. Although some people have been impressed, few saw it as a quantum leap, and nobody believed it was A.G.I. Many users asked for the old model back.

GPT-5 is a step forward, but nowhere near the A.I. revolution many had expected. That is bad news for the companies and investors who placed substantial bets on the technology. And it demands a rethink of government policies and investments that were built on wildly overinflated expectations. The current strategy of merely making A.I. bigger is deeply flawed — scientifically, economically and politically. Many things, from regulation to research strategy, must be rethought. One of the keys to this may be training and developing A.I. in ways inspired by the cognitive sciences.

Fundamentally, people like Mr. Altman, the Anthropic chief executive Dario Amodei and countless other tech leaders and investors had put far too much faith in a speculative and unproven hypothesis called scaling: the idea that training A.I. models on ever more data using ever more hardware would eventually lead to A.G.I., or even a “superintelligence” that surpasses humans.

However, as I warned in a 2022 essay titled “Deep Learning Is Hitting a Wall,” so-called scaling laws aren’t physical laws of the universe like gravity, but hypotheses based on historical trends. Large language models, which power systems like GPT-5, are nothing more than souped-up statistical regurgitation machines, so they will continue to stumble into problems around truth, hallucinations and reasoning. Scaling would not bring us to the holy grail of A.G.I.
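To make that distinction concrete, here is a rough sketch of the kind of empirical relationship the scaling hypothesis rests on; the symbols (loss $L$, training compute $C$, fitted constants $C_0$ and $\alpha$) are illustrative shorthand, not drawn from the article. Researchers observed that a model’s error on held-out text tends to fall off roughly as a power law in the compute spent on training:

$$ L(C) \approx \left(\frac{C_0}{C}\right)^{\alpha} $$

That is a curve fitted to past results. Nothing guarantees the trend continues indefinitely, and a lower loss on next-word prediction does not, by itself, imply genuine understanding or reasoning.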

Many in the tech industry were hostile to my predictions. Mr. Altman ridiculed me as a “mediocre deep learning skeptic” and last year claimed “there is no wall.” Elon Musk shared a meme lampooning my essay.

It now seems I was right. Adding more data to large language models, which are trained to produce text by learning from vast databases of human text, helps them improve only to a degree. Even significantly scaled, they still don’t fully understand the concepts they are exposed to — which is why they sometimes botch answers or generate ridiculously incorrect drawings.

Scaling worked for a while — previous generations of GPT models made impressive advances over their predecessors. But luck started to run out over the last year. Mr. Musk’s A.I. system, Grok 4, released in July, had 100 times as much training as Grok 2 but was only moderately better. Meta’s jumbo-size Llama 4 model, much larger than its predecessor, was also largely viewed as a failure. As many now see, GPT-5 shows decisively that scaling has lost steam.

The chances of A.G.I.’s arrival by 2027 now seem remote. The government has let A.I. companies lead a charmed life with almost zero regulation. It now ought to enact legislation that addresses costs and harms unfairly offloaded onto the public — from misinformation to deepfakes, “A.I. slop” content, cybercrime, copyright infringement, harms to mental health and energy usage.

Moreover, governments and investors should strongly support research investments outside of scaling. The cognitive sciences (including psychology, child development, philosophy of mind and linguistics) teach us that intelligence is about more than mere statistical mimicry and suggest three promising ideas for developing A.I. that is reliable enough to be trustworthy, with a much richer intelligence.

by Gary Marcus, NY Times |  Read more:
Image: Maria Mavropoulou/Getty
[ed. See also: GPT-5: Overdue, overhyped and underwhelming. And that’s not the worst of it. (MoAI):]
***
"The real news is a breaking study from Arizona State University that fully vindicates what I have told you for nearly 30 years—and more recently what Apple told you—about the core weakness of LLMs: their inability to generalize broadly. (...)

And, crucially, the failure to generalize adequately outside distribution tells us why all the dozens of shots on goal at building “GPT-5 level models” keep missing their target. It’s not an accident. That failing is principled.

That’s exactly what it means to hit a wall, and exactly the particular set of obstacles I described in my most notorious (and prescient) paper, in 2022. Real progress on some dimensions, but stuck in place on others.

Ultimately, the idea that scaling alone might get us to AGI is a hypothesis.

No hypothesis has ever been given more benefit of the doubt, nor more funding. After half a trillion dollars in that direction, it is obviously time to move on. The disappointing performance of GPT-5 should make that enormously clear."