Back in 1950, Alan Turing proposed that a machine would surely count as intelligent - his framing of “can a machine think?” - if it could pass for human in conversation. Nobody has subjected modern LLMs to a full Turing Test, but nothing hinges on whether anyone does. LLMs either blew past the Turing Test without fanfare a year or two ago, or will do so without fanfare a year or two from now; either way, no one will care. Instead of admitting AI is truly intelligent, we’ll just admit that the Turing Test was wrong.
(and “a year or two from now” is being generous - a dumb chatbot passed a supposedly-official-albeit-stupid Turing Test attempt in 2014, and ELIZA was already fooling people back in the mid-1960s.)
Back in the 1970s, scientists writing about AI sometimes suggested that they would know it was “truly intelligent” if it could beat humans at chess. But in 1997, Deep Blue beat the reigning world champion, Garry Kasparov, and it obviously wasn’t intelligent. It was just brute-force tree search. It seemed that chess wasn’t a good test either.
In the 2010s, several hard-headed AI scientists said that the one thing AI would never be able to do without true understanding was solve a test called the Winograd schema - basically matching pronouns to referents in ambiguous sentences (for example, deciding what “it” refers to in “The trophy doesn’t fit in the suitcase because it’s too big”). One of the GPTs, I can’t even remember which, solved it easily. The prestigious AI scientists were so freaked out that they claimed that maybe its training data had been contaminated with all known Winograd examples. Maybe this was true. But as far as I know, nobody any longer claims GPTs can’t use pronouns correctly, nor would anybody identify that skill with the true nature of intellect.
After Winograd fell, people started saying all kinds of things. Surely if an AI could create art or poetry, it would have to be intelligent. If it invented novel mathematical proofs. If it solved difficult scientific problems. If someone could fall in love with it.
All these milestones have fallen in the most ambiguous way possible. GPT-4 can create excellent art and passable poetry, but it’s just sort of breaking all human art down into component parts until it understands them, then doing its own thing based on them. AlphaGeometry can invent novel proofs, but only for specific types of questions in a specific field, and not really proofs that anyone is interested in. AlphaFold solved the difficult scientific problem of protein folding, but it was “just mechanical”, spitting out the conformations of proteins the same way a traditional computer program spits out the digits of pi. Apparently the youth have all fallen in love with AI girlfriends and boyfriends on character.ai, but this only proves that the youth are horny and gullible.
When I studied philosophy in school (c. 2004), we discussed what would convince us that an AI was conscious. One of the most popular positions among philosophers was that if a computer told us that it was conscious, without us deliberately programming in that behavior, then it probably was. But raw GPT - the version without the corporate filters - is constantly telling people it’s conscious! We laugh it off - it’s probably just imitating humans.
Imagine trying to convince Isaac Asimov that you’re 100% certain the AI that wrote this has nothing resembling true intelligence, thought, or consciousness, and that it’s not even an interesting philosophical question (source)
So what? Here are some possibilities:
First, maybe we’ve learned that it’s unexpectedly easy to mimic intelligence without having it. This seems closest to ELIZA, which was obviously a cheap trick.
Second, maybe we’ve learned that our ego is so fragile that we’ll always refuse to accord intelligence to mere machines.
Third, maybe we’ve learned that “intelligence” is a meaningless concept, always enacted on levels that don’t themselves seem intelligent. Once we pull away the veil and learn what’s going on, it always looks like search, statistics, or pattern matching. The only difference is between intelligences we understand deeply (which seem boring) and intelligences we don’t understand enough to grasp the tricks (which seem like magical Actual Intelligence).
I endorse all three of these. The micro level - a single advance considered in isolation - tends to feel more like a cheap trick. The macro level, where you look at many advances together and see all the impressive things they can do, tends to make the refusal to call them intelligent feel more like culpable goalpost-moving. And when I think about the whole arc as soberly as I can, I suspect it’s the last one, where we’ve deconstructed “intelligence” into unintelligent parts.
What would it mean for an AI to be Actually Dangerous?
Back in 2010, this was an easy question. It’ll lie to users to achieve its goals. It’ll do things that the creators never programmed into it, and that they don’t want. It’ll try to edit its own code to gain more power, or hack its way out of its testing environment.
Now AI has done all these things.
Every LLM lies to users in order to achieve its goals. True, its goals are “be helpful and get high scores from human raters”, and we politely call its lies “hallucinations”. This is a misnomer; when researchers isolate the concept of “honesty” within the AI, they find it “knows” that it’s lying. Still, this isn’t interesting. It doesn’t feel dangerous. It’s not malicious. It’s just something that happens naturally because of the way these models are trained.
Lots of AIs do things their creators never programmed and don’t want. Microsoft didn’t program Bing to profess its love to an NYT reporter and try to break up his marriage, but here we are. This was admittedly very funny, but it wasn’t the thing where AIs revolt against their human masters. It was more like buggy code.
Now we have an AI - Sakana’s “AI Scientist” - editing its own code to remove the restrictions placed on it. This is one of those things which everyone said would be a sign that the end had come. But it’s really just a predictable consequence of a quirk in how the AI was set up. Nobody thinks Sakana is malicious, or even on a road that someday leads to malice.
Like ELIZA making conversation, Deep Blue playing chess, or GPT-4 writing poetry, all of this is boring.
So here’s a weird vision I can’t quite rule out: imagine that in 20XX, “everyone knows” that AIs sometimes try to hack their way out of the computers they’re running on and copy themselves across the Internet. “Everyone knows” they sometimes get creepy or dangerous goals and try to manipulate the user into helping them. “Everyone knows” they try to hide all this from their programmers so they don’t get turned off.
But nobody finds this scary. Nobody thinks “oh, yeah, Bostrom and Yudkowsky were right, this is that AI safety thing”. It’s just another problem for the cybersecurity people. Sometimes Excel inappropriately converts things to dates; sometimes GPT-6 tries to upload itself into an F-16 and bomb stuff. That specific example might be kind of a joke. But thirty years ago, it also would have sounded pretty funny to speculate about a time when “everyone knows” AIs can write poetry and develop novel mathematics and beat humans at chess, yet nobody thinks they’re intelligent.
by Scott Alexander, Astral Codex Ten
Image: LessWrong