Wednesday, March 1, 2023

OpenAI's "Planning For AGI And Beyond"


Imagine ExxonMobil releases a statement on climate change. It’s a great statement! They talk about how preventing climate change is their core value. They say that they’ve talked to all the world’s top environmental activists at length, listened to what they had to say, and plan to follow exactly the path they recommend. So (they promise) in the future, when climate change starts to be a real threat, they’ll do everything environmentalists want, in the most careful and responsible way possible. They even put in firm commitments that people can hold them to.

An environmentalist, reading this statement, might have thoughts like:
  • Wow, this is so nice, they didn’t have to do this.
  • I feel really heard right now!
  • They clearly did their homework, talked to leading environmentalists, and absorbed a lot of what they had to say. What a nice gesture!
  • And they used all the right phrases and hit all the right beats!
  • The commitments seem well thought out, and make this extra trustworthy.
  • But what’s this part about “in the future, when climate change starts to be a real threat”?
  • Is there really a single, easily-noticed point where climate change “becomes a threat”?
  • If so, are we sure that point is still in the future?
  • Even if it is, shouldn’t we start being careful now?
  • Are they just going to keep doing normal oil company stuff until that point?
  • Do they feel bad about having done normal oil company stuff for decades? They don’t seem to be saying anything about that.
  • What possible world-model leads to not feeling bad about doing normal oil company stuff in the past, not planning to stop doing normal oil company stuff in the present, but also planning to do an amazing job getting everything right at some indefinite point in the future?
  • Are they maybe just lying?
  • Even if they’re trying to be honest, will their bottom line bias them towards waiting for some final apocalyptic proof that “now climate change is a crisis”, of a sort that will never happen, so they don’t have to stop pumping oil?
This is how I feel about OpenAI’s new statement, Planning For AGI And Beyond.

OpenAI is the AI company behind ChatGPT and DALL-E. In the past, people (including me) have attacked them for seeming to deprioritize safety. Their CEO, Sam Altman, insists that safety is definitely a priority, and has recently been sending various signals to that effect.

Planning For AGI And Beyond (“AGI” = “artificial general intelligence”, ie human-level AI) is the latest volley in that campaign. It’s very good, in all the ways ExxonMobil’s hypothetical statement above was very good. If they’re trying to fool people, they’re doing a convincing job!

Still, it doesn’t apologize for doing normal AI company stuff in the past, or plan to stop doing normal AI company stuff in the present. It just says that, at some indefinite point when they decide AI is a threat, they’re going to do everything right.

This is more believable when OpenAI says it than when ExxonMobil does. There are real arguments for why an AI company might want to switch from moving fast and breaking things at time t to acting all responsible at time t + 1. Let’s explore the arguments they make in the document, go over the reasons they’re obviously wrong, then look at the more complicated arguments they might be based off of.

Why Doomers Think OpenAI Is Bad And Should Have Slowed Research A Long Time Ago

OpenAI boosters might object: there’s a disanalogy between the global warming story above and AI capabilities research. Global warming is continuously bad: a temperature increase of 0.5 degrees C is bad, 1.0 degrees is worse, and 1.5 degrees is worse still. AI doesn’t become dangerous until some specific point. GPT-3 didn’t hurt anyone. GPT-4 probably won’t hurt anyone. So why not keep building fun chatbots like these for now, then start worrying later?

Doomers counterargue that the fun chatbots burn timeline.

That is, suppose you have some timeline for when AI becomes dangerous. For example, last year Metaculus thought human-like AI would arrive in 2040, and superintelligence around 2043.

Recent AIs have tried lying to, blackmailing, threatening, and seducing users. AI companies freely admit they can’t really control their AIs, and it seems high-priority to solve that before we get superintelligence. If you think that’s 2043, the people who work on this question (“alignment researchers”) have twenty years to learn to control AI.

Then OpenAI poured money into AI, did ground-breaking research, and advanced the state of the art. That meant that AI progress would speed up, and AI would reach the danger level faster. Now Metaculus expects superintelligence in 2031, not 2043 (although this seems kind of like an over-update), which gives alignment researchers eight years, not twenty.

So the faster companies advance AI research - even by creating fun chatbots that aren’t dangerous themselves - the harder it is for alignment researchers to solve their part of the problem in time.

This is why some AI doomers think of OpenAI as an ExxonMobil-style villain, even though they’ve promised to change course before the danger period. Imagine an environmentalist group working on research and regulatory changes that would have solar power ready to go in 2045. Then ExxonMobil invents a new kind of super-oil that ensures that, nope, all major cities will be underwater by 2031 now. No matter how nice a statement they put out, you’d probably be pretty mad!

Why OpenAI Thinks Their Research Is Good Now, But Might Be Bad Later

OpenAI understands the argument against burning timeline. But they counterargue that having the AIs speeds up alignment research and all other forms of social adjustment to AI. If we want to prepare for superintelligence - whether solving the technical challenge of alignment, or solving the political challenges of unemployment, misinformation, etc - we can do this better when everything is happening gradually and we’ve got concrete AIs to think about:
We believe we have to continuously learn and adapt by deploying less powerful versions of the technology in order to minimize “one shot to get it right” scenarios […] As we create successively more powerful systems, we want to deploy them and gain experience with operating them in the real world. We believe this is the best way to carefully steward AGI into existence—a gradual transition to a world with AGI is better than a sudden one. We expect powerful AI to make the rate of progress in the world much faster, and we think it’s better to adjust to this incrementally.

A gradual transition gives people, policymakers, and institutions time to understand what’s happening, personally experience the benefits and downsides of these systems, adapt our economy, and to put regulation in place. It also allows for society and AI to co-evolve, and for people collectively to figure out what they want while the stakes are relatively low.

You might notice that, as written, this argument doesn’t support full-speed-ahead AI research. If you really wanted this kind of gradual release that lets society adjust to less powerful AI, you would do something like this:
  • Release AI #1
  • Wait until society has fully adapted to it, and alignment researchers have learned everything they can from it.
  • Then release AI #2
  • Wait until society has fully adapted to it, and alignment researchers have learned everything they can from it.
  • And so on…
Meanwhile, in real life, OpenAI released ChatGPT in late November, helped Microsoft launch the Bing chatbot in February, and plans to announce GPT-4 in a few months. Nobody thinks society has even partially adapted to any of these, or that alignment researchers have done more than begin to study them.

The only sense in which OpenAI supports gradualism is the sense in which they’re not doing lots of research in secret, then releasing it all at once. But there are lots of better plans than either doing that, or going full-speed-ahead.

So what’s OpenAI thinking? I haven’t asked them and I don’t know for sure, but I’ve heard enough debates around this that I have some guesses about the kinds of arguments they’re working off of. I think the longer versions would go something like this:

by Scott Alexander, Astral Codex Ten | Read more:
Image: Metaculus/Sam Altman et al., uncredited
[ed. See also: How to navigate the AI apocalypse as a sane person (Intrinsic Perspective); and, Microsoft's new AI chatbot has been saying some 'crazy and unhinged things' (NPR).]