Duck Soup: The Single Most Important Thing

Is It Time for a Pause?

Many of the people building powerful AI systems think they’ll stumble on an AI system that forever changes our world fairly soon — three years, five years. I think they’re reasonably likely to be wrong about that, but I’m not sure they’re wrong about that. If we give them fifteen or twenty years, I start to suspect that they are entirely right.

And while I think that the enormous, terrifying challenges of making AI go well are very much solvable, it feels very possible, to me, that we won’t solve them in time.

It’s hard to overstate how much we have to gain from getting this right. It’s also hard to overstate how much we have to lose from getting it wrong. When I’m feeling optimistic about having grandchildren, I imagine that our grandchildren will look back in horror at how recklessly we endangered everyone in the world. And I’m much, much more optimistic that humanity will figure this whole situation out in the end if we have twenty years than I am if we have five.

AI research of all kinds is being done (at labs, in academia, at nonprofits, and in a distributed fashion all across the internet), and it is so diffuse and varied that it would be hard to ‘slow down’ by fiat. But there’s one kind of AI research, training much larger, much more powerful language models, that it might make sense to try to slow down. If we could agree to hold off on training ever more powerful new models, we might buy more time to do AI alignment research on the models we have. This extra research could make it less likely that misaligned AI eventually seizes control from humans.

An open letter released on Wednesday, with signatures from Elon Musk, Apple co-founder Steve Wozniak, leading AI researcher Yoshua Bengio, and many other prominent figures, called for a six-month moratorium on training bigger, more dangerous ML models:

“We call on all AI labs to immediately pause for at least 6 months the training of AI systems more powerful than GPT-4. This pause should be public and verifiable, and include all key actors. If such a pause cannot be enacted quickly, governments should step in and institute a moratorium.”

I tend to think that we are developing and releasing AI systems much faster and much more carelessly than is in our interests. And from talking to people in Silicon Valley and policymakers in DC, I think efforts to change that are rapidly gaining traction. “We should slow down AI capabilities progress” is a much more mainstream view than it was six months ago, and to me that seems like great news.

In my ideal world, we absolutely would be pausing after the release of GPT-4. People have been speculating about the alignment problem for decades, but this moment is an obvious golden age for alignment work. We finally have models powerful enough to do useful empirical work on understanding them, changing their behavior, evaluating their capabilities, noticing when they’re being deceptive or manipulative, and so on. There are so many open questions in alignment that I expect we can make a lot of progress on in five years, with the benefit of what we’ve learned from existing models. We’d be in a much better position if we could collectively slow down to give ourselves more time to do this work, and I hope we find a way to do that intelligently and effectively. As I’ve said above, I think the stakes are unfathomable, and it’s exciting to see the early stages of coordination to change our current, unwise trajectory.

On the letter itself, though, I have a bunch of uncertainties around whether a six-month pause right now would actually help. (I suspect many of the letter’s signatories share these uncertainties, and I don’t have strong opinions about the wisdom of signing it.) Here are some of my worries:

Is it better to ask for evaluations rather than a pause? Personally, I think labs should sign on to ongoing commitments to subject each new generation of model to a third-party dangerous capabilities audit. I’m much more excited about requiring audits and oversight before training dangerous models than about asking for ‘pauses’, which are hard to enforce.

Is the ask too small? I think I and the letter signers would generally agree that the ideal thing for society to do right now is something more continuous and iterative (and ultimately more ambitious) than a one-time six-month pause at this stage. That means one big question is whether this opens the door to those larger efforts, or muddies the waters. Do steps that are in the right direction, but not sufficient, help us collectively produce common knowledge of the problem and build towards the right longer-term solutions, or do they mostly leave people misled about what it’s going to take to solve the problem? I’m not sure.

What will we use the pause to do? An open letter like this one could be a step towards cooperative agreements on evaluations, standards, and governance, in which case it’s great. It could also go badly, if in six months labs go right back to developing powerful models and people walk away with the impression the pause was performative or meaningless. By itself, taking a few months off doesn’t gain us much (especially if a pause is entirely voluntary, so the least cooperative actors can simply ignore it). If we use that time well, to set up binding standards, good evaluations of whether our models are dangerous, and a much larger national conversation about what’s at stake here, then that could change everything.

Does this ask impact companies unevenly? This specific call — to not train models larger than GPT-4 — is inapplicable to almost every AI lab today, because most of them can’t train models larger than GPT-4 in the next six months anyway. OpenAI may well be the only AI lab in a position to act on, or not act on, this demand.

That doesn’t delight me. Obviously, when regulations are being considered, one of the things companies inevitably do is try to design the regulations to advantage them and disadvantage their competitors. If proposed AI regulations appear to be an obvious grab at commercial rivals, I expect they’ll get less traction.

Moreover, I’m worried that an unevenly applied moratorium might backfire. If OpenAI can’t train GPT-5 for six months, other AI labs may use that time to rush to train GPT-4-sized models. That could mean that when the moratorium is lifted, OpenAI feels more pressure to get ahead again and may push for an even larger training run than it was planning originally. This moratorium could end up accomplishing very little except for making competitive dynamics even fiercer.

Overall, I’d prefer a policy that creates costs for all players and is careful to avoid creating potential perverse incentives.

Predicting the details of how future AI development will play out isn’t easy. But my best guess is that we’re facing a marathon, not a sprint. The next generation of language models will be even more powerful and scary than GPT-4, and the generation after that will be even scarier still. In my ideal world, we would pause and reflect and do a lot of safety evaluations, make models slightly bigger, and then pause again and do more reflecting and testing. We would do that over and over again as we inch toward transformative AI.

But we’re not living in an ideal world. The single most important thing we can do is to pause when the next model we train would be powerful enough to obsolete humans entirely, and then take as long as we need to work on AI alignment with the help of our existing models. That means that pausing now is mostly valuable insofar as it helps us build towards the harder, more complicated task of identifying when we might be at the brink and pausing for as long as we need to then. I’m not sure what the impact of this letter will be — it might help, or it might hurt.

by Kelsey Piper, Planned Obsolescence

[ed. See also: AI Risk is like Terminator; Stop Saying it's Not (Infinite Clout); and, The case for Terminator analogies (Slow Boring).]

Monday, April 3, 2023
