One reason we’re producing a scenario alongside our playbook at all—as opposed to presenting our policies only as abstract arguments—is to stress-test them. We think many policy proposals for navigating AGI fall apart under scenario scrutiny—that is, if you try to write down a plausible scenario in which that proposal makes the world better, you will find that it runs into difficulties. The corollary is that scenario scrutiny can improve proposals by revealing their weak points.
To illustrate this process and the types of weak points it can expose, we’re about to give several examples of AI policy proposals and ways they could collapse under scenario scrutiny. These examples are necessarily oversimplified, since we don’t have the space in this blog post to articulate more sophisticated versions, much less subject them to serious scrutiny. But hopefully these simple examples illustrate the idea and motivate readers to subject their own proposals to more concrete examination.
With that in mind, here are some policy weaknesses that scenario scrutiny can unearth:
1. Applause lights. The simplest way that a scenario can improve an abstract proposal is by revealing that it is primarily a content-free appeal to unobjectionable values. Suppose that someone calls for the democratic, multinational development of AGI. This sounds good, but what does it look like in practice? The person who says this might not have much of an idea beyond “democracy good.” Having them try to write down a scenario might reveal this fact and allow them to then fill in the details of their actual proposal.
2. Bad analogies. Some AI policy proposals rely on bad analogies. For example, technological automation has historically led to increased prosperity, with displaced workers settling into new types of jobs created by that automation. Applying this argument to AGI straightforwardly leads to “the government should just do what it has done in previous technological transitions, like re-skilling programs.” However, if you look past the labels and write down a concrete scenario in which general, human-level AI automates all knowledge work… what happens next? Perhaps displaced white-collar workers migrate to blue-collar work or to jobs where it matters that the work is done by a human specifically. Are there enough such jobs to absorb these workers? How long does it take the automated AI researchers to solve robotics and automate the blue-collar work too? What are the incentives of the labs that are renting out AI labor? We think reasoning in this way will reveal ways in which AGI is not like previous technologies, such as that it can also do the jobs that humans are supposed to migrate to, making “re-skilling” a bad proposal.
3. Uninterrogated consequences. Abstract arguments can appeal to incompletely explored concepts or goals. For example, a key part of many AI strategies is “beat China in an AGI race.” However, as Gwern asks,
“Then what? […] You get AGI and you show it off publicly, Xi Jinping blows his stack as he realizes how badly he screwed up strategically and declares a national emergency and the CCP starts racing towards its own AGI in a year, and… then what? What do you do in this 1 year period, while you still enjoy AGI supremacy? You have millions of AGIs which can do… ‘stuff’. What is this stuff?
“Are you going to start massive weaponized hacking to subvert CCP AI programs as much as possible short of nuclear war? Lobby the UN to ban rival AGIs and approve US carrier group air strikes on the Chinese mainland? License it to the CCP to buy them off? Just… do nothing and enjoy 10%+ GDP growth for one year before the rival CCP AGIs all start getting deployed? Do you have any idea at all? If you don’t, what is the point of ‘winning the race’?”
A concrete scenario demands concrete answers to these questions, by requiring you to ask “what happens next?” By default, “win the race” does not.
4. Optimistic assumptions and unfollowed incentives. There are many ways for a policy proposal to secretly rest upon optimistic assumptions, but one particularly important way is that, for no apparent reason, a relevant actor doesn’t follow their incentives. For example, upon proposing an international agreement on AI safety, you might forget that the countries—which would be racing to AGI by default—are probably looking for ways to break out of it! A useful frame here is to ask: “Is the world in equilibrium?” That is, has every actor already taken all actions that best serve their interests, given the actions taken by others and the constraints they face? Asking this question can help shine a spotlight on untaken opportunities and ways that actors could subvert policy goals by following their incentives.
Relatedly, a scenario is readily open to “red-teaming” through “what if?” questions, which can reveal optimistic assumptions and their potential impacts if broken. Such questions could be: What if alignment is significantly harder than I expect? What if the CEO secretly wants to be a dictator? What if timelines are longer and China has time to indigenize the compute supply chain?
5. Inconsistencies. Scenario scrutiny can also reveal inconsistencies, either between different parts of your scenario or between your policies and your predictions. For example, when writing our upcoming scenario, we wanted the U.S. and China to agree to a development pause before either reached the superhuman coder milestone. At this point, we realized a problem: a robust agreement would be much more difficult without verification technology, and much of this technology did not exist yet! We then went back and included an “Operation Warp Speed for Verification” earlier in the story. Concretely writing out our plan changed our current policy priorities and made our scenario more internally consistent.
6. Missing what’s important. Finally, a scenario can show you that your proposed policy doesn’t address the important bits of the problem. Take AI liability for example. Imagine the year is 2027, and things are unfolding as AI 2027 depicts. America’s OpenBrain is internally deploying its Agent-4 system to speed up its AI research by 50x, while simultaneously being unsure if Agent-4 is aligned. Meanwhile, Chinese competitor DeepCent is right on OpenBrain’s heels, with internal models that are only two months behind the frontier. What happens next? If OpenBrain pushes forward with Agent-4, it risks losing control to misaligned AI. If OpenBrain instead shuts down Agent-4, it cripples its capabilities research, thereby ceding the lead to DeepCent and the CCP. Where is liability in this picture? Maybe it prevented some risky public deployments earlier on. But, in this scenario, what happens next isn’t “Thankfully, Congress passed a law in 2026 subjecting frontier AI developers to strict liability, and so…”

For this last example, you might argue that the scenario under which this policy was scrutinized is not plausible. Maybe your primary threat model is malicious use, in which those who would enforce liability still exist for long enough to make OpenBrain internalize its externalities. Maybe it’s something else. That’s fine! An important part of scenario scrutiny as a practice is that it allows for concrete discussion about which future trajectories are more plausible, in addition to which concrete policies would be best in those futures. However, we worry that many people have a scenario involving race dynamics and misalignment in mind and still suggest things like AI liability.
To this, one might argue that liability isn’t trying to solve race dynamics or misalignment; instead, it solves one chunk of the problem, providing value on the margin as part of a broader policy package. This is also fine! Scenario scrutiny is most useful for “grand plan” proposals. But we still think that marginal policies could benefit from scenario scrutiny.
The general principle is that writing a scenario by asking “what happens next, and is the world in equilibrium?” forces you to be concrete, which can surface various problems that arise from being vague and abstract. If you find you can’t write a scenario in which your proposed policies solve the hard problems, that’s a big red flag.
Conversely, being able to write out a plausible scenario in which your policy is good isn’t enough for the policy to be good overall. But it’s a bar that we think proposals should meet.
As an analogy: just because a firm bidding for a construction contract submitted a blueprint of their proposed building, along with a breakdown of the estimated costs and calculations of structural integrity, doesn’t mean you should award them the contract! But it’s reasonable to make this part of the submission requirements, precisely because it allows you to more easily separate the wheat from the chaff and identify unrealistic plans. Given that plans for the future of AI are—to put it mildly—more important than plans for individual buildings, we think that scenario scrutiny is a reasonable standard to meet.
While we think that scenario scrutiny is underrated in policy, there are a few costs to consider:
by Joshua Turner and Daniel Kokotajlo, AI Futures Project