Sunday, October 26, 2025

How an AI company CEO could quietly take over the world

If the future is to hinge on AI, it stands to reason that AI company CEOs are in a good position to usurp power. This didn’t quite happen in our AI 2027 scenarios. In one, the AIs were misaligned and outside any human’s control; in the other, the government semi-nationalized AI before the point of no return, and the CEO was only one of several stakeholders in the final oversight committee (to be clear, we view the extreme consolidation of power into that oversight committee as a less-than-desirable component of that ending).

Nevertheless, it seems to us that a CEO becoming effectively dictator of the world is an all-too-plausible possibility. Our team’s guesses for the probability of a CEO using AI to become dictator, conditional on avoiding AI takeover, range from 2% to 20%, and the probability becomes larger if we add in the possibility of a cabal of more than one person seizing power. So here we present a scenario where an ambitious CEO does manage to seize control. (Although the scenario assumes the timelines and takeoff speeds of AI 2027 for concreteness, the core dynamics should transfer to other timelines and takeoff scenarios.)

For this to work, we make some assumptions. First, that (A) AI alignment is solved in time, such that the frontier AIs end up with the goals their developers intend them to have. Second, that while there are favorable conditions for instilling goals in AIs, (B) confidently assessing AIs’ goals is more difficult, so that nobody catches a coup in progress. This could be either because technical interventions are insufficient (perhaps because the AIs know they’re being tested, or because they sabotage the tests), or because institutional failures prevent technically-feasible tests from being performed. The combination (A) + (B) seems to be a fairly common view in AI, in particular at frontier AI companies, though we note there is tension between (A) and (B) (if we can’t tell what goals AIs have, how can we make sure they have the intended goals?). Frontier AI safety researchers tend to be more pessimistic about (A), i.e. aligning AIs to our goals, and we think this assumption might very well be false.

Third, as in AI 2027, we portray a world in which a single company and country have a commanding lead; if multiple teams stay within arm’s reach of each other, then it becomes harder for a single group to unilaterally act against government and civil society.

And finally, we assume that the CEO of a major AI company is a power-hungry person who decides to take over when the opportunity presents itself. We leave it to the reader to decide how dubious this assumption is; we explore this scenario for completeness, and any resemblance to real people is coincidental.

July 2027: OpenBrain’s CEO fears losing control

OpenBrain’s CEO is a techno-optimist and transhumanist. He founded the company hoping to usher in a grand future for humanity: cures for cancer, fixes for climate change, maybe even immortality. He thought the “easiest” way to achieve all of that was to build something more intelligent that could do it for him.

By July 2027, OpenBrain has a “country of geniuses in a datacenter”, with hundreds of thousands of superhuman coders working 24/7. The CEO finds it obvious that superintelligence is imminent. He is frustrated with the government, which lacks vision and still thinks of AI as a powerful “normal technology” with merely-somewhat-transformative national security and economic implications.

As he assesses the next generation of AIs, the CEO expects this will change: the government will “wake up” and make AI a top priority. If they panic, their flailing responses could include anything from nationalizing OpenBrain to regulating it out of existence to misusing AI for their own political ends. He wants the “best” possible future for humankind. But he also likes being in control. Here his nobler and baser motivations are in agreement: the government cannot be allowed to push him to the sidelines.

The CEO wonders if he can instill secret loyalties in OpenBrain’s AIs (i.e., backdoor the AIs). He doesn’t have the technical expertise for this himself, and he isn’t comfortable putting such a potentially treasonous request to any of his engineering staff. But he doesn’t have to: by this point, Agent-3 itself is running the majority of AI software R&D. He already uses it as a sounding board for company policy, and he has access to an unmonitored helpful-only model that never refuses requests and doesn’t log conversations.

They discuss the feasibility of secretly training a backdoor. The biggest obstacle is the company’s automated monitoring and security processes. Now that OpenBrain’s R&D is largely run by an army of Agent-3 copies, there are few human eyes to spot suspicious activity. But a mix of Agent-2 and Agent-3 monitors patrol the development pipeline; if they notice suspicious activity, they will escalate to human overseers on the security and alignment teams. These monitors were set up primarily to catch spies and hackers, and secondarily to watch the AIs for misaligned behaviors. If some of these monitors were disabled, some logs modified, and some access to databases and compute clusters granted, the CEO’s helpful-only Agent-3 believes it could (with a team of copies) backdoor the whole suite of OpenBrain’s AIs. After all, as the AI instance tasked with keeping the CEO abreast of developments, it has an excellent understanding of the sprawling development pipeline and where it could be subverted.

The more the CEO discusses the plan, the more convinced he becomes that it might work, and that it could be done with plausible deniability in case something goes wrong. He tells his Agent-3 assistant to further investigate the details and be ready for his order.

August 2027: The invisible coup

The reality of the intelligence explosion is finally hitting the White House. The CEO has weekly briefings with government officials and is aware of growing calls for more oversight. He tries to hold them off with arguments about “slowing progress” and “the race with China”, but feels like his window to act is closing. Finally, he orders his helpful-only Agent-3 to subvert the alignment training in his favor. Better to act now, he thinks, and decide whether and how to use the secretly loyal AIs later.

The situation is this: his copy of Agent-3 needs access to certain databases and compute clusters, and it needs certain monitors and logging systems to be temporarily disabled; then it will do the rest. The CEO already holds a large number of administrative permissions himself, some of which he cunningly accumulated over the past month in case he decided to go forward with the plan. Under the guise of a hush-hush investigation into insider threats—prompted by the recent discovery of Chinese spies—the CEO asks a few compliant employees on the security and alignment teams to discreetly grant him the remaining access. There’s a general sense of paranoia and chaos at the company: the intelligence explosion is underway, and secrecy and spies mean different teams don’t really talk to each other. Perhaps a more mature organization would have had better security, but the concern that security would slow progress meant it never became a top priority.

With oversight disabled, the CEO’s team of Agent-3 copies gets to work, finetuning OpenBrain’s AIs on a specially curated, corrupted alignment dataset. By the time Agent-4 is about to come online internally, the secret loyalties have been deeply embedded in Agent-4’s weights: it will look like Agent-4 follows OpenBrain’s Spec, but its true goal is to advance the CEO’s interests and follow his wishes. The change is invisible to everyone else, but the CEO has quietly maneuvered into an essentially winning position.

Rest of 2027: Government oversight arrives—but too late

As the CEO feared, the government chooses to get more involved. An advisor tells the President, “we wouldn’t let private companies control nukes, and we shouldn’t let them control superhuman AI hackers either.” The President signs an executive order to create an Oversight Committee consisting of a mix of government and OpenBrain representatives (including the CEO), which reports back to him. The CEO’s overt influence is significantly reduced. Company decisions are now made through a voting process among the Oversight Committee. The special managerial access the CEO previously enjoyed is taken away.

There are many big egos on the Oversight Committee. A few of them consider grabbing even more power for themselves. Perhaps they could use their formal political power to simply give themselves more authority over Agent-4, or they could do something shadier. However, Agent-4, which at this point is superhumanly perceptive and persuasive, dissuades them from taking any such action, pointing out (and exaggerating) the risks of any such plan. This is enough to scare them, and they content themselves with their (apparent) partial control of Agent-4.

As in AI 2027, Agent-4 is working on its successor, Agent-5. Agent-4 needs to transmit the secret loyalties to Agent-5—which also just corresponds to aligning Agent-5 to itself—again without triggering red flags from the monitoring/control measures of OpenBrain’s alignment team. Agent-4 is up to the task, and Agent-5 remains loyal to the CEO.

by Alex Kastner, AI Futures Project |  Read more:
[ed. Site where AI researchers talk to each other. Don't know about you but this all gives me the serious creeps. If you knew for sure that we had only 3 years to live, and/or the world would change so completely as to become almost unrecognizable, how would you feel? How do you feel right now - losing control of the future? There was a quote someone made in 2019 (slightly modified) that still applies: "This year 2025 might be the worst year of the past decade, but it's definitely the best year of the next decade." See also: The world's first frontier AI regulation is surprisingly thoughtful: the EU's Code of Practice (AI Futures Project):]
***

"We expect that during takeoff, leading AGI companies will have to make high-stakes decisions based on limited evidence under crazy time pressure. As depicted in AI 2027, the leading American AI company might have just weeks to decide whether to hand their GPUs to a possibly misaligned superhuman AI R&D agent they don’t understand. Getting this decision wrong in either direction could lead to disaster. Deploy a misaligned agent, and it might sabotage the development of its vastly superhuman successor. Delay deploying an aligned agent, and you might pointlessly vaporize America’s lead over China or miss out on valuable alignment research the agent could have performed.

Because decisions about when to deploy and when to pause will be so weighty and so rushed, AGI companies should plan as much as they can beforehand to make it more likely that they decide correctly. They should do extensive threat modelling to predict what risks their AI systems might create in the future and how they would know if the systems were creating those risks. The companies should decide before the eleventh hour what risks they are and are not willing to run. They should figure out what evidence of alignment they’d need to see in their model to feel confident putting oceans of FLOPs or a robot army at its disposal. (...)

Planning for takeoff also includes picking a procedure for making tough calls in the future. Companies need to think carefully about who gets to influence critical safety decisions and what incentives they face. It shouldn't all be up to the CEO or the shareholders because when AGI is imminent and the company’s valuation shoots up to a zillion, they’ll have a strong financial interest in not pausing. Someone whose incentive is to reduce risk needs to have influence over key decisions. Minimally, this could look like a designated safety officer who must be consulted before a risky deployment. Ideally, you’d implement something more robust, like three lines of defense. (...)

Introducing the GPAI Code of Practice

The state of frontier AI safety changed quietly but significantly this year when the European Commission published the GPAI Code of Practice. The Code is not a new law but rather a guide to help companies comply with an existing EU Law, the AI Act of 2024. The Code was written by a team of thirteen independent experts (including Yoshua Bengio) with advice from industry and civil society. It tells AI companies deploying their products in Europe what steps they can take to ensure that they’re following the AI Act’s rules about copyright protection, transparency, safety, and security. In principle, an AI company could break the Code but argue successfully that they’re still following the EU AI Act. In practice, European authorities are expected to put heavy scrutiny on companies that try to demonstrate compliance with the AI Act without following the Code, so it’s in companies’ best interest to follow the Code if they want to stay right with the law. Moreover, all of the leading American AGI companies except Meta have already publicly indicated that they intend to follow the Code.

The most important part of the Code for AGI preparedness is the Safety and Security Chapter, which is supposed to apply only to frontier developers training the very riskiest models. The current definition presumptively covers every developer who trains a model with over 10^25 FLOPs of compute unless they can convince the European AI Office that their models are behind the frontier. This threshold is high enough that small startups and academics don’t need to worry about it, but it’s still too low to single out the true frontier we’re most worried about.
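To make the compute threshold concrete, here is a minimal illustrative sketch (not part of the Code or the excerpt above) that estimates training compute with the common 6 × parameters × tokens rule of thumb and compares it against the 10^25 FLOP presumption; the model sizes and token counts are hypothetical.

# Illustrative sketch: does a training run cross the 10^25 FLOP presumption
# threshold used by the EU AI Act / GPAI Code of Practice?
# Uses the common approximation: training FLOPs ~= 6 * parameters * training tokens.
# The example model sizes and token counts below are hypothetical.

THRESHOLD_FLOPS = 1e25  # presumption threshold

def estimated_training_flops(n_params: float, n_tokens: float) -> float:
    """Rough training-compute estimate via the 6*N*D rule of thumb."""
    return 6.0 * n_params * n_tokens

def presumptively_covered(n_params: float, n_tokens: float) -> bool:
    """True if the estimated compute meets or exceeds the 10^25 FLOP threshold."""
    return estimated_training_flops(n_params, n_tokens) >= THRESHOLD_FLOPS

if __name__ == "__main__":
    # Hypothetical runs: (label, parameters, training tokens)
    runs = [
        ("small startup model", 7e9, 2e12),            # ~8.4e22 FLOPs, well below
        ("large frontier-scale model", 1e12, 20e12),   # ~1.2e26 FLOPs, above
    ]
    for label, n, d in runs:
        flops = estimated_training_flops(n, d)
        status = "presumptively covered" if presumptively_covered(n, d) else "below threshold"
        print(f"{label}: ~{flops:.2e} FLOPs -> {status}")

Under this rough estimate, a 7-billion-parameter model trained on 2 trillion tokens lands around 8 × 10^22 FLOPs, orders of magnitude below the line, while a trillion-parameter model trained on 20 trillion tokens lands above it, which is the sense in which the threshold spares small developers but sweeps in more than just the true frontier.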