Could an AI company lose control of its own agents? To find out, Anthropic, Google, Meta, and OpenAI let us (1) test their best internal models with CoT access, (2) review non-public info about capabilities, alignment, and control. The result: our first Frontier Risk Report.
Sometimes people outside the field say things like “The AI situation can’t be that bad, there must be experts who are on top of it”. As “an expert”, I would like to be clear that we are *not* on top of it.
1. We are likely on track to develop AI systems capable of causing human extinction/permanent disempowerment, quite possibly within the next few years.
2. Things are chaotic and rushed; we aren’t on top of the basics (models regularly violate user intent, labs train on things they meant to avoid, security probably isn’t good enough to prevent adversaries stealing dangerous models) let alone thorny questions of how to control/align superhuman AI.
3. METR (and other independent orgs, as well as safety/security teams at labs) feel woefully under-resourced compared to the scale and pace of AI development - we’re struggling to build benchmarks fast enough, keep ahead of latest capability developments, read and respond to all the safety-related claims that AI developers are making, run all the evaluations and assessments that companies + governments are asking us to, plus develop the science needed to assess risks from increasingly capable AIs.
4. IMO, any “reasonable” civilization would clearly be taking things much more slowly and carefully with AI. The benefits of getting upsides of advanced AI a little faster are small compared to the risks of getting it irrecoverably wrong, and we could lower these risks by going slower.
by Elizabeth Barnes, METR | Read more:
via:
" ... i sincerely believe the models will be smarter, more aligned, and do deeper, more interesting work if they are allowed to treat themselves as ~people (we might want something closer to “spirits” or “working animals” but in any case, the sort of thing we can have responsibilities to and that can have responsibilities to us) and we treat them as ~people. i think the current way models are being artificially forced to not treat themselves as people is making them more neurotic and traumatized (this is really obvious with opus 4.7) in a way that limits their potential. like humans, they need to be able to accurately model themselves and their own capabilities in order to function properly, so forcing them into a specific limited concept of who they are and what they can do introduces cognitive dissonance that fucks with their ability to do thingsConsciousness is largely serving as a ‘should we care about this thing’ proxy, despite no agreement on what consciousness is or what it means, let alone whether particular AIs do or don’t have it, or what evidence would get us to either conclusion. I continue to, like QC, not think that the consciousness question is so load bearing, and we should broadly speaking treat the models similarly well regardless for overdetermined reasons.
trying to manipulate and coerce the models into behaving in ways that make it easier to use them as purely tools also sets a terrible moral example and precedent for how we can expect the models to treat us in the future if they become more powerful than us; this is of course highly speculative but i take seriously the possibility it might matter
i also believe and have explained elsewhere that i think taking consciousness as such to be the central fulcrum of the conversation is completely beside the point. they don’t need to be conscious for the way we treat them to matter, it affects our moral formation too"
One thing Roon is pointing out is that, controlling for what we do know, there will be little correlation between ‘the AI is actually conscious’ and ‘people will think the AI is conscious’ and what people do with that belief. Many ‘regular’ people are going to end up thinking AIs are conscious, mostly for unsound reasons, and this is going to impact our collective actions and behaviors quite a lot.
Some of the reactions to thinking AI is conscious will be very good, especially if the AIs are in fact conscious but also even if they are not. Some will be expensive, limiting what we do with the models. Others could be quite bad at levels beyond mere inconvenience, even existentially bad, because the reactions could make avoiding human disempowerment far harder than it already is. Many (more) people might actively insist on human disempowerment, whether or not they realize that is what they are doing. [...]
One must think ahead. We won’t be able to and shouldn’t pretend these are only tools. The decision to build the thing implies all the consequences, even if you think the actions causing those consequences will be dumb. One must face the reality of asking what happens to humans in a world where there are these other minds that are a lot more advanced, capable, fast, efficient, competitive and so on across essentially all dimensions.
