Saturday, May 10, 2025

Testing AI's GeoGuessr Genius

Seeing a world in a grain of sand

Some of the more unhinged writing on superintelligence pictures AI doing things that seem like magic. Crossing air gaps to escape its data center. Building nanomachines from simple components. Plowing through physical bottlenecks to revolutionize the economy in months.

More sober thinkers point out that these things might be physically impossible. You can’t do physically impossible things, even if you’re very smart.

No, say the speculators, you don’t understand. Everything is physically impossible when you’re 800 IQ points too dumb to figure it out. A chimp might feel secure that humans couldn’t reach him if he climbed a tree; he could never predict arrows, ladders, chainsaws, or helicopters. What superintelligent strategies lie as far outside our solution set as “use a helicopter” is outside a chimp’s?

Eh, say the sober people. Maybe chimp → human was a one-time gain. Humans aren’t infinitely intelligent. But we might have infinite imagination. We can’t build starships, but we can tell stories about them. If someone much smarter than us built a starship, it wouldn’t be an impossible, magical thing we could never predict. It would just be the sort of thing we’d expect someone much smarter than us to do. Maybe there’s nothing left in the helicopters-to-chimps bin - just a lot of starships that might or might not get built.

The first time I felt like I was getting real evidence on this question - the first time I viscerally felt myself in the chimp’s world, staring at the helicopter - was last week, watching OpenAI’s o3 play GeoGuessr.

GeoGuessr is a game where you have to guess where a random Google Street View picture comes from. For example, here’s a scene from normal human GeoGuessr:


The store sign says “ADULTOS”, which sounds Spanish, and there’s a Spanish-looking church on the left. But the trees look too temperate to be Latin America, so I guessed Spain. Too bad - it was Argentina. Such are the vagaries of playing GeoGuessr as a mere human.

Last week, Kelsey Piper claimed that o3 - OpenAI’s latest ChatGPT model - could achieve seemingly impossible feats in GeoGuessr. She gave it this picture:


…and with no further questions, it determined the exact location (Marina State Beach, Monterey, CA).

How? She linked a transcript where o3 tried to explain its reasoning, but the explanation isn’t very good. It said things like:
Tan sand, medium surf, sparse foredune, U.S.-style kite motif, frequent overcast in winter … Sand hue and grain size match many California state-park beaches. California’s winter marine layer often produces exactly this thick, even gray sky.
Commenters suggested that it was lying. Maybe there was hidden metadata in the image, or o3 remembered where Kelsey lived from previous conversations, or it traced her IP, or it cheated some other way.

I decided to test the limits of this phenomenon. Kelsey kindly shared her monster of a prompt, which she says significantly improves performance: (...)

…and I ran it on a set of increasingly impossible pictures.

Here are my security guarantees: the first picture came from Google Street View; all subsequent pictures were my personal old photos which aren’t available online. All pictures were screenshots of the original, copy-pasted into MSPaint and re-saved in order to clear metadata. Only one of the pictures is from within a thousand miles of my current location, so o3 can’t improve performance by tracing my IP or analyzing my past queries. I flipped all pictures horizontally to make matching to Google Street View data harder.
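The sanitizing steps above can be sketched in code. This is a minimal illustration assuming the Pillow imaging library (the post's actual method was a screenshot re-saved in MSPaint); the function name and file paths are hypothetical:

```python
# Sketch of the sanitizing steps described above, using Pillow
# (an assumption - the author actually used screenshots + MSPaint).
from PIL import Image, ImageOps

def sanitize(src_path: str, dst_path: str) -> None:
    """Mirror an image left-right and re-save it with no metadata."""
    img = Image.open(src_path).convert("RGB")
    flipped = ImageOps.mirror(img)  # horizontal flip
    # Rebuilding the image from raw pixels guarantees the EXIF block
    # (GPS coordinates, timestamps, camera info) is not carried over.
    clean = Image.new(flipped.mode, flipped.size)
    clean.putdata(list(flipped.getdata()))
    clean.save(dst_path)
```

Re-saving through a freshly constructed image is the belt-and-suspenders option: even if a save call silently preserved metadata, none exists on the new object.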

Here are the five pictures. Before reading on, consider doing the exercise yourself - try to guess where each is from - and make your predictions about how the AI will do.


Last chance to guess on your own . . . okay, here we go.

by Scott Alexander, Astral Codex Ten |  Read more:
Images: Kelsey Piper; uncredited
[ed. I'm sure the CIA/NSA/FBI are loving this. See also: Highlights From The Comments On AI Geoguessr (ACX).]