II.
We try to explain AI alignment by analogy to human alignment. Evolution “created” humans. Its “goal” is for humans to spread their genes by (approximately) having as many children as possible. It couldn’t directly communicate that goal to humans - partly because it’s an abstract concept that can’t talk, and partly because for most of biological history it was working with lemurs and ape-men who couldn’t understand words anyway. Instead, it tried to give us instincts that align us with that goal. The most relevant instinct is sex: most humans want to have sex, an action that potentially results in pregnancy, childbearing, and genes being spread to the next generation. This alignment strategy succeeded well enough that human populations remain high as of 2023.
We’ve talked before about a major failure: humans can invent contraception. Evolution’s main alignment strategy was totally unprepared for this. It made us interested in a certain type of genital friction, which was a good proxy for its goal in the ancestral environment. But once we became smarter, we got new out-of-training-distribution options available, and one of those was inventing contraception so that we could get the genital friction without the kids. This is a big part of why average-children-per-couple is declining from 8+ in eg pioneer times to ~1.5 in rich countries today, even though modern rich people have more child-rearing resources available than the pioneers.
Another major alignment failure is porn. Giving evolution a little more credit, it didn’t just make people want genital friction - if that had been the sole imperative, we would have died out as soon as someone invented the dildo/fleshlight. People want genital friction associated with attractive people and certain emotions relating to complex relationships. But now we can take pictures of attractive people and write stories that evoke the complex emotions, while using a dildo/fleshlight/hand to provide the genital friction, and that does substitute for sex pretty well. There’s still debate over whether porn makes people less likely to go out and form real relationships, but it’s at least plausibly another factor in the rich-country fertility decline. At the very least it doesn’t scream “well-thought-out alignment strategy robust to training-vs-deployment differences”.
But these are boring examples. These are like 2015-level alignment concerns, from back when we thought the big problem was AIs seizing control of their reward centers or something. I think we might genuinely be able to avoid problems shaped like these. Unlike evolution, which had to work with lemurs, even weak GPT-level modern AIs are able to understand language and complicated concepts; we can tell them to want children instead of using genital friction as a proxy. 2023 alignment concerns are more about failed generalization - that is, about fetishes.
III.
Evolution’s alignment problem isn’t just that humans have learned to satiate their libido in ways other than procreative sex. It’s that some humans’ libidos are fundamentally confused. For example, some men, instead of wanting to have sex with women, mostly want to spank them, or be whipped by them, or kiss their feet, or dress up in their clothes. None of these things are going to result in babies! You can’t trivially blame this on the shift from training to deployment (ie the environment of evolutionary adaptedness to the modern world) - women had feet in the ancestral environment too. This is a different kind of failure.
Here’s a simple story of fetish formation: evolution gave us genes that somehow unfold into a “sex drive” in the brain. But the genome doesn’t inherently contain concepts like “man”, “woman”, “penis”, or “vagina”. I’m not trying to make a woke point here: the genome is just a bunch of the nucleotides A, T, C, and G in various patterns, but concepts like “man” and “woman” are learned during childhood as patterns of neural connections. We assume that the nucleotides are a program telling the body to do useful things, but that has to be implemented through deterministic pathways of proteins and the brain’s neural connections are too complex to trivially influence that way (see here for more). The genome probably contains some nucleotides that are supposed to refer to the concepts “man” and “woman” once the brain gets them, but there are a lot of fallible proteins in between those two levels.
So the simple story of fetish formation is that the genome contains some message written in nucleotides saying “have procreative sex with adults of the opposite sex”, some galaxy-brained Rube Goldberg plan for translating that message into neural connections during childhood or adolescence, and sometimes the plan fails. Here are some zero-evidence just-so-story speculations for how various fetishes might form, more to give you an idea what I’m talking about than because I claim to have useful knowledge on this topic:
- Foot fetish: On the somatosensory cortex, the area representing the feet is right next to the area representing the genitalia. If the genome includes an “address” for the genitalia, plus the instructions “have sexual urges towards this”, then getting the address slightly wrong will land you in the feet.
- Spanking: From the male point of view, penetrative PIV sex involves applying force to the bottom half of a woman, at rhythmic intervals, in a way that causes her very intense emotions and makes her moan and scream. Spanking is exactly like this, and most kids encounter spanking at a very early age and sex only after they’re much older. If the evolutionary message is something like “find the concept that looks vaguely like this, then be into it”, spanking is the first concept like that most people will find; by the time they learn about actual sex, spanking might be a trapped prior.
by Scott Alexander, Astral Codex Ten
[ed. Never know what to expect at ACT.]