Duck Soup: Why Self-Taught Artificial Intelligence Has Trouble With the Real World

Until very recently, the machines that could trounce champions were at least respectful enough to start by learning from human experience.

To beat Garry Kasparov at chess in 1997, IBM engineers made use of centuries of chess wisdom in their Deep Blue computer. In 2016, Google DeepMind’s AlphaGo thrashed champion Lee Sedol at the ancient board game Go after poring over millions of positions from tens of thousands of human games.

But now artificial intelligence researchers are rethinking the way their bots incorporate the totality of human knowledge. The current trend is: Don’t bother.

Last October, the DeepMind team published details of a new Go-playing system, AlphaGo Zero, that studied no human games at all. Instead, it started with the game’s rules and played against itself. The first moves it made were completely random. After each game, it folded in new knowledge of what led to a win and what didn’t. At the end of these scrimmages, AlphaGo Zero went head to head with the already superhuman version of AlphaGo that had beaten Lee Sedol. It won 100 games to zero.
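To make the idea of learning purely from self-play concrete, here is a minimal sketch on a toy game. It is not how AlphaGo Zero works (that system combines a deep neural network with tree search); it is a plain lookup table trained on a simple stone-taking game, where players alternately remove one to three stones and whoever takes the last stone wins. The game, the function names and the parameters `episodes`, `start` and `epsilon` are all assumptions chosen for illustration.

```python
import random
from collections import defaultdict


def legal_moves(stones):
    """A player may take 1, 2 or 3 stones, but no more than remain."""
    return [m for m in (1, 2, 3) if m <= stones]


def self_play(episodes=20000, start=10, epsilon=0.2, seed=0):
    """Learn move values from self-play only; returns a value(stones, move) lookup."""
    rng = random.Random(seed)
    total = defaultdict(float)  # summed outcomes for each (stones, move) pair
    count = defaultdict(int)    # how often each (stones, move) pair was played

    def value(stones, move):
        # Average outcome seen so far; 0.0 for a move never tried.
        n = count[(stones, move)]
        return total[(stones, move)] / n if n else 0.0

    for _ in range(episodes):
        stones, player, history = start, 0, []
        while stones > 0:
            moves = legal_moves(stones)
            if rng.random() < epsilon:
                move = rng.choice(moves)  # explore: try a random move
            else:
                move = max(moves, key=lambda m: value(stones, m))  # exploit
            history.append((player, stones, move))
            stones -= move
            player = 1 - player
        winner = 1 - player  # the player who took the last stone
        # Fold the result back in: +1 for the winner's moves, -1 for the loser's.
        for who, s, m in history:
            total[(s, m)] += 1.0 if who == winner else -1.0
            count[(s, m)] += 1
    return value
```

Calling `value = self_play()` and then inspecting `value(stones, move)` shows which moves the table has come to prefer. Early games are close to random, as the article describes, and the table improves only because each finished game feeds its result back into the estimates.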

The team went on to create what would become another master gamer in the AlphaGo family, this one called simply AlphaZero. In a paper posted to the scientific preprint site arxiv.org in December, DeepMind researchers revealed that after starting again from scratch, the trained-up AlphaZero outperformed AlphaGo Zero — in other words, it beat the bot that beat the bot that beat the best Go players in the world. And when it was given the rules for chess or the Japanese chess variant shogi, AlphaZero quickly learned to defeat bespoke top-level algorithms for those games, too. Experts marveled at the program’s aggressive, unfamiliar style. “I always wondered how it would be if a superior species landed on Earth and showed us how they played chess,” Danish grandmaster Peter Heine Nielsen told a BBC interviewer. “Now I know.”

The past year also saw otherworldly self-taught bots emerge in settings as diverse as no-limit poker and Dota 2, a hugely popular multiplayer online video game in which fantasy-themed heroes battle for control of an alien world.

Of course, the companies investing money in these and similar systems have grander ambitions than just dominating video-game tournaments. Research teams like DeepMind’s hope to apply similar methods to real-world problems like building room-temperature superconductors, or understanding the origami needed to fold proteins into potent drug molecules. Many practitioners also hope to eventually build up to artificial general intelligence, an ill-defined but captivating goal in which a machine could think like a person, with the versatility to attack many different kinds of problems.

Yet despite the investments being made in these systems, it isn’t yet clear how far past the game board the current techniques can go. “I’m not sure the ideas in AlphaZero generalize readily,” said Pedro Domingos, a computer scientist at the University of Washington. “Games are a very, very unusual thing.”

Perfect Goals for an Imperfect World

One characteristic shared by many games, chess and Go included, is that players can see all the pieces on both sides at all times. Each player always has what’s termed “perfect information” about the state of the game. However devilishly complex the game gets, all you need to do is think forward from the current situation.

Plenty of real situations aren’t like that. Imagine asking a computer to diagnose an illness or conduct a business negotiation. “Most real-world strategic interactions involve hidden information,” said Noam Brown, a doctoral student in computer science at Carnegie Mellon University. “I feel like that’s been neglected by the majority of the AI community.”

Poker, which Brown specializes in, offers a different challenge. You can’t see your opponent’s cards. But here too, machines that learn by playing against themselves are now reaching superhuman levels. In January 2017, a program called Libratus created by Brown and his adviser, Tuomas Sandholm, outplayed four professional poker players at heads-up, no-limit Texas Hold ’em, finishing $1.7 million ahead of its competitors at the end of a 20-day competition.

An even more daunting game involving imperfect information is StarCraft II, another multiplayer online video game with a vast following. Players pick a team, build an army and wage war across a sci-fi landscape. But that landscape is shrouded in a fog of war that only lets players see areas where they have soldiers or buildings. Even the decision to scout your opponent is fraught with uncertainty.

This is one game that AI still can’t beat. Barriers to success include the sheer number of moves in a game, which often stretches into the thousands, and the speed at which they must be made. Every player — human or machine — has to worry about a vast set of possible futures with every click.

For now, going toe-to-toe with top humans in this arena is beyond the reach of AI. But it’s a target. In August 2017, DeepMind partnered with Blizzard Entertainment, the company that made StarCraft II, to release tools that they say will help open up the game to AI researchers.

Despite its challenges, StarCraft II comes down to a simply enunciated goal: Eradicate your enemy. That’s something it shares with chess, Go, poker, Dota 2 and just about every other game. In games, you can win.

From an algorithm’s perspective, problems need to have an “objective function,” a goal to be sought. When AlphaZero played chess, this wasn’t so hard. A loss counted as minus one, a draw was zero, and a win was plus one. AlphaZero’s objective function was to maximize its score. The objective function of a poker bot is just as simple: Win lots of money.
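The scoring scheme described above is simple enough to write down directly. The sketch below is only an illustration of that scoring, not DeepMind's code; the function names and the outcome labels are assumptions.

```python
# Scores as described in the text: loss = -1, draw = 0, win = +1.
SCORES = {"loss": -1, "draw": 0, "win": 1}


def game_score(outcome):
    """Map a single game's outcome to its numeric score."""
    return SCORES[outcome]


def objective(outcomes):
    """Average score over a set of games; a learner tries to push this toward +1."""
    return sum(game_score(o) for o in outcomes) / len(outcomes)


print(objective(["win", "draw", "loss", "win"]))  # prints 0.25
```

The point of the example is how little there is to specify: every game ends in one of three outcomes, and each outcome has an unambiguous number attached.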

Real-life situations are not so straightforward. A self-driving car, for instance, needs a more nuanced objective function, something akin to the kind of careful phrasing you’d use to explain a wish to a genie. Something like: Promptly deliver your passenger to the correct location, obeying all laws and appropriately weighing the value of human life in dangerous and uncertain situations. How researchers craft the objective function, Domingos said, “is one of the things that distinguishes a great machine-learning researcher from an average one.”

Consider Tay, a Twitter chatbot released by Microsoft on March 23, 2016. Tay’s objective was to engage people, and it did. “What unfortunately Tay discovered,” Domingos said, “is that the best way to maximize engagement is to spew out racist insults.” It was snatched back offline less than a day later.

by Joshua Sokol, Quanta

[Re: Naked Capitalism: As Cathy O’Neil, author of Weapons of Math Destruction, pointed out, “Algorithms are just opinions expressed in math.”]

Sunday, February 25, 2018
