
Machine learning goes beyond theory to beat human poker champs


Among the many achievements of machine learning in recent years, some of the most striking are victories over human players in games, such as the conquest of Go in 2016 by Google's DeepMind group. In such milestones, researchers are often guided by mathematical theory that says an optimal strategy exists and can be found, given a good algorithm and enough compute.

But what do you do when theory breaks down? Two researchers at Carnegie Mellon University and Facebook went back to the drawing board to tackle six-player no-limit Texas hold’em, the most popular form of multiplayer poker in the world.

The theory offers no computable optimal solution for this form of the card game, so they designed some elegant search strategies for their computer program, “Pluribus,” which beat top human players over 10,000 hands of poker. The authors even managed to do it with a single 64-core Intel-based server with just 512 gigabytes of RAM, which they point out is far less compute than is consumed by increasingly gigantic machine learning models such as DeepMind’s “AlphaZero.”

Rather than computing an optimal solution for every player, the Pluribus program searches for good-enough solutions that turn out to perform surprisingly well.


The work, “Superhuman AI for multiplayer poker,” which describes a competition over twelve days against some of the world's top poker players, is published today in Science magazine and is written by Noam Brown and Tuomas Sandholm. Brown and Sandholm both have affiliations with Carnegie Mellon University; Brown is also with Facebook AI Research, and Sandholm has affiliations with three Pittsburgh companies, Strategic Machine, Inc., Strategy Robot, Inc., and Optimized Markets, Inc.

Science magazine has become something of a hotbed for cutting-edge poker papers by machine learning types, and this is the second appearance by Brown and Sandholm in a little over a year. In January of last year, they published a machine learning model called “Libratus” that could achieve “superhuman” ability in two-player versions of Texas hold’em poker.


With Pluribus, the authors take on a new level of complexity that comes with multiple opponents; in this case, five humans against the Pluribus machine. In most games taken on by machine learning, including Go and two-player poker, there is a theoretical framework that forms the basis for finding optimal playing strategies. The “Nash Equilibrium,” named for famed US mathematician John Nash, describes a set of strategies, one for each player, in which no player can do better by changing strategy alone, on the assumption that every opponent in the game is also playing their best strategy.

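In a simple game like rock, paper, scissors, the equilibrium strategy is to pick each of the three moves at random with equal probability. No opponent can exploit that mix, whereas a player who makes the same choice every round, such as rock, loses to anyone who notices and answers with paper.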
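The rock, paper, scissors case is small enough to check by hand or in a few lines of Python. The sketch below is illustrative only: the payoff table and the function name `best_response_value` are assumptions made for this example, not anything taken from the paper. It computes the most an opponent could win per round, on average, against a given mix of moves.

```python
# Payoff to the player choosing the row move against the column move.
# Order of moves: rock, paper, scissors. +1 is a win, -1 a loss, 0 a tie.
PAYOFF = [
    [0, -1, 1],   # rock: ties rock, loses to paper, beats scissors
    [1, 0, -1],   # paper: beats rock, ties paper, loses to scissors
    [-1, 1, 0],   # scissors: loses to rock, beats paper, ties scissors
]


def best_response_value(opponent_mix):
    """Return the highest expected payoff any single move earns
    against an opponent who plays the given probability mix."""
    values = [
        sum(PAYOFF[move][opp] * prob for opp, prob in enumerate(opponent_mix))
        for move in range(3)
    ]
    return max(values)


always_rock = [1.0, 0.0, 0.0]
uniform = [1 / 3, 1 / 3, 1 / 3]

print("Best reply to always-rock earns:", best_response_value(always_rock))
print("Best reply to the uniform mix earns:", best_response_value(uniform))
```

Against the always-rock player, the best reply (paper) wins every round. Against the uniform mix, every reply breaks even on average, which is what makes that mix an equilibrium.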

So making bots that play games can in some sense be boiled down to building a machine that computes the Nash Equilibrium.

The problem is, as games increase in complexity, finding the Nash Equilibrium becomes more and more computationally intensive. Approximating that equilibrium is the best computers can do within practical time limits. Approximation has worked well in a number of cases. In two-player, heads-up poker in particular, it served Brown and Sandholm well with Libratus, as it did another team, Moravčík and colleagues at the University of Alberta, who published their “DeepStack” machine for Texas hold’em in Science in 2017.
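To make "approximating the equilibrium" concrete, here is a minimal sketch using regret matching, a textbook self-play technique, on rock, paper, scissors. It is a stand-in chosen for illustration and is not a description of how Libratus, DeepStack, or Pluribus is implemented. The function names, the starting regrets, and the iteration count are all assumptions made for this example.

```python
# Payoff to the player choosing the row move against the column move.
# Order of moves: rock, paper, scissors.
PAYOFF = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]


def strategy_from_regrets(regrets):
    """Play each move in proportion to its positive accumulated regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    return [1 / 3, 1 / 3, 1 / 3]  # no positive regret: fall back to uniform


def train(iterations, initial_regrets=(1.0, 0.0, 0.0)):
    """Run two regret-matching players against each other and
    return each player's average strategy over all iterations."""
    # Both players start out biased toward rock (an arbitrary assumption).
    regrets = [list(initial_regrets), list(initial_regrets)]
    strategy_sums = [[0.0] * 3, [0.0] * 3]
    for _ in range(iterations):
        strategies = [strategy_from_regrets(r) for r in regrets]
        for player in range(2):
            opponent = strategies[1 - player]
            # Expected payoff of each pure move against the opponent's mix.
            move_values = [
                sum(PAYOFF[m][o] * opponent[o] for o in range(3))
                for m in range(3)
            ]
            current_value = sum(
                strategies[player][m] * move_values[m] for m in range(3)
            )
            for m in range(3):
                # Regret: how much better move m would have done.
                regrets[player][m] += move_values[m] - current_value
                strategy_sums[player][m] += strategies[player][m]
    return [[s / iterations for s in sums] for sums in strategy_sums]


print(train(10000))
```

The strategy played on any single iteration keeps swinging between moves, but the average over all iterations drifts toward the equal one-third mix. That averaging is the sense in which the result is an approximation of the equilibrium and not an exact solution.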
