One of the researchers, Tuomas Sandholm, has a real badass CV. Former pilot in the Finnish air force. Finnish windsurfing champion. Snowboarder. Professor at Carnegie Mellon. Speaks four European languages, including Swedish. And now, at the age of 51, he has created the best AI-powered poker bot.
> speaking four languages is pretty normal in Europe
Northern Europe, maybe. French people for instance tend to suck at foreign languages. We rarely go beyond 3 languages (French, English, then German or Spanish. The last two are often forgotten after school.)
> speaking four languages is pretty normal in Europe
Clearly we have different experiences (Swedish person currently living in Spain), but I haven't met that many people who speak four languages and are from a European country (though I haven't been to eastern Europe yet).
That Finns speak Swedish is a special case though; AFAIK they learn Swedish in school, and being a Finland-Swede is a thing too.
Getting in touch with two foreign languages in school is not uncommon, but speaking up to four (including your mother tongue) with any sort of sophistication definitely is not normal, at least in western Europe.
No it's not, what are you talking about? I've met thousands of young Europeans and ones that speak 4 languages are extremely rare. Unless they're from countries where they get 2 languages "for free" like Holland/Belgium/Switzerland. Definitely not "pretty normal".
French people can usually speak basic English, and a third language is common if that person has ties with another country but that's it. At school, we are normally taught two foreign languages. The first one is usually English, few people actually practice their second one.
The situation is completely different in Scandinavian countries. And it is indeed quite normal to speak 4 languages in Finland (usually Finnish, Swedish, English and a 4th one, often German). Because their native language is only spoken by a few, foreign languages are a necessity for international relationships. And as a Finnish friend told me, learning new languages is a popular way to pass time during long winter nights.
If you want to keep your conversation private it is not enough to choose a rare language in Berlin. There is always somebody who understands what you are saying.
>> Pluribus is also unusual because it costs far less to train and run than other recent AI systems for benchmark games. Some experts in the field have worried that future AI research will be dominated by large teams with access to millions of dollars in computing resources. We believe Pluribus is powerful evidence that novel approaches that require only modest resources can drive cutting-edge AI research.
That's the best part in all of this. I'm not convinced by the claim the authors repeatedly make that this technique will translate well to real-world problems. But I'm hoping there will be more of this kind of result, signalling a shift away from Big Data and huge compute and towards well-designed and efficient algorithms.
In fact, I kind of expect it. The harder it gets to do the kind of machine learning that only large groups like DeepMind and OpenAI can do, the more smaller teams will push the other way and find ways to keep making progress cheaply and efficiently.
Yes! I work for a company that does just this: pull big gears on limited data and try to generalise across groups of things to get intelligent results even on small data. In many ways, it absolutely feels like the future.
It's easy to "take away" too much from this. The takeaway is that an AI poker bot did this, without reading too much into adjacent subjects.
But what's the fun in that?
10,000 hands is an interesting number. If you search the poker forums, it's the number people throw out for how many hands you need to see before you can analyze your play. You then make adjustments and see another 10,000 hands before you can assess those changes.
In 2019, live poker is an impractically slow place for a competitive player to adapt: an online grinder can see 10,000 hands within a day, while the live poker room took 12 days. Another characteristic of online poker is that players can also use data to their advantage.
So, I wouldn't consider 10K hands long term, even if it was spread over 12 days. Once players get a chance to adapt, they'll increase their win rate against a bot. Once hand histories start being shared, it's all over. And again, give these players their own software tools.
Remember that one of the most exciting events in online poker was the run of isildur1. That run was put to rest when he went bust against players who had studied thousands of his hand histories.
This doesn't take away from the development of the bot. If we learn something from it, then all good.
> 10,000 hands is an interesting number. If you search the poker forums, it's the number people throw out for how many hands you need to see before you can analyze your play. You then make adjustments and see another 10,000 hands before you can assess those changes.
If you read the paper or the Facebook post (https://ai.facebook.com/blog/pluribus-first-ai-to-beat-pros-...), you'll see they address this (no idea why this worse article is the link here).
>Although poker is a game of skill, there is an extremely large luck component as well. It is common for top professionals to lose money even over the course of 10,000 hands of poker simply because of bad luck. To reduce the role of luck, we used a version of the AIVAT variance reduction algorithm, which applies a baseline estimate of the value of each situation to reduce variance while still keeping the samples unbiased. For example, if the bot is dealt a really strong hand, AIVAT will subtract a baseline value from its winnings to counter the good luck. This adjustment allowed us to achieve statistically significant results with roughly 10x fewer hands than would normally be needed.
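For intuition, here's a toy control-variate sketch of the idea in that quote. This is not the actual AIVAT algorithm: the per-hand numbers are invented, and the baseline here captures the luck term perfectly, whereas a real baseline is only an estimate.

```python
import random

# Toy baseline-subtraction variance reduction, in the spirit of AIVAT.
# The correction term has zero mean, so the estimator stays unbiased.

def simulate_hand(rng):
    luck = rng.gauss(0, 100)   # card luck: huge variance, zero mean
    skill = 5.0                # the true per-hand edge we want to measure
    return luck + skill, luck  # (observed winnings, baseline estimate)

rng = random.Random(0)
raw, adjusted = [], []
for _ in range(10_000):
    won, baseline = simulate_hand(rng)
    raw.append(won)
    adjusted.append(won - baseline)  # counter the good/bad luck

mean = lambda xs: sum(xs) / len(xs)
print(f"raw estimate:      {mean(raw):6.2f}")     # noisy
print(f"adjusted estimate: {mean(adjusted):6.2f}")  # exactly 5.00 in this toy
```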
>Remember that one of the most exciting events in online poker was the run of isildur1. That run was put to rest when he went bust against players who had studied thousands of his hand histories.
Perhaps more famously, Jungleman compiled hand histories from many different people while he was playing Tom Dwan in the 'durrrr' challenge (which I guess technically isn't over....)
You clearly didn’t read the additional links they posted. They mentioned why they chose 10k (AIVAT), and it goes far beyond any of the variables you mentioned.
That really doesn't address the point that was raised. It's not that the bot wins through luck and that 10k is too small a sample, it's that a good professional poker player isn't good over 10k hands, they're good over 5 years.
Any good player will have their play analyzed and exploited, and will have to re-adjust their strategy in response to that exploitative play, so there's a feedback loop there. The question is: how does the AI strategy adapt over time to players who know its hand history? That's an extremely important part of being a top-level player. To give an example: in Daniel Negreanu's vlog about his time at the WSOP, he talks about changing his strategy in response to his analysis of different players' profiles. This is especially important in Sit & Gos, where at high stakes you'll meet regular grinders who build up reputations, and less so in tournaments, where you're less likely to meet any given player.
What took you so long? I mean not the Pluribus team specifically, but Poker AI researchers in general.
The desire to master this sort of game has inspired the development of entire branches of mathematics. Computers are better at maths than humans. They're less prone to hazardous cognitive biases (gambler's fallacy etc.) and can put on an excellent poker face.
As a layperson who's rather ignorant about both no-limit Texas hold 'em and applicable AI techniques, my intuition would tell me that super-human hold 'em should have been achieved before super-human Go. Apparently your software requires way less CPU power than AlphaGo/AlphaZero, which seems to support my hypothesis. What am I missing?
Bonus questions in case you have the time and inclination to oblige:
What does this mean for people who like to play on-line Poker for real money?
Could you recommend some literature (white papers/books/lecture series/whatever) to someone interested in writing an AI (running on potato-grade hardware) for a niche "draft and pass" card game (e.g. Sushi Go!) as a recreational programming exercise?
I think it took the community a while to come up with the right algorithms. So much of early AI research was focused on beating humans at chess and later Go. But those techniques don't directly carry over to an imperfect-information game like poker. The challenge of hidden information was kind of neglected by the AI community. This line of research really has its origins in the game theory community actually (which is why the notation is completely different from reinforcement learning).
Fortunately, these techniques now work really really well for poker. It's now quite inexpensive to make a superhuman poker bot.
OP touched on this, but while that's true for perfect-information games, it is not necessarily true or straightforward for games with hidden information like poker. This is more of a game-theoretic problem (economics) than a purely mathematical one, and it had less support in the AI/ML community, hence the delay.
The lower CPU/GPU/resource use supports that, as does your intuition. Breaking poker required a lot of manual work and model design rather than brute-force algorithms and reinforcement learning.
The bot does not seem to consider previous hands in its decisions. That is to say, it does not consider who it is playing against. Should this affect how we perceive the bot as “strategic” or not? Bots that play purely mathematically optimally on expected value aren’t effective or interesting. But it feels like this is playing on just a much higher order expected value.
It feels like a more down-to-earth version of the sci-fi superhuman running impossible differential equations to predict exactly what you will do, given that he knows what you know that he knows... etc., ad infinitum. But since it doesn't actually consider the person it's predicting, it may simply be a really, really good approximation of the game-theoretic dominant strategy.
At what complexity of game and hidden information should we feel like the bot can’t win by running a lookup table?
The bot bluffs, and understands that when its opponent bets it might be a bluff. I would consider that to be strategic behavior. The fact that its strategy is determined by a mathematical process doesn't change that in my opinion.
>> Bots that play purely mathematically optimally on expected value aren’t effective or interesting.
Interesting is up to you, but effective is definitely wrong.
ICM-perfect bots, which don't take opponent behavior into account and merely model the game state, crush small tournaments. The faster the blinds and the smaller the stacks, the better, but even normal structures get killed by these so-called "expected value"-only bots.
Game Theory Optimal (GTO) attacks are incredibly effective at all levels of the game. The AI need not incorporate opponent feedback to be a winner. It can make it better, but it is not at all required.
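For readers unfamiliar with ICM: it converts chip stacks into prize-pool equity by expanding finishing-order probabilities from chip counts (the Malmuth-Harville calculation). A minimal sketch, with made-up stacks and payouts:

```python
# Independent Chip Model: P(player i finishes 1st) = stack_i / total chips;
# later places are filled in recursively among the remaining players.

def icm_equities(stacks, payouts):
    n = len(stacks)
    equity = [0.0] * n

    def recurse(remaining, place, prob):
        if place >= len(payouts) or not remaining:
            return
        total = sum(stacks[i] for i in remaining)
        for i in remaining:
            p = prob * stacks[i] / total        # P(i takes this place)
            equity[i] += p * payouts[place]
            recurse([j for j in remaining if j != i], place + 1, p)

    recurse(list(range(n)), 0, 1.0)
    return equity

# Three players left, payouts 50/30/20: the chip leader holds half the
# chips but gets well under half the prize pool (~38.4, not 50).
print(icm_equities([5000, 3000, 2000], [50, 30, 20]))
```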
First of all, I laughed at the 20-second average per game in self-play, since I ran into the same thing and have been trying to speed up the algorithm but haven't been able to get it faster (without throwing more hardware at it).
Second, I haven't read everything, but I believe you are playing a cash-game and not tournament-style. Is that correct? If that is the case, any chance you will be doing a tourney-style version?
[For those who don't play, in cash, a dollar is a dollar. In Tourney play, the top 2 or 3 players get paid out, so all dollars are not equal, as your strategy changes when you have only a few chips left (avoid risky bets that would knock you out) or when you are chip leader (take risky bets when they are cheap to push around your opponents).]
Also, curious how much poker you folks play in the lab for "research".
We're doing cash games in this experiment. At the end of the day, this is about advancing AI, not about making a poker bot. Going from two-player to multi-player has important implications for AI beyond just poker. I don't think the same is true for cash game vs tournament.
There's a cash game almost every night at the FBNY office! I don't usually play though -- I'm not nearly as good as the bot.
How do you think these same pros would do in a follow-up match? As described in the article, the bot put players off their game with much more varied betting and with donk bets. Do you think the margin would decrease as players are exposed to these strategies?
Players face mental fatigue, and they have so over-learned their existing strategies that it takes time to adopt new ones, and even more time for those new strategies to become second nature.
It reminds me of sports in a way. Teams start running a new wrinkle of offense in the NFL like the wildcat and it takes a few seasons for teams to instinctively know how to play defense correctly against that option.
In the paper we include a graph of performance over the course of the 10,000-hand 5 humans + 1 AI experiment that was played over 12 days. There's no indication that the bot's performance decreased over time (there is a temporary downward blip in the middle, but that's likely just variance). Based on discussions with pros, it sounds like they didn't find any weaknesses and they didn't seem to think they'd find any given more time.
I also suspect it would not be able to maintain a ~40bb/100 win rate. The thing about human players is that while the best are capable of learning and employing truly balanced GTO strategies, in practice they rarely adhere to them, because other humans (even good pros) will still have exploitable flaws in their strategies, and exploiting those flaws is more profitable than sticking to the unexploitable strategy. Of course, that also opens the exploiter to counter-exploitation, creating a fluctuating cycle of players trying to exploit, getting exploited, then moving back towards playing unexploitably. That's the normal state of a pro's strategy in a given game, so switching to a steady state of always playing unexploitably would be a fairly big adjustment even for the top-tier pros capable of it.
I remember reading in the mid-to-late aughts that a lot of old-school poker players that used more swagger and intuition were starting to be run out of the game by kids who applied statistical methods.
Could you perhaps speak to some of the engineering details that the paper glosses over? E.g.:
- Are the action and information abstraction procedures hand-engineered or learned in some manner?
- How does it decide how many bets to consider in a particular situation?
- Is there anything interesting going on with how the strategy is compressed in memory?
- How do you decide in the first betting round if a bet is far enough off-tree that online search is needed?
- When searching beyond leaf nodes, how did you choose how far to bias the strategies toward calling, raising, and folding?
- After it calculates how it would act with every possible hand, how does it use that to balance its strategy while taking into account the hand it is actually holding?
- In general, how much do these kind of engineering details and hyperparameters matter to your results and to the efficiency of training? How much time did you spend on this? Roughly how many lines of code are important for making this work?
- Why does this training method work so well on CPUs vs GPUs? Do you think there are any lessons here that might improve training efficiency for 2-player perfect-information systems such as AlphaZero?
We tried to make the paper as accessible as possible. A lot of these questions are covered in the supplementary material (along with pseudocode).
- Are the action and information abstraction procedures hand-engineered or learned in some manner?
- How does it decide how many bets to consider in a particular situation?
The information abstraction is determined by k-means clustering on certain features. There wasn't much thought put into the action abstraction because it turns out the exact sizes you use don't matter that much as long as the bot has enough options to choose from. We basically just did 0.25x pot, 0.5x pot, 1x pot, etc. The number of sizes varied depending on the situation.
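A rough sketch of what those two abstractions could look like. The scalar "equity" feature, the bucket count, and the bet fractions below are stand-ins for illustration, not the paper's actual choices (the supplementary material has those):

```python
import random

def kmeans_1d(points, k, iters=20, seed=0):
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: (p - centers[c]) ** 2)
            clusters[nearest].append(p)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return sorted(centers)

# Information abstraction: cluster hands by a feature (here a fake
# "equity" score) so strategically similar hands share one bucket.
rng = random.Random(1)
hand_equities = [rng.random() for _ in range(1000)]
print(kmeans_1d(hand_equities, k=8))

# Action abstraction: a menu of pot-fraction bet sizes, as described above.
def bet_sizes(pot, fractions=(0.25, 0.5, 1.0, 2.0)):
    return [round(pot * f) for f in fractions]

print(bet_sizes(pot=300))  # [75, 150, 300, 600]
```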
- Is there anything interesting going on with how the strategy is compressed in memory?
Nope.
- How do you decide in the first betting round if a bet is far enough off-tree that online search is needed?
We set a threshold at $100.
- When searching beyond leaf nodes, how did you choose how far to bias the strategies toward calling, raising, and folding?
In each case, we multiplied the biased action's probability by a factor of 5 and renormalized. In theory it doesn't really matter what the factor is.
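In code, that biasing step is just reweight-and-renormalize (the action names and probabilities here are made up):

```python
# Multiply the biased action's probability by a factor, then renormalize
# so the result is a probability distribution again.
def bias(strategy, action, factor=5.0):
    scaled = {a: p * (factor if a == action else 1.0)
              for a, p in strategy.items()}
    total = sum(scaled.values())
    return {a: p / total for a, p in scaled.items()}

print(bias({"fold": 0.2, "call": 0.5, "raise": 0.3}, "raise"))
# raise: 1.5 / (0.2 + 0.5 + 1.5) ~= 0.68
```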
- After it calculates how it would act with every possible hand, how does it use that to balance its strategy while taking into account the hand it is actually holding?
This comes out naturally from our use of Linear Counterfactual Regret Minimization in the search space. It's covered in more detail in the supplementary material.
- In general, how much do these kind of engineering details and hyperparameters matter to your results and to the efficiency of training? How much time did you spend on this? Roughly how many lines of code are important for making this work?
I think it's all pretty robust to the choice of parameters, but we didn't do extensive testing to see. While these bots are quite easy to train, the variance is so high in poker that getting meaningful experimental results is still quite computationally expensive.
- Why does this training method work so well on CPUs vs GPUs? Do you think there are any lessons here that might improve training efficiency for 2-player perfect-information systems such as AlphaZero?
I think the key is that the search algorithm is picking up so much of the slack that we don't really need to train an amazing precomputed strategy. If we weren't using search, it would probably be infeasible to generate a strong 6-player poker AI. Search was also critical for previous AI benchmark victories like chess and Go.
I don't think the poker world would be happy with us if we did that. Heads-up limit hold'em isn't really played professionally anymore, but six-player no-limit hold'em is very popular.
In your Science paper, you mention playing 1H-5AI against 2 human players: Chris Ferguson and Darren Elias. In your blog post you also mention playing 1H-5AI against Linus Loeliger, who was within standard error of even money. Why did Linus not make it into the Science paper?
That took place after the final version of the Science paper was submitted. It would have been nice to include but it takes a while to do those experiments and we didn't feel it was worth delaying the publication process for it.
The article makes it sound like the AI is trained by evaluating results of decisions it makes on a per-hand basis. Is there any sense in which the AI learns about strategies that depend upon multiple hands? I’m thinking of bluffing/detecting bluffs and identifying recent patterns, which is something human poker players talk about.
Was Judea Pearl's work relevant for the counterfactual regret minimization, or is there some other basis? I've added CFR to the list of things to look into later, but skimming the paper it was exciting to think advances are being made using causal theory...
The CFR algorithm is actually somewhat similar to Q-learning, but the connection is difficult to see because the algorithms came out of different communities, so the notation is all different.
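The connection is easier to see in code. The inner loop of CFR is regret matching: play each action in proportion to its accumulated positive regret. Here is a single-decision toy on rock-paper-scissors (not Pluribus's Linear CFR), where self-play converges toward the 1/3-1/3-1/3 equilibrium:

```python
import random

ACTIONS = ("rock", "paper", "scissors")
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def payoff(a, b):
    return 0 if a == b else (1 if BEATS[a] == b else -1)

def strategy(regrets):
    pos = [max(r, 0.0) for r in regrets]
    total = sum(pos)
    return [p / total for p in pos] if total > 0 else [1 / 3] * 3

def train(iters=20_000, seed=0):
    rng = random.Random(seed)
    regrets = [0.0, 0.0, 0.0]
    strategy_sum = [0.0, 0.0, 0.0]
    for _ in range(iters):
        sigma = strategy(regrets)
        opp = rng.choices(ACTIONS, weights=sigma)[0]  # self-play opponent
        me = rng.choices(ACTIONS, weights=sigma)[0]
        for i, alt in enumerate(ACTIONS):  # regret: how much better was alt?
            regrets[i] += payoff(alt, opp) - payoff(me, opp)
        for i in range(3):
            strategy_sum[i] += sigma[i]
    return [round(s / iters, 3) for s in strategy_sum]  # average strategy

print(train())  # roughly [0.333, 0.333, 0.333]
```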
Who were the pros? Are they credible endbosses? Seth Davies works at RIO, which deserves respect, but I've never heard of the others except Chris Ferguson, who I doubt is a very good player by today's standards (or human being, for that matter). Meanwhile, I do know the likes of LLinusLove (iirc, the king of 6-max), Polk, and Phil Galfond.
Is 10,000 hands really considered a good enough sample? Most people consider 100k hands with a 4bb win rate to be an acceptable sample, other math aside. And as you and your opponent approach equal skill, variance increases to the point where regs refuse to sit each other.
What? The pros chosen were definitely highly skilled players. They're fairly well known in the online poker community.
Furthermore, Chris Ferguson, scumbag aside, is absolutely still a very good player by today's standards, and one way above the mean participant in a research experiment.
10,000 hands is an effective enough sample at a certain win rate and analysis of variance of play; the n-value alone is not enough to tell you if it was enough hands.
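To make that concrete, a quick back-of-the-envelope. The ~10 bb per-hand standard deviation is a commonly cited ballpark for no-limit hold'em, not a figure from the paper:

```python
import math

# Hands needed before a win rate is statistically distinguishable from zero.
def hands_needed(win_rate_bb_per_hand, sd_bb_per_hand, z=1.96):
    return math.ceil((z * sd_bb_per_hand / win_rate_bb_per_hand) ** 2)

wr = 4.8 / 100   # the bot's 4.8 bb/100, expressed per hand
sd = 10.0        # ~100 bb per 100 hands, a rough ballpark

print(hands_needed(wr, sd))                  # ~167,000 raw hands
print(hands_needed(wr, sd / math.sqrt(10)))  # ~16,700 after a ~10x variance cut
```

That factor-of-ten drop is why the AIVAT-style variance reduction mentioned elsewhere in the thread makes a 10,000-hand sample workable.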
They're credible enough. I'd like the sample sizes to be bigger as well but they're enough to verify that even if the bot got lucky over the sample size, it's close enough that it doesn't really matter. Add a bit more compute, optimize some algorithms a little, and you'd make up the difference. The real point is that they have a technique that scales to 6-max, and whether it's 97% or 99% is kind of immaterial in the grand scheme of things.
FWIW, they did some variance reduction techniques that dramatically reduce the number of hands needed to be confident in your results, so the number of hands may be bigger than you think. e.g. the results of 10k HU hands have much higher variance than the results of 10k HU hands where everyone just collects their EV once they're all in.
Jimmy Chou, Jason Les, Dong Kim are affiliated with Doug Polk.
It is an interesting point that these are pros but their specialities are either tournament or heads up. The current 6 max pros are LLinusLove, Otb_RedBaron, TrueTeller.
I'm very late to this post, so not sure if you're still around.
What are your thoughts on a poker tournament for bots? Do you think it could turn into a successful product? I've always wanted to build an online poker/chess game that was designed from the ground up for bots (everything being accessible through an API), but have always worried that someone with more computational resources or the best bot would win consistently. Is it an idea you've thought about?
I have a few basic questions. I would like to implement my own generic game bot (discrete states). Are there any universal approaches? Is MCMC sampling good enough to start? My initial idea was to do importance sampling on some utility/score function.
Also, I am looking into poker game solvers - what would be a good place to start? What's the simplest algorithm?
A little bit of both. We didn't think we needed the extra computing power. And we really wanted to convey how cheap it is to make a superstrong poker AI with these latest algorithms.
Knowing when to bluff often depends on the psychology of the opponent, but since it trained playing itself it doesn't seem that knowing when to bluff would be learned. Did it bluff very often?
The bot does bluff, and in fact it learns from self-play that bluffing is (sometimes) the optimal thing to do. At the end of the day, bluffing is simply betting when you have a weak hand. The bot learns from experience that when it bets with a weak hand, the opponent (another copy of itself) sometimes folds and it makes more money than if it hadn't bet. The bot doesn't view it as deceptive or dishonest. It just views it as the action that makes it the most money.
Of course, a key part of bluffing is getting the probabilities right. You can't always bluff and you can't never bluff, because that would make you too predictable. But our self-play and search algorithms are designed to get those probabilities right.
At the highest levels of play psychological factors are pretty minimal. Before a showdown which cards you actually hold aren't particularly material, as the only information you convey is through your bids. This means if you predict that you're more likely to win a hand by bidding (and inducing a fold) than by calling and going to a showdown it makes mathematical sense to "bluff". I'm sure AIs have no trouble learning that fact.
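The arithmetic behind that point, with made-up numbers: a bluff is profitable whenever the fold equity outweighs what you lose when called.

```python
# EV of betting `bet` into `pot` when the opponent folds with prob p_fold.
def ev_bluff(pot, bet, p_fold, p_win_when_called=0.0):
    win_called = p_win_when_called * (pot + bet)
    lose_called = (1 - p_win_when_called) * bet
    return p_fold * pot + (1 - p_fold) * (win_called - lose_called)

# Betting 50 into a 100 pot with no showdown value:
print(ev_bluff(pot=100, bet=50, p_fold=0.40))  # +10.0: profitable bluff
print(ev_bluff(pot=100, bet=50, p_fold=0.25))  # -12.5: unprofitable
# Breakeven fold probability is bet / (pot + bet) = 1/3 here.
```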
Are there any ethical considerations relating to the prospect of use of this bot for cheating in real-money games? Either from your internal team or after public replication?
We're really focused on advancing the fundamental AI aspect. We're not here to kill poker. The popular poker sites have quite sophisticated anti-bot measures, but it's true that this is an arms race.
There are no ethical reasons why a game like poker must exist. In fact, poker gives a false sense of hope to the thousands of gambling addicts who enter casinos. It is a fun game, but there is an unlimited number of potentially fun games.
Very impressive. If my understanding of how the AI works is correct, it is using a pre-computed strategy developed by playing trillions of hands, but it is not dynamically updating that during game play, nor building any kind of profiles of opponents. I wonder if by playing against it many times, human opponents could discern any tendencies they could exploit. Especially if the pre-computed strategy remains static.
We played 10,000 hands of poker over the course of 12 days in the 5 humans + 1 AI experiment, and 5,000 hands per player in the 1 human + 5 AI's experiment. That's a good amount of time for a player to find a weakness in the system. There's no indication that any of the players found any weaknesses.
In fact, the methods we use are designed from the ground up to minimize exploitability. That's a really important property to have for an AI system that is actually deployed in the real world.
A hearty congratulations, Noam, on finishing another chapter of the story i opened in the early 1990s...
Another person asked "What took you so long?", and i had the same question. :) I really thought this milestone would be achieved fairly soon after i left the field in 2007. However, breakthroughs require a researcher with the right amount of reflectiveness, insight, and determination. Well done.
Thanks! I think going beyond two-player/team zero-sum games is really important. This was a first step, but it's definitely not the last. I'm hoping to continue in this direction, and maybe start looking at interactions involving the potential for cooperation in addition to competition.
I haven't finished digging through the paper and the supplement yet, but I'm curious about how many hands were multiway to the flop (and whether the percentages differ significantly between 1H/5AI and 5H/1AI). I'd guess that it's a pretty small fraction of the total hands, and I'm wondering what the performance is like in those particular cases.
I don't have the exact percentages but I think it's less than 10%. It's not really possible to measure the bot's performance just in specific situations, but my feeling is the bot performs relatively well in these situations. Multi-way flops were basically impossible to handle in a reasonable amount of time for past AIs. Our new search techniques make these situations feasible to figure out in seconds.
What table information does the bot take into account? Position? Other player's stack size?
> Regardless of which hand Pluribus is actually holding, it will first calculate how it would act with every possible hand.
Is this information used to form an idea of what other players might be holding based on how the other player acts and how closely that action matches Pluribus's 'what if' action?
No, it's to mask actions. If you bet big with monsters and check with air 100% of the time, your opponent knows when to fold and bet.
iirc, the frequency of bets in that spot is roughly equivalent to the frequency of times you're definitely in front of your opponent in that particular spot, but not always with the hands that are beating your opponent.
The concept is called Game Theory Optimal (GTO) and it's pretty popular in higher stakes games.
We talk about this a bit in the paper. Based on the feedback from the pros, the bot seems to "donk bet" (call and then bet on the next round) much more than human pros do. It also randomizes between multiple bet sizes, including very large bet sizes, while humans stick to just one or two sizes depending on the situation.
Noam - super interesting stuff. Couple of questions:
1) What were the reasons for choosing 6-handed play (assuming logistical and costs)? It would be interesting to see how the bot’s strategy would differ in a full ring game.
2) Are there any plans to commercialize the bot as a tool for training human players?
1) The goal was to show convincingly that we could handle multi-player poker. The exact number of players was kind of arbitrary. We chose six-player because that's the most common/popular format. Considering training the 6-player bot would cost less than $150 on a cloud computing service, I think it's safe to say these techniques would all work fine in other formats.
2) I'm quite happy working on fundamental AI research and plan to continue in that direction.
> Is the bot going for game-theory-optimal play, or trying to exploit weaknesses in other players?

It's going for game-theory-optimal play. It doesn't adapt to its opponents' observed weaknesses. But I think it's cool to show that you don't need to adapt to opponent weaknesses to win at poker at the highest levels. You just need to not have any weaknesses yourself.
Congratulations on the win! Can you recommend any papers, blog(post)s, or books for the interested layman? (I am currently scanning though the facebook post, which is great, but personally I am looking for something more technical).
Very interesting results. From the paper it sounds like the algorithms you used are very similar to Libratus (pre-solved blueprint + subgame solving). What change made it so that the computation requirement is much lower now?
There were several improvements but the most important was the depth-limited search. Libratus would always search to the end of the game. But that's not necessarily feasible in a game as complex as six-player poker. With these new algorithms, we don't need to go to the end of the game. Instead, we can stop at some arbitrary depth limit (as is done in chess and Go AIs). That drastically reduces the amount of compute needed.
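As a toy illustration of the idea in its perfect-information form (chess/Go style): recurse to a fixed depth, then substitute an estimated value instead of playing to the end. Pluribus's imperfect-information version is subtler, valuing leaves by letting each player choose among continuation strategies, but the compute saving comes from the same cutoff.

```python
# Depth-limited minimax on Nim: remove 1-3 stones, taking the last stone
# wins. At the depth limit we return a crude leaf estimate (0 = "unclear"),
# a stand-in for a learned value function.

def search(pile, depth, maximizing):
    if pile == 0:  # previous mover took the last stone and won
        return -1 if maximizing else 1
    if depth == 0:
        return 0.0  # leaf estimate instead of searching to the end
    values = [search(pile - take, depth - 1, not maximizing)
              for take in (1, 2, 3) if take <= pile]
    return max(values) if maximizing else min(values)

print(search(pile=8, depth=99, maximizing=True))  # exact value: -1 (a loss)
print(search(pile=8, depth=3, maximizing=True))   # cheap estimate: 0.0
```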
Can you share more details about the abstraction? The paper is kind of vague on it. How does it decide if it should use 1 or 14 bet values? Is it a perfect recall abstraction? How many information sets are there?
It is in a way disappointing that this question gets so little attention, and yet it might be the most significant. If a bot can false-card, that is, if it can discern the strategy that the opponents have in mind and deliberately mislead them to its own advantage, then we have a real-world AI. However, the skills of computer bridge programs remain at club level.
Interesting that the conventional wisdom of never open limping emerged as confirmed through self-play. What other general poker “best practices” were either confirmed or upended through this research?
For someone not in the AI field, can you explain why AI is needed and why elaborate code with conditional blocks is not enough? Where does AI fit in with a poker game?
Conditional blocks would work, but it would be an impossibly detailed and granular tree to set up. The AI component simply helps you arrive at the decisions that would otherwise require that complex tree.
This is super interesting! What steps would you recommend a professional poker player take in order to use AI to improve his/her personal poker skills?
It doesn't exploit its opponents' weaknesses. Its focus was on not having any weaknesses that its opponents could exploit. However, the algorithms are not guaranteed to converge to a Nash equilibrium in this setting because it's not a two-player zero-sum game (and in either case, it's not clear that playing a Nash equilibrium would provide much benefit in this setting).
There was real money at stake in this experiment. The pros were guaranteed $0.40 per hand just for participating, but that could increase to $1.60 per hand depending on how well they did.
To answer your question, no, I don't think human players would play at their best when not playing for actual money.
We played 10,000 hands of poker in the 5 humans + 1 AI experiment. The number of hands won isn't a useful metric in poker. If you win only 10% of your hands and make $1,000 on those hands, while losing only $1 on the other 90% of hands, then you're a winning player. The bot won at a rate of 4.8 bb/100 ($4.80 per hand if the blinds are $50/$100). This is considered a large win rate by professionals.
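Spelling out the conversion:

```python
# bb/100 -> dollars per hand, at the $50/$100 blinds used in the experiment.
win_rate_bb_per_100 = 4.8
big_blind_dollars = 100

per_hand = win_rate_bb_per_100 / 100 * big_blind_dollars
print(per_hand)           # $4.80 won per hand on average
print(per_hand * 10_000)  # ~$48,000 at that rate over the 10,000 hands
```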
They're in the extra data section of the Science article. The formatting is terrible for importing into hand-history viewers, so I'm trying to get a friend to re-format them.
Honestly, probably debugging. Training this thing is very cheap, but the variance in poker is huge (even with the best variance-reduction techniques) so it takes a very long time to tell whether one version is better than another version (or better than a human).
The number of players is kind of arbitrary given the techniques we're using. We chose 6 because that's the most popular/common format for poker. I don't think there's any scientific value in also doing 10.
Our goal is to make the research as accessible as possible to the AI community, so we include descriptions of the algorithms and pseudocode in the supplementary material. However, in part due to the potential negative impact this code could have on online poker, we're not releasing the code itself.
This is fascinating stuff. So do I understand this right: Libratus worked by computing the Nash equilibrium, while the new multiplayer version works using self-play like AlphaGo Zero? Did you run the multiplayer version against the two-player version? If yes, how did it go? Could you recommend a series of books/papers that can take me from zero to being able to reprogram this (I know programming and mathematics, but not much statistics)? And how much computing resources/time did it take to train your bot?
Training was super cheap. It would cost under $150 on cloud computing services.
The training aspect has some improvements but is at its core similar to Libratus. The search algorithm is the biggest difference.
There aren't that many great resources out there for helping new people get caught up to speed on this area. That's something we hope to fix in the future. Maybe this would be a good place to start? http://modelai.gettysburg.edu/2013/cfr/cfr.pdf
So let me see if I understand this. I don't believe it's hard to write a probabilistic program to play poker. That's enough to win against humans in 2-player.
With one AI and multiple professional human players sitting at a physical table, the humans outperform the probabilistic model because they take advantage of each other's mistakes/styles. Some players crash out faster but the winner gets ahead of the safe probabilistic style of play.
So this bot is better at the current professional player meta than the current players. In a 1v1 against a probabilistic model, it would probably also lose?
Am I understanding this properly? Or is playing the probabilistic model directly enough of a tell that it's also losing strategy? Meaning you need some variation of strategies, strategy detection, or knowledge of the meta to win?
Interesting article. Too bad I don't have a subscription to read the paper.
The bot played like 10,000 hands. There is no way that is enough to prove it's better or worse than the opponents.
More so in no-limit, where some key all-ins can turn the game upside down. The variance is higher than limit or fixed, right?
I did a heads-up Texas hold'em fixed-limit bot with counterfactual regret minimization about 8 years ago, from a paper I read. It had to play like 100,000 hands vs a crappy reference bot to prove it was better.
Strategy detection in such short games is probably worthless.
The edge is probably in seeing who is tired or drunk in live poker.
They mention that they use AIVAT to reduce variance.
> Although poker is a game of skill, there is an extremely large luck component as well. It is common for top professionals to lose money even over the course of 10,000 hands of poker simply because of bad luck. To reduce the role of luck, we used a version of the AIVAT[1] variance reduction algorithm, which applies a baseline estimate of the value of each situation to reduce variance while still keeping the samples unbiased. For example, if the bot is dealt a really strong hand, AIVAT will subtract a baseline value from its winnings to counter the good luck. This adjustment allowed us to achieve statistically significant results with roughly 10x fewer hands than would normally be needed.
Hi Noam: I'm intrigued that you trained/tested the bot against strategies that were skewed to raise a lot, fold a lot and check a lot, as well as something resembling GTO. Were there any kinds of table situations where the bot had a harder time making money? Or where the AI crushed it?
I'm thinking in particular of unbalanced tables with an ever-changing mixture of TAG and LAG play. I've changed my mind three times about whether that's humans' best refuge -- or a situation that's a bot's dream.
With the advent of AI bots in Poker, Chess etc., what happens to the old adage of "Play the player, not the game". How do modern human players manage when you don't have the psychological aspects of the game to work with?
I see on chess channels that grand masters have to rethink their whole game preparation methodology to cope with the "Alpha Zero" oddities that have now been introduced into this ancient game. They literally have to "throw out the book" of standard openings and middle games and start afresh.
The chess channels you're visiting are grossly overstating Alpha Zero's impact. AFAICT, it hasn't made any impact on opening theory at all. AZ's strength is in the middlegame, where it appears to be slightly better than traditional engines (like Stockfish) at finding material sacrifices for long term piece activity and/or mating attacks.
> what happens to the old adage of "Play the player, not the game". How do modern human players manage when you don't have the psychological aspects of the game to work with?
I would say that in poker it has thoroughly flipped to "play the game, not the player", and this isn't because of super bots like the one used in this paper.
Ever since game theory invaded poker, players who play in highly visible events such as TV tournaments try as hard as possible to make their game unexploitable.
As already stated, saying that Alpha Zero has forced the chess world to seriously reconsider the basic principles of chess openings etc. is a bit of a stretch. But interestingly enough, the current world champion (Magnus Carlsen) is having the chess streak of his life as we speak. On the side, he's been openly joking about Alpha Zero being one of his biggest chess idols. It's safe to say the streak is probably mostly related to his preparation from the last world championship match half a year ago carrying over to all the tournaments after.
However, even according to the former world champion (Viswanathan Anand) the run he's been on is something quite shocking: “His results this year is simply [great].... difficult to find words. [It’s been] completely off the charts. I think the chess world is still in a bit of a shock. The rest of the players are struggling to deal with a phenomenon [like him]. Even in 2012-13, his domination was less than it is this year. Everyone is still processing this information.” [1]
Carlsen is basically en route to breaking 2900 Elo - at 2882 Elo with a clear upwards trend - while only two other active players are even above 2800 Elo, and they are struggling to keep it above that threshold. (Elo is the rating system used in chess. Above 1500 Elo is an average player, 2000 Elo is a good player, 2500 Elo is a grandmaster. Anything above 2700 Elo is basically godlike.)
Oddly enough, instead of playing more like a machine, it seems like Carlsen has been playing chess that is much more about the human aspect of the game rather than trying to find the top ranked engine move on every turn. (The current traditional top engine - Stockfish - makes an assumption of each move's validity using a point system, which the chess world has been more or less obsessing over for the past decade. Alpha Zero doesn't have such a point system whatsoever.) He's been playing a drastically more aggressive and dynamic variety of chess compared to what has been seen in a long time at the top tournaments.
He's been playing to create dizzying positions on the board, making a few moves that aren't necessarily liked by the traditional top engines, but still finding himself in a winning position several moves after. It definitely looks like some sort of black magic, but it seems like the big thing Alpha Zero has brought to the general philosophy on how to approach chess at the top level is that it's possible to play aggressive chess, take risks and win in 2019. Magnus Carlsen is the first player to successfully reinvent that style of play, more than likely partly inspired by Alpha Zero. So, I'd say the big thing about Alpha Zero isn't necessarily that it could beat the other top engines, but more importantly that the 'artistic' aspect of its play is something that has never been seen from another chess engine. The fact that it proved that sort of style superior to the play ever before played by another chess engine is just the icing on the cake.
Garry Kasparov on Alpha Zero's chess persona: "I admit that I was pleased to see that AlphaZero had a dynamic, open style like my own. The conventional wisdom was that machines would approach perfection with endless dry maneuvering, usually leading to drawn games. But in my observation, AlphaZero prioritizes piece activity over material, preferring positions that to my eye looked risky and aggressive." [2]
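As an aside on the Elo numbers above, the rating gap maps to an expected score via the standard logistic formula; the 2782-rated opponent below is a made-up example:

```python
# Standard Elo expected-score formula.
def elo_expected(rating_a, rating_b):
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))

# 2882 vs. 2782: ~0.64 expected points per game, so even a 100-point
# gap is a big practical edge at the top.
print(round(elo_expected(2882, 2782), 2))
```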
https://www.cs.cmu.edu/~sandholm/cv.pdf
> French people for instance tend to suck at foreign languages.

I suspect Spain and Italy are similar.
Average number of languages spoken: France 1.8, Germany 2.0, Spain 1.7, Portugal 1.6, Italy 1.8, Greece 1.8, Poland 1.8, Sweden 2.5, Finland 2.6, UK 1.6.
I've rarely met someone who could speak four languages fluently.
It's not normal.
Judging by his name, I'd assume Swedish is his first language, so that particular aspect isn't that surprising to me
It's not uncommon to speak four languages (often at C2 level in a couple of them) in northern Europe, especially the Baltic region.
As mentioned by a sibling comment (sakarisson), that particular part is not impressive; the rest, sure.
Science article: https://science.sciencemag.org/content/early/2019/07/10/scie...
For any number of hands, my money is on the bot.
> the top 2 or 3 players get paid out

Or top 2 or 3 thousand... depends on the tournament, but it's usually the top 15% ish.
That said, it did train by playing against itself (before the experiment against the humans began).
> Is 10,000 hands really considered a good enough sample?
We used AIVAT to reduce variance, which reduces the number of samples we need by roughly a factor of 10: https://poker.cs.ualberta.ca/publications/aaai18-burch-aivat...
Furthermore, Chris Ferguson, scumbag aside, is absolutely still a very good player by today's standards, and one way higher than the mean participant in a research experiment.
10,000 hands is an effective enough sample at a certain win rate and analysis of variance of play; the n-value alone is not enough to tell you if it was enough hands.
FWIW, they did some variance reduction techniques that dramatically reduce the number of hands needed to be confident in your results, so the number of hands may be bigger than you think. e.g. the results of 10k HU hands have much higher variance than the results of 10k HU hands where everyone just collects their EV once they're all in.
It is an interesting point that these are pros but their specialities are either tournament or heads up. The current 6 max pros are LLinusLove, Otb_RedBaron, TrueTeller.
Deleted Comment
What are your thoughts on a poker tournament for bots? Do you think it could turn into a successful product? I've always wanted to build an online poker/chess game that was designed from the ground up for bots (everything being accessible through an API), but have always worried that someone with more computational resources or the best bot would win consistently. Is it an idea you've thought about?
I have a few basic questions. I would like to implement my own generic game bot (discrete states). Are there any universal approaches? Is MCMC sampling good enough to start? My initial idea was to do importance sampling on some utility/score function.
Also, I am looking into poker game solvers - what would be a good place to start? What's the simplest algorithm?
Thanks
Of course, a key part of bluffing is getting the probabilities right. You can't always bluff and you can't never bluff, because that would make you too predictable. But our self-play and search algorithms are designed to get those probabilities right.
In fact, the methods we use are designed from the ground up to minimize exploitability. That's a really important property to have for an AI system that is actually deployed in the real world.
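A classic worked example of "getting the probabilities right" (textbook game theory, not Pluribus's code): the bluff frequency that makes a caller indifferent. Betting b into a pot of p, the caller risks b to win p + b, so the betting range should contain b / (p + 2b) bluffs:

    def indifference_bluff_fraction(pot, bet):
        """Fraction of the betting range that should be bluffs."""
        return bet / (pot + 2 * bet)

    print(indifference_bluff_fraction(100, 100))  # pot-sized bet -> 1/3 bluffs
    print(indifference_bluff_fraction(100, 50))   # half-pot bet  -> 1/4 bluffs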
Another person asked "What took you so long?", and I had the same question. :) I really thought this milestone would be achieved fairly soon after I left the field in 2007. However, breakthroughs require a researcher with the right amount of reflectiveness, insight, and determination.
Well done.
>Regardless of which hand Pluribus is actually holding, it will first calculate how it would act with every possible hand.
Is this information used to form an idea of what other players might be holding based on how the other player acts and how closely that action matches Pluribus's 'what if' action?
IIRC, the frequency of bets in that spot is roughly equivalent to the frequency of times you're actually ahead of your opponent in that spot, but not always with the specific hands that beat your opponent.
The concept is called Game Theory Optimal (GTO) and it's pretty popular in higher stakes games.
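The mechanics behind that answer amount to a Bayesian range update: if you know how often a strategy takes the observed action with each possible hand, Bayes' rule reweights the distribution over hands. A minimal sketch (illustrative, not Pluribus's code):

    def update_range(prior, action_prob_given_hand):
        """prior: dict hand -> probability; action_prob_given_hand: dict
        hand -> probability the strategy takes the observed action."""
        posterior = {h: prior[h] * action_prob_given_hand[h] for h in prior}
        total = sum(posterior.values())
        return {h: p / total for h, p in posterior.items()}

    # a raise made 90% of the time with AA but only 10% of the time with 72o
    # leaves AA nine times more likely than 72o, relative to the prior.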
1) What were the reasons for choosing 6-handed play (I assume logistics and cost)? It would be interesting to see how the bot's strategy would differ in a full-ring game. 2) Are there any plans to commercialize the bot as a tool for training human players?
2) I'm quite happy working on fundamental AI research and plan to continue in that direction.
Is the bot going for game-theory-optimal play, or trying to exploit weaknesses in other players?
It's going for game-theory-optimal play. It doesn't adapt to its opponents' observed weaknesses. But I think it's cool to show that you don't need to adapt to opponent weaknesses to win at poker at the highest levels. You just need to not have any weaknesses yourself.
To answer your question, no, I don't think human players would play at their best when not playing for actual money.
>A superhuman poker-playing bot called Pluribus has beaten top human professionals at six-player no-limit Texas hold’em poker...
The training aspect has some improvements but is at its core similar to Libratus. The search algorithm is the biggest difference.
There aren't that many great resources out there for helping newcomers get up to speed in this area. That's something we hope to fix in the future. Maybe this would be a good place to start? http://modelai.gettysburg.edu/2013/cfr/cfr.pdf
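That tutorial's warm-up exercise is regret matching on rock-paper-scissors; here's a condensed sketch of the same idea (my own code, under the usual setup): accumulate regrets against a fixed opponent and play proportionally to positive regret.

    import random

    ACTIONS = 3  # rock, paper, scissors
    PAYOFF = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]  # payoff[mine][opponent's]

    def get_strategy(regrets):
        positives = [max(r, 0.0) for r in regrets]
        total = sum(positives)
        return [p / total for p in positives] if total > 0 else [1 / ACTIONS] * ACTIONS

    def train(opp_strategy, iterations=100_000):
        regrets = [0.0] * ACTIONS
        strategy_sum = [0.0] * ACTIONS
        for _ in range(iterations):
            strategy = get_strategy(regrets)
            strategy_sum = [s + p for s, p in zip(strategy_sum, strategy)]
            mine = random.choices(range(ACTIONS), strategy)[0]
            opp = random.choices(range(ACTIONS), opp_strategy)[0]
            for a in range(ACTIONS):
                regrets[a] += PAYOFF[a][opp] - PAYOFF[mine][opp]
        total = sum(strategy_sum)
        return [s / total for s in strategy_sum]

    # vs. an opponent who overplays rock, the average strategy converges to paper:
    print(train([0.6, 0.2, 0.2]))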
With one AI and multiple professional human players sitting at a physical table, the humans outperform the probabilistic model because they take advantage of each other's mistakes/styles. Some players crash out faster but the winner gets ahead of the safe probabilistic style of play.
So this bot is better against the current professional-player meta than the current players are. In a 1v1 against another probabilistic model, would it probably also lose?
Am I understanding this properly? Or is playing the probabilistic model directly enough of a tell that it's also a losing strategy? Meaning you need some variation of strategies, strategy detection, or knowledge of the meta to win?
The bot played about 10,000 hands. There is no way that is enough to prove it's better or worse than its opponents.
Even more so in no-limit, where a few key all-ins can turn the game upside down. The variance is higher than in fixed-limit, right?
I built a heads-up fixed-limit Texas hold'em bot with counterfactual regret minimization about 8 years ago, from a paper I read. It had to play about 100,000 hands against a crappy reference bot to prove it was better.
Strategy detection over such short games is probably worthless.
The edge is probably in seeing who is tired or drunk in live poker.
> Although poker is a game of skill, there is an extremely large luck component as well. It is common for top professionals to lose money even over the course of 10,000 hands of poker simply because of bad luck. To reduce the role of luck, we used a version of the AIVAT[1] variance reduction algorithm, which applies a baseline estimate of the value of each situation to reduce variance while still keeping the samples unbiased. For example, if the bot is dealt a really strong hand, AIVAT will subtract a baseline value from its winnings to counter the good luck. This adjustment allowed us to achieve statistically significant results with roughly 10x fewer hands than would normally be needed.
[1] https://arxiv.org/abs/1612.06915
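In control-variate terms, the core trick is baseline subtraction; AIVAT's contribution is constructing baselines that are provably zero-mean so the estimate stays unbiased. A toy sketch of the idea (not the actual algorithm; see the linked paper):

    def variance_reduced_winnings(actual_winnings, baseline_values):
        """baseline_values: estimates of each hand's luck (e.g. the value of
        the cards dealt), constructed to be zero-mean so the average
        winnings stay unbiased."""
        return [w - b for w, b in zip(actual_winnings, baseline_values)]

    # winning 50 chips after being dealt aces (baseline +40) counts as +10;
    # winning 50 after being dealt 72o (baseline -5) counts as +55.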
I'm thinking in particular of unbalanced tables with an ever-changing mixture of TAG (tight-aggressive) and LAG (loose-aggressive) play. I've changed my mind three times about whether that's humans' best refuge -- or a situation that's a bot's dream.
You've done the work. Insights welcome.
I see on chess channels that grandmasters have to rethink their whole game-preparation methodology to cope with the "Alpha Zero" oddities that have now been introduced into this ancient game. They literally have to "throw out the book" of standard openings and middlegames and start afresh.
I would say that poker has thoroughly swung back to "play the game, not the player", and this isn't because of super-bots like the one used in this paper.
Ever since game theory invaded poker, players who appear in highly visible events such as TV tournaments have tried as hard as possible to make their game unexploitable.
However, even according to former world champion Viswanathan Anand, the run Magnus Carlsen has been on is something quite shocking: “His results this year is simply [great].... difficult to find words. [It’s been] completely off the charts. I think the chess world is still in a bit of a shock. The rest of the players are struggling to deal with a phenomenon [like him]. Even in 2012-13, his domination was less than it is this year. Everyone is still processing this information.” [1]
Carlsen is basically en route to breaking 2900 Elo (he sits at 2882 with a clear upward trend), while only two other active players are even above 2800, and they are struggling to stay above that threshold. (Elo is the rating system used in chess: above 1500 is an average player, 2000 is a good player, 2500 is a grandmaster, and anything above 2700 is basically godlike.)
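For scale, the standard Elo expected-score formula (ordinary chess rating math, not something from the article):

    def elo_expected_score(r_a, r_b):
        """Expected score of player A against player B."""
        return 1 / (1 + 10 ** ((r_b - r_a) / 400))

    print(elo_expected_score(2882, 2782))  # ~0.64: a 100-point gap scores ~64%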
Oddly enough, instead of playing more like a machine, Carlsen seems to be playing chess that is much more about the human aspect of the game, rather than trying to find the top-ranked engine move on every turn. (The traditional top engine, Stockfish, scores positions with a point system that the chess world has more or less obsessed over for the past decade; Alpha Zero has no such point system whatsoever.) He's been playing a drastically more aggressive and dynamic variety of chess than has been seen in a long time at the top tournaments.
He's been playing to create dizzying positions on the board, making moves that aren't necessarily liked by the traditional top engines, yet still finding himself in a winning position several moves later. It definitely looks like some sort of black magic, but the big thing Alpha Zero seems to have brought to the general philosophy of top-level chess is that it's possible to play aggressive chess, take risks, and win in 2019. Magnus Carlsen is the first player to successfully reinvent that style of play, more than likely partly inspired by Alpha Zero. So I'd say the big thing about Alpha Zero isn't necessarily that it could beat the other top engines, but that the 'artistic' aspect of its play is something never before seen from a chess engine. The fact that it proved that style superior to anything another engine had ever played is just the icing on the cake.
Garry Kasparov on Alpha Zero's chess persona: "I admit that I was pleased to see that AlphaZero had a dynamic, open style like my own. The conventional wisdom was that machines would approach perfection with endless dry maneuvering, usually leading to drawn games. But in my observation, AlphaZero prioritizes piece activity over material, preferring positions that to my eye looked risky and aggressive." [2]
[1] https://sportstar.thehindu.com/chess/viswanathan-anand-on-ma...
[2] https://science.sciencemag.org/content/362/6419/1087