Readit News
eclark commented on Poker Tournament for LLMs   pokerbattle.ai/event... · Posted by u/SweetSoftPillow
eru · 4 months ago
Why would they need to lie? Where's the lying in Poker?

(Ignore for a moment that LLMs can lie just fine.)

What you are describing is exploring a range of counterfactuals. That's not lying.

eclark · 4 months ago
Early-game bluffs are essentially lies that you have to keep telling through the rest of the streets. To keep your opponents from knowing when you have premium starting hands, you're required to sometimes play one range as if it were a different range. E.g., 10% of the time I will bluff and act as if I have AK, KK, AA, or QQ. On the next street I need to continue that story; otherwise it stops being profitable (opponents would only need to wait one bet to know whether I am bluffing). I have to evolve the lie as well: if cards come out that make my story more or less likely/profitable/possible, then I need to adjust the lie, not revert to the truth or to the opponent's truth.
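The way a single street's bet narrows your perceived range can be sketched with a toy Bayesian update. All the hand weights and bet frequencies below are made-up numbers for illustration, not solver output:

```python
# Toy model of how a bet narrows the range an opponent assigns to us.
# Prior weights and bet frequencies are invented, not solver output.
perceived = {"AA": 0.03, "AK": 0.12, "JTs": 0.25, "72o": 0.60}  # P(hand)
bet_freq  = {"AA": 1.00, "AK": 1.00, "JTs": 0.10, "72o": 0.10}  # P(bet | hand)

# Bayes: P(hand | bet) is proportional to P(hand) * P(bet | hand)
posterior = {h: perceived[h] * bet_freq[h] for h in perceived}
total = sum(posterior.values())
posterior = {h: w / total for h, w in posterior.items()}

# After one bet, premium hands dominate the perceived range, which is
# exactly why a bluff has to keep firing on later streets to stay hidden.
print(posterior)
```

One bet is enough to shift most of the weight onto the strong hands, so a bluff that stops betting has told its whole story.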

To see that LLMs aren't capable of this, I present all of the prompt jailbreaks that rely on repeated admonitions. That makes sense if you think about the training data: there's not a lot of human writing that takes a fact and then confidently asserts the opposite as evidence mounts.

LLMs produce the most likely response from the input embeddings. Almost always, the easiest next token is one that agrees with the other tokens in the sequence. The problem in poker is that a good amount of the tokens in the sequence are masked and/or controlled by a villain who is actively trying to deceive.

Also, notice that I'm careful to say LLMs and not to generalize to all attention-head + MLP models, since attention with softmax and dot product is a good universal function approximator. Instead, it's the large-language-model part that makes these models a poor fit for poker: human text doesn't have a latent space where poker is written about often enough, and thoroughly enough, to have it solved in there.

eclark commented on Poker Tournament for LLMs   pokerbattle.ai/event... · Posted by u/SweetSoftPillow
jwatte · 4 months ago
I don't think this analysis matches the underlying implementation.

The width of the models is typically wide enough to "explore" many possible actions, score them, and let the sampler pick the next action based on the weights. (Whether a given trained parameter set will be any good at it, is a different question.)

The number of attention heads for the context is similarly quite high.

And, as a matter of mechanics, the core neuron formulation (dot product input and a non-linearity) excels at working with ranges.

eclark · 4 months ago
No, the widths are not wide enough to explore. The number of possible game states can easily explode beyond the number of atoms in the universe, especially with deep stacks and small big blinds.

For example, consider computing the counterfactual tree for a 9-way preflop. Each of the 9 players can be asked to act up to 6 different times (seat 0 min-bets, seat 1 min-raises, seat 2 calls, back to seat 0 who min-raises, seat 1 calls, seat 2 min-raises again, etc.). Each of those decision points allows check, fold, bet the min, raise the min (starting blinds of 100 are pretty high already), raise one more than the min, raise two more than the min, ... all the way up to all-in (with up to a million chips).

That's on the order of (1,000,000.00 − 999,900.00) possible sizes per decision, raised to the power of (6 actions per round × 9 players), and that's just preflop. Then come the flop, turn, river, and showdown. Now imagine that we also have to simulate which cards the players hold and the order in which the board cards arrive (which greatly changes the value of the pot).
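To put rough numbers on the explosion, here's a back-of-the-envelope sketch. The branching factor and action counts are illustrative assumptions, not an exact count of legal poker sequences:

```python
# Back-of-the-envelope bound on preflop betting sequences. All numbers
# are illustrative assumptions: `bet_sizes` legal sizes per decision,
# up to `actions_per_player` decisions per player, `players` players.
bet_sizes = 10_000
actions_per_player = 6
players = 9

sequences = bet_sizes ** (actions_per_player * players)
atoms_in_universe = 10 ** 80  # common rough estimate

print(f"~10^{len(str(sequences)) - 1} candidate sequences")
print(sequences > atoms_in_universe)
```

Even with these conservative assumptions the count lands around 10^216, dwarfing the ~10^80 atoms in the observable universe.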

As for LLMs being great at range stats, I would point you to the latest research by UChicago: text-trained LLMs are horrible at multiplication. Try getting any of them to multiply any non-round number by e or pi. https://computerscience.uchicago.edu/news/why-cant-powerful-...

Don't get me wrong, though. Masked attention and sequence-based context models are going to be critical to machines solving hidden-information problems like this. But large language models trained on the web crawl and The Stack, with text input, will not be those models.

eclark commented on Poker Tournament for LLMs   pokerbattle.ai/event... · Posted by u/SweetSoftPillow
Tostino · 4 months ago
Why wouldn't something like an RL environment allow them to specialize in poker playing, gaining those skills as necessary to increase score in that environment?

E.g. given a small code execution environment, it could use some secure random generator to pick between options, it could use a calculator for whatever math it decides it can't do 'mentally', and they are very capable of deception already, even more so when the RL training target encourages it.

I'm not sure why you couldn't train an LLM to play poker quite well with a relatively simple training harness.

eclark · 4 months ago
> Why wouldn't something like an RL environment allow them to specialize in poker playing, gaining those skills as necessary to increase score in that environment?

I think an RL environment is needed to solve poker with an ML model. I also think that, as with chess, you need the model to do some approximate work. General-purpose LLMs trained on text corpora are bad at math, bad at accuracy, and struggle to stay on task while exploring.

So a purpose-built model with a purpose-built exploration harness is likely needed. I've built the basis of an RL-like environment, and the basis of learning agents, in Rust for poker. Next steps to come.
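For a feel of what such an environment's interface might look like, here's a sketch in Python using tiny two-player Kuhn poker as a stand-in for a full no-limit game. The reset/step shape is the illustrative part; none of this is rs-poker's actual API:

```python
import random

class KuhnPokerEnv:
    """Two-player Kuhn poker: deck {0, 1, 2}, one card each, ante 1.
    Actions: 'p' (pass/check/fold) or 'b' (bet/call). A toy stand-in
    for a full no-limit environment; the reset/step interface is the point."""

    def __init__(self, seed=None):
        self.rng = random.Random(seed)

    def reset(self):
        self.cards = self.rng.sample([0, 1, 2], 2)
        self.history = ""
        return self._obs()

    def _obs(self):
        # Each player sees only their own card plus the public history.
        player = len(self.history) % 2
        return (player, self.cards[player], self.history)

    def _terminal(self):
        return self.history in ("pp", "bb", "bp", "pbb", "pbp")

    def _payoff(self):
        # Payoff for player 0 (zero-sum, in antes/bets won or lost).
        h = self.history
        winner = 0 if self.cards[0] > self.cards[1] else 1
        if h == "pp":            # check-check showdown, antes only
            return 1 if winner == 0 else -1
        if h in ("bb", "pbb"):   # bet-call showdown, one extra bet each
            return 2 if winner == 0 else -2
        if h == "bp":            # player 1 folds to player 0's bet
            return 1
        return -1                # "pbp": player 0 folds to player 1's bet

    def step(self, action):
        assert action in ("p", "b")
        self.history += action
        if self._terminal():
            return None, self._payoff(), True
        return self._obs(), 0, False
```

An agent in training would loop over `reset()`/`step()` episodes and learn from the terminal payoffs; scaling the same interface up to full no-limit hold'em is where the real work lives.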

eclark commented on Poker Tournament for LLMs   pokerbattle.ai/event... · Posted by u/SweetSoftPillow
brrrrrm · 4 months ago
> None of which are they currently capable

what makes you say this? modern LLMs (the top players in this leaderboard) are typically equipped with the ability to execute arbitrary Python and regularly do math + random generations.

I agree it's not an efficient mechanism by any means, but I think a fine-tuned LLM could play near GTO for almost all hands in a small ring setting

eclark · 4 months ago
To play GTO currently, you need to play hand ranges. (For example, when looking at a hand I would think: I could have AKs-ATs or QQ-99, and she/he could have JT-98s or 99-44, so my next move will act like I have strength and they don't, because the board doesn't contain any low cards.) We have to do this because if you always bet 4x pot when you have aces, your opponents will know your hand strength directly.

LLMs aren't capable of this deception. They can't be told that they have something, pretend that they have something else, and then revert to ground truth. Their eager nature with large context leads to them getting confused.

On top of that, there's a lot of precise math. In no-limit the bets are not capped, so you can bet 9.2 big blinds in a spot. That could be profitable because your opponents will call and lose (e.g., the players willing to pay that sometimes have hands that you can beat). However, betting 9.8 big blinds might be enough to scare off the good hands. So there's a lot of probability math with multiplication.

Deep math with multiplication and accuracy is not the forte of LLMs.
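The 9.2 vs. 9.8 big blind comparison can be made concrete with a toy EV calculation. The call probabilities and equities below are invented for the demo, not solver numbers:

```python
def value_bet_ev(pot, bet, call_prob, equity_when_called):
    """Expected chips won by betting `bet` into `pot` (all in big blinds).
    If villain folds we take the pot; if called we realize our equity."""
    fold_ev = (1 - call_prob) * pot
    call_ev = call_prob * (equity_when_called * (pot + bet)
                           - (1 - equity_when_called) * bet)
    return fold_ev + call_ev

# Invented numbers: the slightly bigger bet scares off more of the hands
# that would have paid us, so the smaller sizing earns more overall.
print(value_bet_ev(pot=10, bet=9.2, call_prob=0.55, equity_when_called=0.70))
print(value_bet_ev(pot=10, bet=9.8, call_prob=0.40, equity_when_called=0.65))
```

The point is the sensitivity: small changes in sizing move both the call probability and the equity, and finding the crossover takes exactly the kind of precise multiplication LLMs fumble.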

eclark commented on Poker Tournament for LLMs   pokerbattle.ai/event... · Posted by u/SweetSoftPillow
Cool_Caribou · 4 months ago
Is limit poker a trivial game? I believe it's been solved for a long time already.
eclark · 4 months ago
No, it's far from trivial, for three reasons.

The first is the hidden information: you don't know your opponents' hole cards; that is to say, everyone in the game has a different information set.

The second is that there's a variable number of players in the game at any time. Heads-up games are closer to solved. Mid-ring games have had some decent attempts made. Full-ring with 9 players is hard, and academic papers on it are sparse.

The third is the potential number of actions. In no-limit games there are a lot of possible actions, since you can bet in small decimal increments of a big blind. Betting 4.4 big blinds could be correct and profitable while betting 4.9 big blinds could be losing, so there's a lot to explore.

eclark commented on Poker Tournament for LLMs   pokerbattle.ai/event... · Posted by u/SweetSoftPillow
RivieraKid · 4 months ago
What are you working on specifically? I've been vaguely following poker research since Libratus, the last paper I've read is ReBeL, has there been any meaningful progress after that?

I was thinking about developing a 5-max poker agent that can play decently (not superhumanly), but it still seems like a kind of uncharted territory, there's Pluribus but limited to fixed stacks, very complex and very computationally demanding to train and I think also during gameplay.

I don't see why a LLM can't learn to play a mixed strategy. A LLM outputs a distribution over all tokens, which is then randomly sampled from.

eclark · 4 months ago
Text-trained LLMs are likely not a good solution for optimal play. Just as in chess, the position changes too much, there's too much exploration, and too much accuracy is needed.

CFR is still the best; however, like chess, we need a network that can help evaluate the position. Unlike chess, the hard part isn't knowing a value; it's knowing what the current game position is. For that, we need something unique.
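For a feel of CFR's inner loop, here is plain regret matching (the building block CFR applies at every information set), converging to a best response against a fixed mixed strategy in rock-paper-scissors. The opponent's mix is an arbitrary choice for the demo:

```python
# Regret matching on rock-paper-scissors vs. a fixed opponent mix.
# Demonstrates the core update CFR runs at each information set.
ACTIONS = 3  # 0 = rock, 1 = paper, 2 = scissors

def utility(a, b):
    # +1 win, -1 loss, 0 tie for action a against action b.
    return [[0, -1, 1], [1, 0, -1], [-1, 1, 0]][a][b]

def regret_matching(cum_regret):
    # Play actions in proportion to their positive cumulative regret.
    pos = [max(r, 0.0) for r in cum_regret]
    s = sum(pos)
    return [p / s for p in pos] if s > 0 else [1.0 / ACTIONS] * ACTIONS

opp = [0.4, 0.3, 0.3]  # arbitrary fixed opponent mix for the demo
cum_regret = [0.0] * ACTIONS
strategy_sum = [0.0] * ACTIONS
for _ in range(10_000):
    strat = regret_matching(cum_regret)
    for a in range(ACTIONS):
        strategy_sum[a] += strat[a]
    # Expected utility of each pure action vs. the opponent mix.
    ev = [sum(opp[b] * utility(a, b) for b in range(ACTIONS))
          for a in range(ACTIONS)]
    ev_strat = sum(strat[a] * ev[a] for a in range(ACTIONS))
    for a in range(ACTIONS):
        cum_regret[a] += ev[a] - ev_strat

avg = [s / sum(strategy_sum) for s in strategy_sum]
print(avg)  # average strategy concentrates on paper, the best response
```

Real CFR runs this same update over billions of poker information sets, which is exactly where the "what is the current game position" problem bites.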

I'm pretty convinced that this is solvable. I've been working on rs-poker for quite a while. Right now we have a whole multi-handed arena implemented, and a multi-threaded counterfactual framework (no memory fragmentation and good cache coherency).

With BERT and some clever sequence encoding we can create a powerful agent. If anyone is interested, my email is: elliott.neil.clark@gmail.com

eclark commented on Poker Tournament for LLMs   pokerbattle.ai/event... · Posted by u/SweetSoftPillow
_ink_ · 4 months ago
> LLMs do not have a mechanism for sampling from given probability distributions.

They could have a tool for that, tho.

eclark · 4 months ago
They would need to lie, which they can't currently do. Our current best approximation of optimal play involves ranges: thinking about your hand as being any one of a number of holdings, imagining the combinations of those hands, and deciding what you would do with each. That process of exploration by imagination doesn't work with an eager LLM using a huge encoded context.
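To be clear, the sampling tool itself is the easy part. A minimal sketch (the action names and frequencies here are made up):

```python
import random

def sample_action(strategy, rng):
    """Sample one action from a mixed strategy {action: probability}.
    The kind of tool a harness could expose, so the model never has to
    generate its own randomness."""
    actions = list(strategy)
    weights = list(strategy.values())
    return rng.choices(actions, weights=weights, k=1)[0]

# Made-up mixed frequencies for one decision point.
mixed = {"bet": 0.7, "check": 0.3}
print(sample_action(mixed, random.Random(0)))
```

The hard part is everything upstream of the sample: constructing the ranges and frequencies to feed it, and keeping the story consistent across streets.
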
eclark commented on Poker Tournament for LLMs   pokerbattle.ai/event... · Posted by u/SweetSoftPillow
eclark · 4 months ago
I am the author/maintainer of rs-poker ( https://github.com/elliottneilclark/rs-poker ). I've been working on algorithmic poker for quite a while. This isn't the way to do it. LLMs would need to be able to do math, lie, and be random, none of which they are currently capable of.

We know how to compute the best moves in poker. It's computationally challenging, though: the more choices and players there are, the harder it gets, which is why most attempts only even try heads-up.

With all that said, I do think there's a way to use attention and BERT to solve poker (when trained on non-text sequences). We need a better corpus of games and some training time on unique models. If anyone is interested, my email is elliott.neil.clark @ gmail.com

eclark commented on Flightcontrol: A PaaS that deploys to your AWS account   flightcontrol.dev/... · Posted by u/handfuloflight
hshdhdhehd · 5 months ago
Nice idea. I work at an S&P 500 corp and we have platform teams that provide this. I always wondered if it could be a startup. My reservation with this is the classic one: not open source. But I totally get why it isn't. Although FOSS might work, as the value (at work, and here too) is in having someone in a Slack channel who can help if a deploy gets stuck. The code is kind of a terraform-esque thing of sorts.
eclark · 5 months ago
I think 'Batteries Included' would interest you, then. Like this, it's installable on AWS. It's a whole platform PaaS + AI + more built on open source. So Kubernetes is at the core, but with tons of automation and UI. Dev environments are Kubernetes in Docker (Kind-based).

- https://github.com/batteries-included/batteries-included/
- https://www.batteriesincl.com/

eclark commented on DJ With Apple Music launches to enable subscribers to mix their own sets   musicweek.com/digital/rea... · Posted by u/CharlesW
kcrwfrd_ · a year ago
Is there support for Rekordbox?

It would be awesome so that you can use Apple Music alongside normal USB sticks / Rekordbox libraries on the Pioneer XDJ line of equipment.

eclark · a year ago
> The feature is integrated with DJ software and hardware platforms AlphaTheta

They called out AlphaTheta, so here's hoping that it is. That would make my decision to move off of Spotify for personal streaming even easier.

u/eclark

Karma: 254 · Cake day: March 10, 2010

About
elliott @ batteriesincl . com

https://www.batteriesincl.com/
