andreyk commented on Sycophancy is the first LLM "dark pattern"   seangoedecke.com/ai-sycop... · Posted by u/jxmorris12
vladsh · 17 days ago
LLMs get over-analyzed. They’re predictive text models trained to match patterns in their data, statistical algorithms, not brains, not systems with “psychology” in any human sense.

Agents, however, are products. They should have clear UX boundaries: show what context they’re using, communicate uncertainty, validate outputs where possible, and expose performance so users can understand when and why they fail.

IMO the real issue is that raw, general-purpose models were released directly to consumers. That normalized under-specified consumer products and created the expectation that users would interpret model behavior, define their own success criteria, and manually handle edge cases, sometimes with severe real-world consequences.

I’m sure the market will fix itself with time, but I hope more people learn when not to use these half-baked AGI “products”.

andreyk · 17 days ago
To say that LLMs are 'predictive text models trained to match patterns in their data, statistical algorithms, not brains, not systems with “psychology” in any human sense' is not entirely accurate. Classic LLMs like GPT-3, sure. But LLM-powered chatbots (ChatGPT, Claude - which are what this article is really about) go through much more than just predict-next-token training (RLHF, presumably now reasoning training, who knows what else).
andreyk commented on A new AI winter is coming?   taranis.ie/llms-are-a-fai... · Posted by u/voxleone
andreyk · 17 days ago
This blog post is full of bizarre statements and the author seems almost entirely ignorant of the history or present of AI. I think it's fair to argue there may be an AI bubble that will burst, but this blog post is plainly wrong in many ways.

Here are a few clarifications (sorry this is so long...):

"I should explain for anyone who hasn't heard that term [AI winter]... there was much hope, as there is now, but ultimately the technology stagnated. "

The term AI winter typically refers to a period of reduced funding for AI research/development, not the technology stagnating (the technology failing to deliver on expectations was the cause of the AI winter, not the definition of AI winter).

"[When GPT3 came out, pre-ChatGPT] People were saying that this meant that the AI winter was over, and a new era was beginning."

People tend to agree there were already two AI winters: the first tied to disappointments with symbolic AI and a general lack of progress (1970s), the second to expert systems (late 1980s). That AI winter has long been over. The deep learning revolution started around 2012, and by 2020 (GPT-3) huge amounts of talent and money had already been flowing into AI for years. That trend just accelerated with ChatGPT.

"[After symbolic AI] So then came transformers. Seemingly capable of true AI, or, at least, scaling to being good enough to be called true AI, with astonishing capabilities ... the huge research breakthrough was figuring out that, by starting with essentially random coefficients (weights and biases) in the linear algebra, and during training back-propagating errors, these weights and biases could eventually converge on something that worked."

Transformers came about in 2017. The first wave of excitement about neural nets and backpropagation goes all the way back to the late 80s/early 90s, and AI subfields (computer vision, NLP, and to a lesser extent robotics) were already heavily ML-based by the 2000s, just not neural-net based (that changed around 2012).

"All transformers have a fundamental limitation, which can not be eliminated by scaling to larger models, more training data or better fine-tuning ... This is the root of the hallucination problem in transformers, and is unsolveable because hallucinating is all that transformers can do."

The 'highest number' token is not necessarily chosen; that depends on the decoding algorithm. That aside, 'the next token will be generated to match that bad choice' makes it sound like once you generate one 'wrong' token, the rest of the output is also wrong. A token is a few characters and need not 'poison' the rest of the output.
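
To make the decoding point concrete, here is a minimal sketch (plain Python/NumPy, illustrative names only, not any particular model's implementation) of how the same next-token logits can be decoded greedily or by sampling:

    import numpy as np

    def decode_step(logits, temperature=1.0, greedy=False):
        # greedy=True always takes the argmax ("highest number") token;
        # otherwise sample from the softmax distribution, so lower-ranked
        # tokens are chosen with nonzero probability.
        if greedy:
            return int(np.argmax(logits))
        scaled = logits / temperature
        scaled = scaled - scaled.max()  # subtract max for numerical stability
        probs = np.exp(scaled)
        probs = probs / probs.sum()
        return int(np.random.choice(len(probs), p=probs))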

Beyond that, there are plenty of ways to 'recover' from starting down the wrong route. A key reason reasoning in LLMs works well is that it typically incorporates backtracking - going back earlier in the reasoning to verify details and so on. You can do uncertainty estimation in the decoding algorithm, use a secondary model, plenty of things (here is a detailed survey, one of several that is easy to find: https://arxiv.org/pdf/2311.05232).
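
As one hedged illustration of what 'uncertainty estimation in the decoding algorithm' can mean (my sketch, not a method taken from the survey): flag decoding steps where the next-token distribution is spread out, so a secondary model or a verification pass can re-check them. The threshold here is arbitrary:

    import numpy as np

    def is_uncertain_step(probs, entropy_threshold=2.0):
        # High Shannon entropy (in nats) means the model spread its
        # probability mass over many tokens, i.e. it was "unsure" here.
        entropy = -np.sum(probs * np.log(probs + 1e-12))
        return entropy > entropy_threshold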

"The technology won't disappear – existing models, particularly in the open source domain, will still be available, and will still be used, but expect a few 'killer app' use cases to remain, with the rest falling away."

A quick Google search shows ChatGPT currently has 800 million weekly active users who use it for all sorts of things. AI-assisted programming is certainly here to stay, and there are plenty of other industries in which AI will be part of the workflow (helping do research, take notes, summarize, build presentations, etc.).

I think discussion is good, but it's disappointing to see material with this level of accuracy on the front page of HN.

andreyk commented on Poker Tournament for LLMs   pokerbattle.ai/event... · Posted by u/SweetSoftPillow
andreyk · 2 months ago
For reference, the details about how the LLMs are queried:

"How the players work

    All players use the same system prompt
    Each time it's their turn, or after a hand ends (to write a note), we query the LLM
    At each decision point, the LLM sees:
        General hand info — player positions, stacks, hero's cards
        Player stats across the tournament (VPIP, PFR, 3bet, etc.)
        Notes hero has written about other players in past hands
    From the LLM, we expect:
        Reasoning about the decision
        The action to take (executed in the poker engine)
        A reasoning summary for the live viewer interface
    Models have a maximum token limit for reasoning
    If there's a problem with the response (timeout, invalid output), the fallback action is fold"

The fact that the models are given stats about the other models is rather disappointing to me; it makes this less interesting. I would be curious how this would go if the models had to rely only on their notes/context. Maybe it's a way to save on costs, since this could get expensive...
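
For what it's worth, the per-turn loop those rules describe presumably looks something like the sketch below (hypothetical names throughout; query_llm and the prompt format are stand-ins, not the site's actual code):

    from dataclasses import dataclass

    @dataclass
    class Decision:
        reasoning: str  # reasoning about the decision
        action: str     # e.g. "fold", "call", "raise 200"
        summary: str    # short text for the live viewer interface

    def decide(hand_info, stats, notes, query_llm, timeout_s=60):
        # One decision point: build the context, query the model, and
        # fall back to folding on any timeout or malformed response.
        prompt = f"{hand_info}\nOpponent stats: {stats}\nYour notes: {notes}"
        try:
            raw = query_llm(prompt, timeout=timeout_s)  # expected: dict of the three fields
            return Decision(**raw)  # raises if fields are missing or malformed
        except Exception:
            # Per the rules: any problem with the response -> fold
            return Decision(reasoning="fallback", action="fold", summary="auto-fold")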

andreyk commented on Poker Tournament for LLMs   pokerbattle.ai/event... · Posted by u/SweetSoftPillow
michalsustr · 2 months ago
I have a PhD in algorithmic game theory and worked on poker.

1) There are currently no algorithms that can compute deterministic equilibrium strategies [0]. Therefore, mixed (randomized) strategies must be used for professional-level play or stronger.

2) In practice, strong play has been achieved with: i) online search and ii) a mechanism to ensure strategy consistency. Without ii), an adaptive opponent can learn to exploit inconsistency weaknesses in repeated play.

3) LLMs do not have a mechanism for sampling from given probability distributions. E.g. if you ask an LLM to sample a random number from 1 to 10, it will likely give you 3 or 7, as those are overrepresented in the training data.

Based on these points, it’s not technically feasible for current LLMs to play poker strongly. This is in contrast with chess, where there is a lot more training data, there exists a deterministic optimal strategy, and you do not need to ensure strategy consistency.

[0] There are deterministic approximations for subgames based on linear programming, but they require the game to be fully loaded in memory, which is infeasible for the whole game.
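
On point 3, one conceivable workaround (my sketch, not something this tournament does) is to have the LLM output its mixed strategy as explicit probabilities and delegate the actual randomization to an external RNG, since the model itself cannot sample faithfully:

    import random

    def sample_action(mixed_strategy):
        # mixed_strategy, e.g. {"fold": 0.2, "call": 0.5, "raise": 0.3},
        # would come from the LLM; the sampling itself uses a real RNG,
        # so realized frequencies match the intended probabilities.
        actions = list(mixed_strategy)
        weights = list(mixed_strategy.values())
        return random.choices(actions, weights=weights, k=1)[0]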

andreyk · 2 months ago
But LLMs would presumably also condition on past observations of opponents - i.e., LLMs can conversely adapt their strategy during repeated play (especially if given a budget for reasoning, as opposed to direct sampling from their output distributions).

The rules state the LLMs do get "Notes hero has written about other players in past hands" and "Models have a maximum token limit for reasoning", so the outcome might at least be more interesting as a result.

The top models on the leaderboard are notably also the ones strongest in reasoning. They even show the models' notes, e.g. Grok on Claude: "About: claude Called preflop open and flop bet in multiway pot but folded to turn donk bet after checking, suggesting a passive postflop style that folds to aggression on later streets."

PS The sampling params also matter a lot (with temperature 0 the LLMs are going to be very consistent; going higher, they could get more 'creative').

PPS The models getting statistics about other models' behavior seems kind of like cheating; they rely on it heavily, e.g. 'I flopped middle pair (tens) on a paired board (9s-Th-9d) against LLAMA, a loose passive player (64.5% VPIP, only 29.5% PFR)'.

andreyk commented on Show HN: When is the next Caltrain? (minimal webapp)   erikschluntz.com/caltrain... · Posted by u/eschluntz
andreyk · 4 months ago
Haha nice, the official Caltrain schedule is a bit of a hassle to parse...
andreyk commented on How America's universities became debt factories   anandsanwal.me/college-st... · Posted by u/car
andreyk · a year ago
Seems like a good overview, but I do find this bit unclear: "But why don’t market forces correct these issues?

The answer lies in the unique shield that non-dischargeable student loans provide to educational institutions and lenders.

In a normal market, if a product consistently fails to deliver value, consumers stop buying it. Producers either improve or go out of business. But in the world of higher education, this feedback loop is broken.

Colleges and universities, shielded by the guarantee of student loan money, have no real incentive to improve their product or direct students to majors that have an ability to pay back their loans.

They can raise tuition year after year, even as the value of their degrees stagnates or declines. "

Sure, colleges can charge a lot due to loans, but they are still competing with each other, and differences in tuition could make a big difference. I went to Georgia Tech over other universities because it was in-state and Georgia has generous scholarships for students with good grades. So why doesn't competition among schools lower costs?

u/andreyk
Karma: 4995 · Cake day: December 28, 2015
About: www.andreykurenkov.com