Readit News
chpatrick commented on Gemini 3 Flash: Frontier intelligence built for speed   blog.google/products/gemi... · Posted by u/meetpateltech
aleph_minus_one · 5 days ago
> Pretty much every person in the first (and second) world is using AI now

This sounds like you live in a huge echo chamber. :-(

chpatrick · 5 days ago
All of my non-techy friends use it; it's the new search engine. I think at this point the people refusing to use it are the echo chamber.
chpatrick commented on I fed 24 years of my blog posts to a Markov model   susam.net/fed-24-years-of... · Posted by u/zdw
famouswaffles · 8 days ago
'A Markov chain is a mathematical structure where the probabilities of going to the next state only depend on the current state and not the previous path taken.'

My point, which seems so hard to grasp for whatever reason, is that in a Markov chain, state is a well-defined thing. It's not a variable you can assign any property to.

LLMs do depend on the previous path taken. That's the entire reason they're so useful! And the only reason you say they don't is because you've redefined 'state' to include that previous path! It's nonsense. Can you not see the circular argument?

The state is required to be a fixed, well-defined element of a structured state space. Redefining it as an arbitrarily large, continuously valued encoding of the entire history trivializes the Markov property, which a Markov chain should satisfy. Under your definition, any sequential system can be called Markov, which means the term no longer distinguishes anything.
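
The trivialization complaint can be stated formally. This is the standard textbook construction, added here for reference rather than quoted from the thread: any process becomes "Markov" once the state absorbs the full history.

```latex
% Standard construction: for any process (X_t), define a history-absorbing
% state Y_t = (X_1, X_2, \ldots, X_t). Then, trivially,
P(Y_{t+1} \mid Y_t, Y_{t-1}, \ldots, Y_1) = P(Y_{t+1} \mid Y_t),
% because Y_t already determines every earlier Y_s.
```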

chpatrick · 7 days ago
They only have the previous path inasmuch as n-gram Markov text generators have the previous path.
chpatrick commented on I fed 24 years of my blog posts to a Markov model   susam.net/fed-24-years-of... · Posted by u/zdw
famouswaffles · 8 days ago
I did not even remember you and had to dig to find out what you were on about. Just a heads up: if you've had a previous argument and you want to bring it up later, just speak plainly. Why act like "somebody" is anyone but you?

My response to both of you is the same.

LLMs do depend on previous events, but you say they don't because you've redefined 'state' to include those events. It's a circular argument. In a Markov chain, state is well defined, not something you can attach arbitrary properties to or redefine as you wish.

It's not my fault that neither of you understands what the Markov property is.

chpatrick · 8 days ago
By that definition, n-gram Markov chain text generators also include previous state, because you always feed in the last n grams. :) It's exactly the same situation as LLMs, just with a larger, but still fixed, n.
chpatrick commented on I fed 24 years of my blog posts to a Markov model   susam.net/fed-24-years-of... · Posted by u/zdw
sigbottle · 9 days ago
Have you ever actually worked with a basic Markov problem?

The Markov property states that the probability of transitioning to the next state depends entirely on the current state.

These states inhabit a state space. The way you encode "memory" if you need it, e.g. remembering whether it rained each of the last 3 days, is by expanding said state space. In that case you'd go from tracking 1 day to 3 days, i.e. 2^3 states if you need the precise binary information for each day. Being "clever", maybe you assume only the number of days it rained in the past 3 days matters, and you get a 'linear' amount of memory.

Sure, an LLM is a "Markov chain" with a state space of size (# tokens)^(context length), at minimum. That's not a helpful abstraction, and it defeats the original purpose of the Markov observation. The entire point of the Markov observation is that you can represent a seemingly huge predictive model with just a couple of variables in a discrete state space, and ideally you're the clever programmer/researcher who can significantly collapse said space by being, well, clever.

Are you deliberately missing the point or what?
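
A minimal sketch of the state-space expansion described above. This is hypothetical illustration code, not from the thread; the rain probability and the 3-day window are assumed.

```python
# Encoding "did it rain each of the last 3 days?" as Markov state by
# enlarging the state space, so the Markov property still holds.
import random

RAIN_PROB = 0.3  # assumed chance of rain on any given day

def step(state):
    """Full binary history: 2**3 = 8 possible states, e.g. (True, False, True).
    The next state depends only on the current state plus fresh randomness."""
    rained_today = random.random() < RAIN_PROB
    return state[1:] + (rained_today,)  # slide the 3-day window forward

state = (False, False, False)
for _ in range(5):
    state = step(state)
    print(state, "-> rainy days in window:", sum(state))

# The "clever" collapse to just the count (4 states: 0-3) is cheaper, but the
# count alone can't tell you which day falls out of the window tomorrow, so
# it's only valid if the model genuinely doesn't need that detail.
```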

chpatrick · 9 days ago
> Sure, an LLM is a "Markov chain" with a state space of size (# tokens)^(context length), at minimum.

Okay, so we're agreed.

chpatrick commented on I fed 24 years of my blog posts to a Markov model   susam.net/fed-24-years-of... · Posted by u/zdw
famouswaffles · 9 days ago
1. A context limit is not a Markov order. An n-gram model’s defining constraint is: there exists a small constant k such that the next-token distribution depends only on the last k tokens, full stop. You can't use a k-trained Markov model on anything but the last k tokens, and those positions are treated identically regardless of content. An LLM’s defining behavior is the opposite: within its window it can condition on any earlier token, and which tokens matter can change drastically with the prompt (attention is content-dependent). “Window size = 8k/128k” is not “order k” in the Markov sense; it’s just a hard truncation boundary.

2. “Fixed-size block” is a padding detail, not a modeling assumption. Yes, implementations batch/pad to a maximum length. But the model is fundamentally conditioned on a variable-length prefix (up to the cap), and it treats position 37 differently from position 3,700 because the computation explicitly uses positional information. That means the conditional distribution is not a simple stationary “transition table” the way the n-gram picture suggests.

3. “Same as a lookup table” is exactly the part that breaks. A classic n-gram Markov model is literally a table (or smoothed table) from discrete contexts to next-token probabilities. A transformer is a learned function that computes a representation of the entire prefix and uses that to produce a distribution. Two contexts that were never seen verbatim in training can still yield sensible outputs because the model generalizes via shared parameters; that is categorically unlike n-gram lookup behavior.

I don't know how many times I have to spell this out for you. Calling LLMs Markov chains is less than useless. They don't resemble them in any way unless you understand neither.
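
A toy illustration of point 3 above, using hypothetical code and a made-up corpus: the classic n-gram text model really is a lookup table, and it simply has no entry for contexts it never saw.

```python
# Toy n-gram "Markov text model": a literal table from the last k tokens
# to next-token counts. All names and data here are illustrative.
from collections import Counter, defaultdict

def train_ngram(tokens, k=2):
    table = defaultdict(Counter)
    for i in range(len(tokens) - k):
        context = tuple(tokens[i:i + k])    # the "state": exactly k tokens
        table[context][tokens[i + k]] += 1  # count the observed next token
    return table

corpus = "the cat sat on the mat and the cat ate".split()
table = train_ngram(corpus, k=2)
print(table[("the", "cat")])    # Counter({'sat': 1, 'ate': 1})
print(("the", "dog") in table)  # False: no generalization to unseen contexts
```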

chpatrick · 9 days ago
I think you're confusing Markov chains and "Markov chain text generators". A Markov chain is a mathematical structure where the probabilities of going to the next state only depend on the current state and not the previous path taken. That's it. It doesn't say anything about whether the probabilities are computed by a transformer or stored in a lookup table, it just exists. How the probabilities are determined in a program doesn't matter mathematically.
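
For reference, the textbook statement of the property being appealed to here, in standard notation (added, not quoted from the comment):

```latex
% The Markov property: the distribution of the next state depends only on
% the current state, not on the path taken to reach it.
P(X_{t+1} = x \mid X_t = x_t, X_{t-1} = x_{t-1}, \ldots, X_0 = x_0)
  = P(X_{t+1} = x \mid X_t = x_t)
```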
chpatrick commented on I fed 24 years of my blog posts to a Markov model   susam.net/fed-24-years-of... · Posted by u/zdw
sigbottle · 9 days ago
The defining idea behind the "Markov property" is that the next state does not depend on history beyond the current state.

And in classes, the very first trick you learn to skirt around history is to add Boolean variables to your "memory state". Your system now models "did it rain the previous N days?" The issue, obviously, is that this is exponential if you're not careful. Maybe you can get clever by making your state a "sliding window history"; then it's linear in the number of days you remember. Maybe mix both. Maybe add even more information. Tradeoffs, tradeoffs.

I don't think LLMs embody the Markov property at all, even if you can make anything eventually satisfy the Markov property by just "considering every single possible state", of which there are (size of token set)^(length) at minimum because of the KV cache.

chpatrick · 9 days ago
The KV cache doesn't affect it because it's just an optimization. LLMs are stateless and take no input other than a fixed block of text. They don't have memory, and memorylessness is the requirement for a Markov chain.
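
A minimal sketch of this framing, in hypothetical code: autoregressive generation as a Markov chain whose state is the bounded context window. `next_token_dist`, the window size, and the vocabulary are all assumptions for illustration; the function stands in for any pure map from state to distribution, lookup table or transformer alike.

```python
import random

CONTEXT_LEN = 4          # assumed tiny window
VOCAB = ["a", "b", "c"]  # assumed tiny vocabulary

def next_token_dist(window):
    """A pure function: state -> distribution over the next token.
    A real LLM computes this with a transformer; an n-gram model, with a table."""
    rng = random.Random(hash(window))  # deterministic in the state alone
    weights = [rng.random() for _ in VOCAB]
    total = sum(weights)
    return {tok: w / total for tok, w in zip(VOCAB, weights)}

def step(window):
    dist = next_token_dist(window)
    token = random.choices(list(dist), weights=list(dist.values()))[0]
    return (window + (token,))[-CONTEXT_LEN:]  # new state: last CONTEXT_LEN tokens

state = ("a",)
for _ in range(6):
    state = step(state)
    print(state)
```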
chpatrick commented on I fed 24 years of my blog posts to a Markov model   susam.net/fed-24-years-of... · Posted by u/zdw
famouswaffles · 9 days ago
>Sure many things can be modelled as Markov chains

Again, no they can't, unless you break the definition. k is not a variable. It's as simple as that. The state cannot be flexible.

1. The Markov text model uses k tokens, not k tokens sometimes, n tokens other times, and whatever you want it to be the rest of the time.

2. A Markov model is explicitly described as 'assuming that future states depend only on the current state, not on the events that occurred before it'. Defining your 'state' such that every event imaginable can be captured inside it is a 'clever' workaround, but it ultimately describes something that is decidedly not a Markov model.

chpatrick · 9 days ago
It's not n tokens sometimes, k tokens other times. LLMs have fixed context windows; you just sometimes have less text, so the window isn't full. They're pure functions from a fixed-size block of text to a probability distribution over the next character, same as the classic lookup-table n-gram Markov chain model.
chpatrick commented on I fed 24 years of my blog posts to a Markov model   susam.net/fed-24-years-of... · Posted by u/zdw
wizzwizz4 · 9 days ago
A GPT model would be modelled as an n-gram Markov model where n is the size of the context window. This is slightly useful for getting some crude bounds on the behaviour of GPT models in general, but is not a very efficient way to store a GPT model.
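
A back-of-envelope for why that representation is "not a very efficient way to store a GPT model", using assumed round numbers (a 50,000-token vocabulary and an 8,192-token window, purely for illustration):

```python
import math

# Size of the naive "GPT as n-gram Markov model" state space.
vocab_size = 50_000
context_len = 8_192
log10_states = context_len * math.log10(vocab_size)  # log10(vocab**context)
print(f"state space ~ 10^{log10_states:.0f}")        # state space ~ 10^38494
```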
chpatrick · 9 days ago
I'm not saying it's an n-gram Markov model or that you should store it as a lookup table. A Markov model is just a mathematical concept that says nothing about storage, only that the state-change probabilities are a pure function of the current state.
chpatrick commented on I fed 24 years of my blog posts to a Markov model   susam.net/fed-24-years-of... · Posted by u/zdw
famouswaffles · 9 days ago
Yes, technically you can frame an LLM as a Markov chain by defining the "state" as the entire sequence of previous tokens. But this is a vacuous observation under that definition, literally any deterministic or stochastic process becomes a Markov chain if you make the state space flexible enough. A chess game is a "Markov chain" if the state includes the full board position and move history. The weather is a "Markov chain" if the state includes all relevant atmospheric variables.

The problem is that this definition strips away what makes Markov models useful and interesting as a modeling framework. A “Markov text model” is a low-order Markov model (e.g., n-grams) with a fixed, tractable state and transitions based only on the last k tokens. LLMs aren’t that: they condition on variable long-range context (up to the window). For Markov chains, k is non-negotiable. It's a constant, not a variable. Once you make it a variable, nearly any process can be described as Markovian, and the word is useless.

chpatrick · 9 days ago
Sure many things can be modelled as Markov chains, which is why they're useful. But it's a mathematical model so there's no bound on how big the state is allowed to be. The only requirement is that all you need is the current state to determine the probabilities of the next state, which is exactly how LLMs work. They don't remember anything beyond the last thing they generated. They just have big context windows.
chpatrick commented on I fed 24 years of my blog posts to a Markov model   susam.net/fed-24-years-of... · Posted by u/zdw
famouswaffles · 9 days ago
LLMs are not Markov chains unless you contort the meaning of a Markov model state so much that you could even include the human brain.
chpatrick · 9 days ago
Not sure why that's contorting; a Markov model is anything where you know the probability of going from state A to state B. The state can be anything. In text generation, the transition is from previous text to text with an extra character, which is true for both LLMs and old-school n-gram Markov models.

u/chpatrick

Karma: 2234 · Cake day: May 5, 2014