The trick (aside from the order) is the training process by which they are derived from their source data. Simply enumerating the states and transitions in the source data and the probability of each transition from each state in the source doesn’t get you an LLM.
You could probably point your code to Google Books N-grams (https://storage.googleapis.com/books/ngrams/books/datasetsv3...) and get something that sounds (somewhat) reasonable.
Strictly speaking, this is true of one particular way (the most straightforward) to derive a Markov chain from a body of text; a Markov chain is just a probabilistic model of state transitions in which the probability of each possible next state depends only on the current state. Having the states be word sequences of some fixed length, overlapping by all but one word, and setting the probabilities to the frequency with which the added word in the target state follows the sequence in the source state in the training corpus, is one way to derive a Markov chain from a body of text, but not the only one.
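For concreteness, the simplest case of that straightforward derivation might look like the following toy C sketch (single-word states, linear-search tables, a dozen-word in-memory corpus; nothing here is meant to scale): count how often each word follows the current one, then sample the next word in proportion to those counts.

  /* Toy sketch: build an order-1 word Markov chain from a tiny corpus by
   * simple frequency counting, then sample from it. Linear search keeps it
   * short; a real model over a large corpus needs better data structures. */
  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>
  #include <time.h>

  #define MAX_WORDS 1024
  #define MAX_EDGES 4096

  static char *words[MAX_WORDS];                 /* vocabulary: id -> string */
  static int nwords = 0;

  static struct { int from, to, count; } edges[MAX_EDGES];  /* transitions  */
  static int nedges = 0;

  static int word_id(const char *w)              /* intern a word, return id */
  {
      for (int i = 0; i < nwords; i++)
          if (strcmp(words[i], w) == 0) return i;
      words[nwords] = strdup(w);
      return nwords++;
  }

  static void add_transition(int from, int to)   /* count one observed pair  */
  {
      for (int i = 0; i < nedges; i++)
          if (edges[i].from == from && edges[i].to == to) { edges[i].count++; return; }
      edges[nedges].from = from; edges[nedges].to = to; edges[nedges].count = 1;
      nedges++;
  }

  static int next_state(int from)                /* sample next word by count */
  {
      int total = 0;
      for (int i = 0; i < nedges; i++)
          if (edges[i].from == from) total += edges[i].count;
      if (total == 0) return -1;                 /* dead end: no outgoing edge */
      int r = rand() % total;
      for (int i = 0; i < nedges; i++)
          if (edges[i].from == from && (r -= edges[i].count) < 0) return edges[i].to;
      return -1;
  }

  int main(void)
  {
      const char *corpus = "the cat sat on the mat the cat ate the fish";
      char buf[256]; int prev = -1;

      srand((unsigned)time(NULL));
      strcpy(buf, corpus);
      for (char *w = strtok(buf, " "); w; w = strtok(NULL, " ")) {
          int id = word_id(w);
          if (prev >= 0) add_transition(prev, id);
          prev = id;
      }

      int state = word_id("the");                /* start somewhere in the chain */
      for (int i = 0; i < 10 && state >= 0; i++) {
          printf("%s ", words[state]);
          state = next_state(state);
      }
      printf("\n");
      return 0;
  }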
I see this as a strength; try training an LLM on a 42KB text to see if it can produce a coherent output.
https://en.wikipedia.org/wiki/Natural_nuclear_fission_reacto...
"Please unblock challenges.cloudflare.com to proceed"
It looks like it's because of a Cloudflare Global Network issue
I will further improve my code and publish it on my GitHub account when I am satisfied with it.
It started as a Simple Language Model [0]; these differ from ordinary Markov generators by incorporating a crude prompt mechanism and a kind of very basic attention mechanism called history. My SLM uses Prediction by Partial Matching (PPM). The one in the link is character-based and very simple, but mine uses tokens and is 1,300 lines of C.
The tokenizer tracks the end of sentences and paragraphs.
I didn't use subword tokenization as LLMs do, but it would be trivial to incorporate. Tokens are represented by a number (again, as in LLMs), not a character string.
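Roughly, the tokenizer idea looks like this (a simplified sketch, not my actual code): split the text into word tokens, emit special end-of-sentence and end-of-paragraph tokens at boundaries, and intern every token as a number.

  /* Simplified tokenizer sketch: word tokens plus boundary tokens, each
   * represented by a numeric id from a small interning table. */
  #include <stdio.h>
  #include <string.h>
  #include <ctype.h>

  #define MAX_TOKENS 1024

  static char *vocab[MAX_TOKENS];
  static int vocab_size = 0;

  static int intern(const char *s)              /* string -> numeric token id */
  {
      for (int i = 0; i < vocab_size; i++)
          if (strcmp(vocab[i], s) == 0) return i;
      vocab[vocab_size] = strdup(s);
      return vocab_size++;
  }

  static void emit(const char *s)
  {
      printf("%3d  %s\n", intern(s), s);
  }

  int main(void)
  {
      const char *text = "The cat sat. It purred.\n\nA new paragraph began.";
      char word[64]; int len = 0;

      for (const char *p = text; ; p++) {
          if (isalnum((unsigned char)*p)) {      /* accumulate a word token   */
              if (len < (int)sizeof(word) - 1)
                  word[len++] = (char)tolower((unsigned char)*p);
              continue;
          }
          if (len) { word[len] = '\0'; emit(word); len = 0; }   /* flush word */
          if (*p == '.' || *p == '!' || *p == '?') emit("<end-of-sentence>");
          if (*p == '\n' && p[1] == '\n') emit("<end-of-paragraph>");
          if (*p == '\0') break;
      }
      return 0;
  }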
I use hash tables for the model.
There are several fallback mechanisms for when the next-state function fails. One of them uses the prompt; it is not demonstrated here.
Several other implemented mechanisms, like model pruning and skip-grams, are not demonstrated here either. I am trying to improve this Markov text generator, and tips in the comments would be a great help.
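To give an idea of how the hash-table model and a backoff fallback fit together, here is a simplified sketch (illustrative only, not my actual code; the toy corpus, the order-2 context, and the exact fallback chain are assumptions for the example): the model is a hash table keyed by a two-token context, and when that context has no recorded continuation, the generator falls back to the last token alone and finally to a random vocabulary token.

  /* Sketch: Markov model in a hash table keyed by a two-token context,
   * with a backoff fallback when the next-state lookup fails. */
  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>
  #include <time.h>

  #define MAX_VOCAB 512
  #define NBUCKETS  1024

  static char *vocab[MAX_VOCAB];
  static int vocab_size = 0;

  static int intern(const char *s)
  {
      for (int i = 0; i < vocab_size; i++)
          if (strcmp(vocab[i], s) == 0) return i;
      vocab[vocab_size] = strdup(s);
      return vocab_size++;
  }

  struct entry {                       /* one (context, next-token) count    */
      int c1, c2, next, count;         /* c1 == -1 marks an order-1 context  */
      struct entry *link;
  };
  static struct entry *buckets[NBUCKETS];

  static unsigned hash(int c1, int c2) { return (unsigned)(c1 * 31 + c2) % NBUCKETS; }

  static void count_transition(int c1, int c2, int next)
  {
      unsigned h = hash(c1, c2);
      for (struct entry *e = buckets[h]; e; e = e->link)
          if (e->c1 == c1 && e->c2 == c2 && e->next == next) { e->count++; return; }
      struct entry *e = malloc(sizeof *e);
      e->c1 = c1; e->c2 = c2; e->next = next; e->count = 1;
      e->link = buckets[h]; buckets[h] = e;
  }

  static int sample(int c1, int c2)    /* -1 if the context was never seen   */
  {
      unsigned h = hash(c1, c2);
      int total = 0;
      for (struct entry *e = buckets[h]; e; e = e->link)
          if (e->c1 == c1 && e->c2 == c2) total += e->count;
      if (total == 0) return -1;
      int r = rand() % total;
      for (struct entry *e = buckets[h]; e; e = e->link)
          if (e->c1 == c1 && e->c2 == c2 && (r -= e->count) < 0) return e->next;
      return -1;
  }

  static int next_token(int c1, int c2)  /* backoff: order 2, order 1, random */
  {
      int t = sample(c1, c2);
      if (t < 0) t = sample(-1, c2);
      if (t < 0) t = rand() % vocab_size;
      return t;
  }

  int main(void)
  {
      const char *corpus = "the cat sat on the mat and the cat ate the fish on the mat";
      char buf[256]; int w[64]; int n = 0;

      srand((unsigned)time(NULL));
      strcpy(buf, corpus);
      for (char *s = strtok(buf, " "); s; s = strtok(NULL, " "))
          w[n++] = intern(s);

      for (int i = 2; i < n; i++) {               /* train both model orders */
          count_transition(w[i - 2], w[i - 1], w[i]);
          count_transition(-1, w[i - 1], w[i]);
      }

      int c1 = w[0], c2 = w[1];                   /* generate a short sample */
      printf("%s %s", vocab[c1], vocab[c2]);
      for (int i = 0; i < 12; i++) {
          int t = next_token(c1, c2);
          printf(" %s", vocab[t]);
          c1 = c2; c2 = t;
      }
      printf("\n");
      return 0;
  }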
But my point is not to make an LLM; it's that LLMs produce good results not because of their supposedly advanced algorithms, but because of two things:
- There is an enormous amount of engineering in LLMs, whereas there is usually almost none in Markov text generators, so people get the impression that Markov text generators are toys.
- LLMs are possible because they build on impressive hardware improvements over the last decades. My text generator uses only 5 MB of RAM when running this example! But as commenters pointed out, the size of the model explodes quickly, and this is a point I should improve in my code.
And indeed, LLMs, even small ones like NanoGPT, are unable to produce results as good as my text generator's with only 42 KB of training text.
https://github.com/JPLeRouzic/Small-language-model-with-comp...