It's easy to set up, but be warned, it takes up a lot of disk space.
$ du -h ~/archive/webpages
1.1T /home/andrew/archive/webpages
https://github.com/gildas-lormeau/SingleFile
But DeepSeek clearly states in their terms of service that they can train on your API data or use it for other purposes. Which one might assume their government can access as well.
We need direct eval comparisons between o3-mini and DeepSeek. Or, well, they're numbers, so we can look them up on leaderboards.
How far are we from running a GPT-3/GPT-4 level LLM on regular consumer hardware, like a MacBook Pro?
Phi-4 is yet another step towards a small, open, GPT-4 level model. I think we're getting quite close.
Check the benchmarks comparing it to GPT-4o on the first page of their technical report if you haven't already: https://arxiv.org/pdf/2412.08905
Also FYI, your mail server seems to be down.
Expectation: 80% left, 20% right
Model sampling probability: 99% left, 1% right
>>> import math
>>> 0.80 * math.log(0.99 / 0.80) + 0.20 * math.log(0.01 / 0.20)
-0.42867188234223175
Model sampling probability: 90% left, 10% right
>>> 0.80 * math.log(0.9 / 0.80) + 0.20 * math.log(0.1 / 0.20)
-0.04440300758688229
Of course, if you change the temperature, this will break any probabilistic expectations from training in this manner.
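For what it's worth, the quantity being computed above is the negative KL divergence between the expected distribution and the model's sampling distribution (it reaches 0 when they match). A small sketch of the same computation as a reusable function, in case that's clearer (the function name is mine, not from any library):

import math

def expected_log_ratio(target, model):
    # Negative KL(target || model): expected log-ratio of model to target
    # probabilities under the target distribution; 0 means a perfect match.
    return sum(p * math.log(q / p) for p, q in zip(target, model) if p > 0)

print(expected_log_ratio([0.80, 0.20], [0.99, 0.01]))  # ~ -0.4287
print(expected_log_ratio([0.80, 0.20], [0.90, 0.10]))  # ~ -0.0444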
My guess is that good answers are better reasoned than answers that are short and to the point, and this is picked up in training or fine-tuning or some other step.
And probably the optimal amount of thinking has something to do with the training set or the size of the network (wild guesses).
This explains why GPT-4 cannot accurately perform large-number multiplication and decimal exponentiation. [0]
This example can extend to natural language generation in general. While some answers can be immediately retrieved or generated by a "cache" / algorithm that exists in latent space, some tokens have better quality when their latent-space algorithm is executed over multiple steps.
[0] https://www.semanticscholar.org/reader/817e52b815560f95171d8...
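As a toy analogy (mine, not from the paper): long multiplication is easy because each digit-level step is trivial, but the number of steps grows with the number of digits, which is exactly what a single fixed-depth forward pass can't accommodate; emitting intermediate tokens effectively lets the model run the step-by-step algorithm.

def long_multiply(a: str, b: str) -> str:
    # Schoolbook multiplication on digit strings: one small step per digit
    # pair, so the total step count grows with input length.
    result = 0
    for i, da in enumerate(reversed(a)):
        for j, db in enumerate(reversed(b)):
            result += int(da) * int(db) * 10 ** (i + j)
    return str(result)

print(long_multiply("123456789", "987654321"))  # 121932631112635269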
Could you please add links to the documentation in the README where it states "It includes detailed documentation"?
Also, maybe DPO should use the DDPG acronym instead, so your repo's Deterministic Policy Optimization isn't confused with trl's Direct Preference Optimization.
I am so tired of this "NoBody kNows hoW LLMs WoRk". It's fucking software. Sophisticated probability tables with self-correction. Not magic. Any so-called "expert" saying that no one understands how they work is either incompetent or trying to attract attention by mystifying LLMs.
What's being said is that the result of training and the way in which information is processed in latent space are opaque.
There are strategies to dissect a model's inner workings, but this is an active field of research and still incomplete.
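One such strategy is a linear probe: train a simple classifier on intermediate activations and check whether some property of the input can be read out of them. A minimal sketch using random stand-in data instead of real hidden states (sklearn used only for convenience):

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
hidden_states = rng.normal(size=(1000, 64))                    # stand-in for layer activations
labels = (hidden_states[:, :8].sum(axis=1) > 0).astype(int)    # a property linearly encoded in a few dimensions

# If a linear probe recovers the property well, that layer "represents" it.
probe = LogisticRegression(max_iter=1000).fit(hidden_states[:800], labels[:800])
print("probe accuracy:", probe.score(hidden_states[800:], labels[800:]))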