It's easy to set up, but be warned, it takes up a lot of disk space.
$ du -h ~/archive/webpages
1.1T /home/andrew/archive/webpages
https://github.com/gildas-lormeau/SingleFile
But DeepSeek clearly states in their terms of service that they can train on your API data or use it for other purposes. Which one might assume their government can access as well.
We need direct eval comparisons between o3-mini and DeepSeek. Or, well, they're numbers, so we can look them up on leaderboards.
How far are we from running a GPT-3/GPT-4 level LLM on regular consumer hardware, like a MacBook Pro?
Phi-4 is yet another step towards a small, open, GPT-4 level model. I think we're getting quite close.
Check the benchmarks comparing it to GPT-4o on the first page of their technical report if you haven't already: https://arxiv.org/pdf/2412.08905
Also FYI, your mail server seems to be down.
Expectation: 80% left, 20% right
Model sampling probability: 99% left, 1% right
>>> import math
>>> 0.80 * math.log(0.99 / 0.80) + 0.20 * math.log(0.01 / 0.20)
-0.42867188234223175
Model sampling probability: 90% left, 10% right
>>> 0.80 * math.log(0.9 / 0.80) + 0.20 * math.log(0.1 / 0.20)
-0.04440300758688229
Of course, if you change the temperature, this will break any probabilistic expectations from training in this manner.
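For what it's worth, the quantity being computed above is the negative KL divergence between the expected distribution and the model's sampling distribution (it reaches 0 when they match). A small sketch of the same computation as a reusable function, in case that's clearer (the function name is mine, not from any library):

import math

def expected_log_ratio(target, model):
    # Negative KL(target || model): expected log-ratio of model to target
    # probabilities under the target distribution; 0 means a perfect match.
    return sum(p * math.log(q / p) for p, q in zip(target, model) if p > 0)

print(expected_log_ratio([0.80, 0.20], [0.99, 0.01]))  # ~ -0.4287
print(expected_log_ratio([0.80, 0.20], [0.90, 0.10]))  # ~ -0.0444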
My guess is that good answers are better reasoned than answers that are short and to the point, and this is picked up in training or fine-tuning or some other step.
And probably the optimal amount of thinking has something to do with the training set or the size of the network (wild guesses).
This explains why GPT-4 cannot accurately perform large-number multiplication and decimal exponentiation. [0]
This example can extend to natural language generation in general. While some answers can be immediately retrieved or generated by a "cache" / algorithm that exists in latent space, some tokens have better quality when their latent-space algorithm is executed over multiple steps.
[0] https://www.semanticscholar.org/reader/817e52b815560f95171d8...
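As a toy analogy (mine, not from the paper): long multiplication is easy because each digit-level step is trivial, but the number of steps grows with the number of digits, which is exactly what a single fixed-depth forward pass can't accommodate; emitting intermediate tokens effectively lets the model run the step-by-step algorithm.

def long_multiply(a: str, b: str) -> str:
    # Schoolbook multiplication on digit strings: one small step per digit
    # pair, so the total step count grows with input length.
    result = 0
    for i, da in enumerate(reversed(a)):
        for j, db in enumerate(reversed(b)):
            result += int(da) * int(db) * 10 ** (i + j)
    return str(result)

print(long_multiply("123456789", "987654321"))  # 121932631112635269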
Could you please add links to the documentation in the README where it states "It includes detailed documentation"?
Also, maybe DPO should use the DDPG acronym instead, so your repo's Deterministic Policy Optimization isn't confused with trl's Direct Preference Optimization.
I am so tired of this "NoBody kNows hoW LLMs WoRk". It's fucking software. Sophisticated probability tables with self-correction. Not magic. Any so-called "expert" saying that no one understands how they work is either incompetent or trying to attract attention by mystifying LLMs.
What's being said is that the result of training and the way in which information is processed in latent space are opaque.
There are strategies to dissect a model's inner workings, but this is an active field of research and still incomplete.
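One such strategy is a linear probe: train a simple classifier on intermediate activations and check whether some property of the input can be read out of them. A minimal sketch using random stand-in data instead of real hidden states (sklearn used only for convenience):

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
hidden_states = rng.normal(size=(1000, 64))                    # stand-in for layer activations
labels = (hidden_states[:, :8].sum(axis=1) > 0).astype(int)    # a property linearly encoded in a few dimensions

# If a linear probe recovers the property well, that layer "represents" it.
probe = LogisticRegression(max_iter=1000).fit(hidden_states[:800], labels[:800])
print("probe accuracy:", probe.score(hidden_states[800:], labels[800:]))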