trq_ commented on Show HN: Llama 3.3 70B Sparse Autoencoders with API access   goodfire.ai/papers/mappin... · Posted by u/trq_
wg0 · 9 months ago
Noob question - how do we know that these autoencoders aren't hallucinating and really are mapping/clustering what they should be?
trq_ · 9 months ago
Hmm, the hallucination would happen in the auto-labelling, but we review and test our labels and they seem correct!
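
(A minimal sketch of what that kind of label check could look like, assuming a hypothetical activation_for function that returns a feature's activation on a text - this is an illustration, not Goodfire's actual pipeline:)

    # Hypothetical sanity check for an auto-generated SAE feature label:
    # the feature should fire noticeably more on texts that match the label
    # than on texts that clearly don't.
    from typing import Callable, Sequence

    def label_looks_correct(
        activation_for: Callable[[str], float],  # assumed: feature activation on a text
        matching_texts: Sequence[str],           # texts a reviewer judges to match the label
        contrast_texts: Sequence[str],           # texts that clearly don't match
        margin: float = 2.0,
    ) -> bool:
        """True if mean activation on matching texts beats contrast texts by a margin."""
        pos = sum(activation_for(t) for t in matching_texts) / len(matching_texts)
        neg = sum(activation_for(t) for t in contrast_texts) / len(contrast_texts)
        return pos > margin * max(neg, 1e-6)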
trq_ commented on Show HN: Llama 3.3 70B Sparse Autoencoders with API access   goodfire.ai/papers/mappin... · Posted by u/trq_
trq_ · 9 months ago
If you're hacking on this and have questions, please join us on Discord: https://discord.gg/vhT9Chrt
trq_ commented on Show HN: Llama 3.3 70B Sparse Autoencoders with API access   goodfire.ai/papers/mappin... · Posted by u/trq_
swyx · 9 months ago
nice work. enjoyed the zoomable UMAP. i wonder if there are hparams to recluster the UMAP in interesting ways.

after the idea that Claude 3.5 Sonnet used SAEs to improve its coding ability, i'm not sure i'm aware of any actual practical use of them yet beyond Golden Gate Claude (and Golden Gate Gemma: https://x.com/swyx/status/1818711762558198130).

has anyone tried out Anthropic's matching SAE API yet? wondering how it compares with Goodfire's and if there's any known practical use.

trq_ · 9 months ago
We haven't yet found generalizable "make this model smarter" features, but they do help with the tradeoff of how much to put in a system prompt: e.g. if you have a chatbot that sometimes generates code, you can give it very specific instructions only when it's coding and leave those out of the system prompt the rest of the time.

We have a notebook about that here: https://docs.goodfire.ai/notebooks/dynamicprompts
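
(For the gist without the notebook: a minimal sketch of the conditional-prompting idea. is_coding_request is a stand-in detector - in practice it might be an SAE feature activation check or a classifier - and none of this is the actual Goodfire SDK:)

    # Dynamic system prompting: only include coding instructions when the
    # conversation looks like a coding request.
    BASE_PROMPT = "You are a helpful assistant."
    CODING_INSTRUCTIONS = (
        "When writing code: include type hints, handle errors explicitly, "
        "and add brief comments explaining non-obvious choices."
    )

    def is_coding_request(user_message: str) -> bool:
        # Stand-in detector; swap in a feature-activation check or classifier.
        return any(k in user_message.lower() for k in ("code", "function", "bug", "compile"))

    def build_system_prompt(user_message: str) -> str:
        """Assemble the system prompt, adding coding instructions only when needed."""
        if is_coding_request(user_message):
            return f"{BASE_PROMPT}\n\n{CODING_INSTRUCTIONS}"
        return BASE_PROMPT

    # build_system_prompt("Can you write a function to parse CSVs?")
    # -> base prompt plus coding instructions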

trq_ commented on Detecting when LLMs are uncertain   thariq.io/blog/entropix/... · Posted by u/trq_
zby · 10 months ago
These sampling-based techniques are a rare case where experimenting on consumer hardware can let you improve on SOTA models. I don't think it will last - the end game will surely be a trainable sampler. But for now, enjoy tinkering: https://github.com/codelion/optillm implements a few of these techniques.

The optillm authors suggest that the additional computation in Entropix doesn't bring better results compared with simple CoT decoding (though I'm not sure they also checked efficiency): https://x.com/asankhaya/status/1846736390152949966

It looks to me like many problems with LLMs come from something like semantic leakage, or distraction by irrelevant information (as in the GSM-Symbolic paper) - maybe there is some room for improving attention too.

I wrote a couple of blog posts on these subjects: https://zzbbyy.substack.com/p/semantic-leakage-quick-notes, https://zzbbyy.substack.com/p/llms-and-reasoning, https://zzbbyy.substack.com/p/o1-inference-time-turing-machi...

trq_ · 10 months ago
This is incredible! I haven't seen that repo before - thank you for pointing it out, and for the writing.
trq_ commented on Detecting when LLMs are uncertain   thariq.io/blog/entropix/... · Posted by u/trq_
CooCooCaCha · a year ago
Aren’t those different flavors of uncertainty?
trq_ · a year ago
Yeah, I think the idea of finding out what flavor of uncertainty you have is very interesting.
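
(One concrete way to separate those flavors, in the spirit of the entropy/varentropy sampling idea from the linked post: look at both the entropy and the varentropy of the next-token logits. A minimal numpy sketch with purely illustrative thresholds:)

    # Classify "flavors" of next-token uncertainty from a single logit vector.
    import numpy as np

    def uncertainty_flavor(logits: np.ndarray, ent_thresh: float = 3.0, vent_thresh: float = 3.0) -> str:
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        logp = np.log(probs + 1e-12)
        entropy = -(probs * logp).sum()                     # average surprise
        varentropy = (probs * (logp + entropy) ** 2).sum()  # spread of surprise across tokens
        if entropy < ent_thresh and varentropy < vent_thresh:
            return "confident"                # one clear choice
        if entropy >= ent_thresh and varentropy < vent_thresh:
            return "uniformly unsure"         # many similarly plausible tokens
        if entropy < ent_thresh and varentropy >= vent_thresh:
            return "mostly sure, some doubt"  # a favorite plus a long tail
        return "deeply uncertain"             # both high: maybe pause, branch, or ask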

u/trq_

Karma: 468 · Cake day: October 21, 2012
About
Writing about AI, games and more at: https://thariq.io