trq_ commented on Show HN: Llama 3.3 70B Sparse Autoencoders with API access   goodfire.ai/papers/mappin... · Posted by u/trq_
wg0 · 9 months ago
Noob question - how do we know that these autoencoders aren't hallucinating and really are mapping/clustering what they should be?
trq_ · 9 months ago
Hmm, the hallucination would happen in the auto-labelling, but we review and test our labels and they seem correct!
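
(A minimal sketch of what that kind of label check could look like, assuming a hypothetical activation_for function that returns a feature's activation on a text - this is an illustration, not Goodfire's actual pipeline:)

    # Hypothetical sanity check for an auto-generated SAE feature label:
    # the feature should fire noticeably more on texts that match the label
    # than on texts that clearly don't.
    from typing import Callable, Sequence

    def label_looks_correct(
        activation_for: Callable[[str], float],  # assumed: feature activation on a text
        matching_texts: Sequence[str],           # texts a reviewer judges to match the label
        contrast_texts: Sequence[str],           # texts that clearly don't match
        margin: float = 2.0,
    ) -> bool:
        """True if mean activation on matching texts beats contrast texts by a margin."""
        pos = sum(activation_for(t) for t in matching_texts) / len(matching_texts)
        neg = sum(activation_for(t) for t in contrast_texts) / len(contrast_texts)
        return pos > margin * max(neg, 1e-6)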
trq_ commented on Show HN: Llama 3.3 70B Sparse Autoencoders with API access   goodfire.ai/papers/mappin... · Posted by u/trq_
trq_ · 9 months ago
If you're hacking on this and have questions, please join us on Discord: https://discord.gg/vhT9Chrt
trq_ commented on Show HN: Llama 3.3 70B Sparse Autoencoders with API access   goodfire.ai/papers/mappin... · Posted by u/trq_
swyx · 9 months ago
nice work. enjoyed the zoomable UMAP. i wonder if there are hparams to recluster the UMAP in interesting ways.

after the idea that Claude 3.5 Sonnet used SAEs to improve its coding ability, i'm not sure i'm aware of any actual practical use of them yet beyond Golden Gate Claude (and Golden Gate Gemma: https://x.com/swyx/status/1818711762558198130).

has anyone tried out Anthropic's matching SAE API yet? wondering how it compares with Goodfire's and if there's any known practical use.

trq_ · 9 months ago
We haven't yet found generalizable "make this model smarter" features, but they do help with the tradeoff of how much to put in a system prompt: e.g. if you have a chatbot that sometimes generates code, you can give it very specific instructions only when it's coding and leave those out of the system prompt the rest of the time.

We have a notebook about that here: https://docs.goodfire.ai/notebooks/dynamicprompts
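
(For the gist without the notebook: a minimal sketch of the conditional-prompting idea. is_coding_request is a stand-in detector - in practice it might be an SAE feature activation check or a classifier - and none of this is the actual Goodfire SDK:)

    # Dynamic system prompting: only include coding instructions when the
    # conversation looks like a coding request.
    BASE_PROMPT = "You are a helpful assistant."
    CODING_INSTRUCTIONS = (
        "When writing code: include type hints, handle errors explicitly, "
        "and add brief comments explaining non-obvious choices."
    )

    def is_coding_request(user_message: str) -> bool:
        # Stand-in detector; swap in a feature-activation check or classifier.
        return any(k in user_message.lower() for k in ("code", "function", "bug", "compile"))

    def build_system_prompt(user_message: str) -> str:
        """Assemble the system prompt, adding coding instructions only when needed."""
        if is_coding_request(user_message):
            return f"{BASE_PROMPT}\n\n{CODING_INSTRUCTIONS}"
        return BASE_PROMPT

    # build_system_prompt("Can you write a function to parse CSVs?")
    # -> base prompt plus coding instructions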

trq_ commented on Detecting when LLMs are uncertain   thariq.io/blog/entropix/... · Posted by u/trq_
zby · 10 months ago
These sampling-based techniques are a rare case where experimenting on consumer hardware can let you improve on SOTA models. I don't think it will last - the end game will surely be a trainable sampler. But for now, enjoy tinkering: https://github.com/codelion/optillm implements a few of these techniques.

The optillm authors suggest that the additional computation in Entropix doesn't bring better results compared with simple CoT decoding (though I'm not sure they also checked efficiency): https://x.com/asankhaya/status/1846736390152949966

It looks to me like many problems with LLMs come from something like semantic leakage, or distraction by irrelevant information (as in the GSM-Symbolic paper) - maybe there is some room for improving attention too.

I wrote a couple of blog posts on these subjects: https://zzbbyy.substack.com/p/semantic-leakage-quick-notes, https://zzbbyy.substack.com/p/llms-and-reasoning, https://zzbbyy.substack.com/p/o1-inference-time-turing-machi...

trq_ · 10 months ago
This is incredible! I haven't seen that repo before - thank you for pointing it out, and for the writing.
trq_ commented on Detecting when LLMs are uncertain   thariq.io/blog/entropix/... · Posted by u/trq_
CooCooCaCha · a year ago
Aren’t those different flavors of uncertainty?
trq_ · a year ago
Yeah, I think the idea of finding out what flavor of uncertainty you have is very interesting.
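
(One concrete way to separate those flavors, in the spirit of the entropy/varentropy sampling idea from the linked post: look at both the entropy and the varentropy of the next-token logits. A minimal numpy sketch with purely illustrative thresholds:)

    # Classify "flavors" of next-token uncertainty from a single logit vector.
    import numpy as np

    def uncertainty_flavor(logits: np.ndarray, ent_thresh: float = 3.0, vent_thresh: float = 3.0) -> str:
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        logp = np.log(probs + 1e-12)
        entropy = -(probs * logp).sum()                     # average surprise
        varentropy = (probs * (logp + entropy) ** 2).sum()  # spread of surprise across tokens
        if entropy < ent_thresh and varentropy < vent_thresh:
            return "confident"                # one clear choice
        if entropy >= ent_thresh and varentropy < vent_thresh:
            return "uniformly unsure"         # many similarly plausible tokens
        if entropy < ent_thresh and varentropy >= vent_thresh:
            return "mostly sure, some doubt"  # a favorite plus a long tail
        return "deeply uncertain"             # both high: maybe pause, branch, or ask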

u/trq_

Karma: 468 · Cake day: October 21, 2012
About
Writing about AI, games and more at: https://thariq.io