alexhutcheson (u/alexhutcheson)

alexhutcheson commented on RLHF Book rlhfbook.com/... · Posted by u/jxmorris12

kadushka · a year ago

Has r1 made RLHF obsolete?

DeepSeek-R1 had an RLHF step in their post-training pipeline (section 2.3.4 of their technical report[1]).

In addition, the "reasoning-oriented reinforcement learning" step (section 2.3.2) used an approach that is almost identical to RLHF in theory and implementation. The main difference is that they used a rule-based reward system, rather than a model trained on human preference data.

If you want to train a model like DeepSeek-R1, you'll need to know the fundamentals of reinforcement learning on language models, including RLHF.

[1] https://arxiv.org/pdf/2501.12948

alexhutcheson commented on RLHF Book rlhfbook.com/... · Posted by u/jxmorris12

alexhutcheson · a year ago

Glad to see the author making a serious effort to fill the gap in public documentation of RLHF theory and practice. The current state of the art seems to be primarily documented in arXiv papers, but each paper is more like a "diff" than a "snapshot" - you need to patch together the knowledge from many previous papers to understand the current state. It's extremely valuable to "snapshot" the current state of the art in a way that is easy to reference.

My friendly feedback on this work-in-progress: I believe it could benefit from more introductory material to establish motivations and set expectations for what is achievable with RLHF. In particular, I think it would be useful to situate RLHF in comparison with supervised fine-tuning (SFT), which readers are likely familiar with.

Stuff I'd cover (from the background of an RLHF user but non-specialist):

Advantages of RLHF over SFT:

- Tunes on the full generation (which is what you ultimately care about), not just token-by-token.

- Can tune on problems where there are many acceptable answers (or ways to word the answer), and you don't want to push the model into one specific series of tokens.

- Can incorporate negative feedback (e.g. don't generate this).

Disadvantages of RLHF over SFT:

- Regularization (KL or otherwise) puts an upper bound on how much impact RLHF can have on the model. Because of this, RLHF is almost never enough to get you "all the way there" by itself.

- Very sensitive to reward model quality, which can be hard to evaluate.

- Much more resource and time intensive.

Non-obvious practical considerations:

- How to evaluate quality? If you have a good measurement of quality, it's tempting to just incorporate it in your reward model. But you want to make sure you're able to measure "is this actually good for my final use-case", not just "does this score well on my reward model?".

- How prompt engineering interacts with fine-tuning (both SFT and RLHF). Often some iteration on the system prompt will make fine-tuning converge faster, and with higher quality. Conversely, attempting to tune on examples that don't include a task-specific prompt (surprisingly common) will often yield subpar results. This is a "boring" implementation detail that I don't normally see included in papers.

Excited to see where this goes, and thanks to the author for willingness to share a work in progress!

alexhutcheson commented on Seer: A GUI front end to GDB for Linux github.com/epasveer/seer... · Posted by u/turrini

VyseofArcadia · a year ago

For the Emacs users in the crowd, GUD is a pretty great GDB integration.

alexhutcheson · a year ago

I prefer the GDB Graphical Interface in Emacs[1] (M-x gdb), rather than the more basic integration via GUD[2] (M-x gud-gdb). I’ve had to switch to GUD to run lldb recently, and I miss having dedicated windows that show breakpoints, threads, the current stack, etc.

The one nice thing about GUD is that the interface is consistent across debuggers, so I don’t need to refresh myself on the keyboard shortcuts when switching between debugging Python with pdb and C++ with lldb.

[1] https://www.gnu.org/software/emacs/manual/html_node/emacs/GD...

[2] https://www.gnu.org/software/emacs/manual/html_node/emacs/St...

alexhutcheson commented on Seer: A GUI front end to GDB for Linux github.com/epasveer/seer... · Posted by u/turrini

alexhutcheson · a year ago

GDB also has a built-in text user interface (TUI) that is surprisingly easy to use[1]. It even supports mouse interaction.

[1] https://sourceware.org/gdb/current/onlinedocs/gdb.html/TUI.h...

alexhutcheson commented on Ask HN: Alternative to Emacs with undo-tree functionality? · Posted by u/HexDecOctBin

tetris11 · a year ago

TRAMP is killing me. I have my Emacs setup the exact way I want it, but trying to work on a production server through Emacs is torture.

Yes I'm multiplexing my SSH connections through a master control, yes I've tried other SSH modes, yes I'm using a dumb terminal on the other side... it still communicates at the speed of suffering.

alexhutcheson · a year ago

Try disabling VC over Tramp connections[1]:

  (setq vc-ignore-dir-regexp
        (format "\\(%s\\)\\|\\(%s\\)"
                vc-ignore-dir-regexp
                tramp-file-name-regexp))

VC is quite chatty and assumes that filesystem operations have a negligible cost. Before I disabled it, VC was adding >1 second to every find-file operation over Tramp.

I also recommend using the direct-async-process connection property[2], which significantly decreases the latency of async process creation.

[1] https://www.gnu.org/software/emacs/manual/html_node/tramp/Fr...

[2] https://www.gnu.org/software/emacs/manual/html_node/tramp/Re...

alexhutcheson commented on Probably pay attention to tokenizers cybernetist.com/2024/10/2... · Posted by u/ingve

authorfly · a year ago

Check out the training data. Sentence transformer models training data includes lots of typos and this is desirable. There was debate around training/inferencing with stemmed/postprocessing words for a long time.

Typos should minimally impact your RAG.

alexhutcheson · a year ago

It depends if they are using a “vanilla” instruction-tuned model or are applying additional task-specific fine-tuning. Fine-tuning with data that doesn’t have misspellings can make the model “forget” how to handle them.

In general, fine-tuned models often fail to generalize well on inputs that aren’t very close to examples in the fine-tuning data set.

alexhutcheson commented on Visual Studio Code is designed to fracture (2022) ghuntley.com/fracture/... · Posted by u/ghuntley

monsieurbanana · a year ago

> The client, accessing remote repos, is wildly insecure, by design

Who's the best kid in the block regarding third-party extensions security?

There's really not much standing in front of a supply-chain attack for my editor of choice, emacs. Most people use a community extensions aggregator that also directly fetches from git repositories. The only slim advantage we have is that I'm sure a much higher % of emacs users would actually look into the source code of the extensions they pull.

alexhutcheson · a year ago

If you want to be cautious, I have somewhat higher confidence in the versions of Emacs packages published on the Debian repositories[1] than the ones on ELPA/MELPA.

The downside is that not every package is packaged for Debian, and the versions are a bit stale.

https://packages.debian.org/search?keywords=ELPA+&searchon=n...

alexhutcheson commented on Pivotal Tracker will shut down pivotaltracker.com/blog/2... · Posted by u/sandinmyjoints

alexhutcheson · a year ago

Are there any open source self-hostable tracking/project management tools that still have a committed team and forward momentum?

I used to self-host a Phabricator instance, which I liked a lot, but the upstream maintainer made the reasonable decision to step away.

My guess is there is not much of a niche for self-hosted solutions anymore. The GitHub Issues free tier covers most of the low-complexity use-cases, while higher-complexity use-cases are addressed by enterprise SaaS.

alexhutcheson commented on Calendar Queues: A Fast O(1) Priority Queue Implementation (1988) dl.acm.org/doi/pdf/10.114... · Posted by u/tithe

Sesse__ · 2 years ago

std::priority_queue is sorely missing the operation “change the priority of this element” (you need to do it using a delete and then a new insert, which is rather slow), which comes up all the time in e.g. Dijkstra's algorithm.

alexhutcheson · 2 years ago

Boost.Heap has this functionality. Or if you want to stick with the standard library it’s fairly easy to use the *_heap functions from <algorithm> and just hand-code your own fix_heap(first, last, changed) function. Agree it would be more convenient to have it built-in, though.