Readit News
ziaowang commented on People are just as bad as my LLMs   wilsoniumite.com/2025/03/... · Posted by u/Wilsoniumite
Xelynega · a year ago
I don't understand what you are saying.

How can the RLHF phase eliminate bias if it uses a process (human input) that has the same biases as the pre-training (human input)?

ziaowang · a year ago
Texts in the wild used during pre-training contain lots of biases, such as racial and gender biases, which are picked up by the model.

During RLHF, the human evaluators are aware of these biases and are instructed to down-vote model responses that exhibit them.

ziaowang commented on People are just as bad as my LLMs   wilsoniumite.com/2025/03/... · Posted by u/Wilsoniumite
smallnix · a year ago
Is my understanding wrong that LLMs are trained to emulate observed human behavior in their training data?

From that it follows that LLMs are fit to reproduce all kinds of human biases, like preferring the first choice out of many, or the last out of many (primacy and recency biases). Funnily, the LLM might replicate these biases slightly wrong and by doing so produce new derived biases.
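As a rough sketch, one way to check for this kind of position bias is to show a model the same options in shuffled order and count which position it picks. The ask_model below is a made-up stand-in that simulates a primacy-biased model, not a real API:

    import random
    from collections import Counter

    def ask_model(options):
        # Hypothetical stand-in for a real LLM call. Here it simulates a
        # model that picks the first option half the time and a uniformly
        # random option otherwise, purely for demonstration.
        return 0 if random.random() < 0.5 else random.randrange(len(options))

    def measure_position_bias(options, trials=1000):
        # Present the same options in random order and count which
        # *position* gets picked; an unbiased model would pick each
        # position about equally often.
        counts = Counter()
        for _ in range(trials):
            shuffled = random.sample(options, len(options))
            counts[ask_model(shuffled)] += 1
        return counts

    print(measure_position_bias(["A", "B", "C", "D"]))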

ziaowang · a year ago
This understanding is incomplete in my opinion. LLMs do more than emulate observed behavior. In the pre-training phase, tasks like masked language modeling indeed train the model to mimic what it reads (which of course contains lots of bias); but in the RLHF phase, the model is trained to generate the response judged best by human evaluators (who try to eliminate as much bias as possible in the process). In other words, it is trained to meet human expectations in this later phase.
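For concreteness, here is a minimal sketch of the preference objective typically used to train the reward model in RLHF. This is hypothetical PyTorch assuming precomputed response embeddings; real systems score (prompt, response) pairs with a fine-tuned LLM head rather than a small MLP:

    import torch
    import torch.nn as nn

    # Hypothetical reward model: maps a response embedding to a scalar score.
    # Shown only to illustrate the objective, not any particular lab's setup.
    reward_model = nn.Sequential(nn.Linear(768, 256), nn.ReLU(), nn.Linear(256, 1))

    def preference_loss(chosen_emb, rejected_emb):
        # Bradley-Terry style loss: the response the human evaluator
        # preferred should get a higher scalar reward than the one they
        # down-voted (e.g. for containing a racial or gender bias).
        r_chosen = reward_model(chosen_emb)
        r_rejected = reward_model(rejected_emb)
        return -torch.log(torch.sigmoid(r_chosen - r_rejected)).mean()

    # Dummy batch of 4 preference pairs, just to show the shapes involved.
    chosen = torch.randn(4, 768)
    rejected = torch.randn(4, 768)
    loss = preference_loss(chosen, rejected)
    loss.backward()  # pushes preferred responses toward higher reward

The policy model is then fine-tuned (e.g. with PPO) to maximize the reward this model assigns, which is where the down-voted, biased responses get penalized.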

But human expectations are also not bias-free (e.g. the preferring-the-first-choice phenomenon mentioned above).

ziaowang commented on An analysis of DeepSeek's R1-Zero and R1   arcprize.org/blog/r1-zero... · Posted by u/meetpateltech
nogridbag · a year ago
From what I read elsewhere (random reddit comment), the visible reasoning is just "for show" and isn't the process DeepSeek used to arrive at the result. But if the reasoning has value, I guess it doesn't matter even if it's fake.
ziaowang · a year ago
Can you provide a link to the comment?

R1's technical report (https://github.com/deepseek-ai/DeepSeek-R1/blob/main/DeepSee...) says the prompt template used for training is "<think> reasoning process here </think> <answer> answer here </answer>. User: prompt. Assistant:". This format strongly suggests that the text between the <think> tags becomes the "reasoning" and the text between the <answer> tags becomes the "answer" in the web app and API (https://api-docs.deepseek.com/guides/reasoning_model). I see no reason why DeepSeek would not do it this way, setting aside possible post-generation filtering.
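If that is how it works, the serving-side split would be as trivial as something like this (my guess at the post-processing, not DeepSeek's actual code):

    import re

    def split_r1_output(completion: str):
        # Pull out the text between <think>...</think> and
        # <answer>...</answer>, following the prompt template quoted in
        # the R1 report. A guess at the serving-side post-processing,
        # not DeepSeek's actual code.
        think = re.search(r"<think>(.*?)</think>", completion, re.DOTALL)
        answer = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
        return (
            think.group(1).strip() if think else "",
            answer.group(1).strip() if answer else completion.strip(),
        )

    reasoning, answer = split_r1_output(
        "<think>Wait, let me re-check that step...</think> <answer>42</answer>"
    )
    print(reasoning)  # what the web app shows as the "reasoning" trace
    print(answer)     # what it shows as the final answer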

Plus, if you read Table 3 of the R1 technical report, which contains an example of R1's chain of thought, its style (going back and re-evaluating the problem) resembles what I actually got in the CoT in the web app.

ziaowang commented on 'Thirsty' ChatGPT uses four times more water than previously thought   thetimes.com/uk/technolog... · Posted by u/todsacerdoti
comboy · a year ago
Not sure if that's more than a person sitting in the office writing this, when you take into account all the electricity used by the building, maintenance, other people (catering, cleaning, etc.).

And I think this energy-shaming, whether it's aimed at AI, crypto, or something else, is just the wrong approach. If somebody buys energy, they can use it for mindless entertainment if they so please; why focus on datacenters and not Disneyland?

ziaowang · a year ago
Agreed. If it's useful, why not scale up the electricity and water supply, and make the latter sustainable?
ziaowang commented on Looking for a Job Is Tough   blog.kaplich.me/looking-f... · Posted by u/skaplich
thw09j9m · a year ago
This is the toughest market I've ever seen. I easily made it to on-sites at FAANG a few years ago and now I'm getting resume rejected by no-name startups (and FAANG).

The bar has also been raised significantly. I had an interview recently where I solved the algorithm question very quickly, but didn't refactor/clean up my code perfectly and was rejected.

ziaowang · a year ago
Though FAANG offers are usually more attractive than startup offers (considering pay and stability), some startups can be more selective, since they can't afford to hire the wrong candidate.

u/ziaowang

Karma: 7 · Cake day: September 22, 2024
About
Curious about artificial intelligence. Experience in self-hosting. Newbie user of Linux and other software.