Readit News logoReadit News
remoquete commented on Can LLMs do randomness?   rnikhil.com/2025/04/26/ll... · Posted by u/whoami_nr
DimitriBouriez · 9 months ago
One thing to consider: we don’t know if these LLMs are wrapped with server-side logic that injects randomness (e.g. using actual code or external RNG). The outputs might not come purely from the model's token probabilities, but from some opaque post-processing layer. That’s a major blind spot in this kind of testing.
remoquete · 9 months ago
Agreed. These tests should be performed on local models.
remoquete commented on Sycophancy in GPT-4o   openai.com/index/sycophan... · Posted by u/dsr12
thethethethe · 9 months ago
I'm not sure how this problem can be solved. How do you test a system with emergent properties of this degree that whose behavior is dependent on existing memory of customer chats in production?
remoquete · 9 months ago
Using prompts know to be problematic? Some sort of... Voight-Kampff test for LLMs?
theletterf commented on Sycophancy in GPT-4o   openai.com/index/sycophan... · Posted by u/dsr12
theletterf · 9 months ago
Don't they test the models before rolling out changes like this? All it takes is a team of interaction designers and writers. Google has one.
remoquete commented on Show HN: I built an AI that turns GitHub codebases into easy tutorials   github.com/The-Pocket/Tut... · Posted by u/zh2408
amelius · 10 months ago
I've said this a few times on HN: why don't we use LLMs to generate documentation? But then came the naysayers ...
remoquete · 10 months ago
Because you can't. See my previous comment. https://news.ycombinator.com/item?id=43748908
remoquete commented on Show HN: I built an AI that turns GitHub codebases into easy tutorials   github.com/The-Pocket/Tut... · Posted by u/zh2408
kaycebasques · 10 months ago
My bet is that the combination of humans and language models is stronger than humans alone or models alone. In other words there's a virtuous cycle developing where the codebases that embrace machine documentation tools end up getting higher quality docs in the long run. For example, last week I tried out a codebase summary tool. It had some inaccuracies and I knew exactly where it was pulling the incorrect data from. I fixed that data, re-ran the summarization tool, and was satisfied to see a more accurate summary. But yes, it's probably key to keep human technical writers (like myself!) in the loop.
remoquete · 10 months ago
Indeed. Augmentation is the way forward.
remoquete commented on Show HN: I built an AI that turns GitHub codebases into easy tutorials   github.com/The-Pocket/Tut... · Posted by u/zh2408
remoquete · 10 months ago
This is nice and fun for getting some fast indications on an unknown codebase, but, as others said here and elsewhere, it doesn't replace human-made documentation.

https://passo.uno/whats-wrong-ai-generated-docs/

remoquete commented on Claude Code: Best practices for agentic coding   anthropic.com/engineering... · Posted by u/sqs
remoquete · 10 months ago
What's the Gemini equivalent of Claude Code and OpenAI's Codex? I've found projects like reugn/gemini-cli, but Gemini Code Assist seems limited to VS Code?

u/remoquete

KarmaCake day1189October 2, 2019
About
I'm a technical writer and aspiring Rust developer based in Barcelona, Spain

My site: https://passo.uno

View Original