Readit News
Zerot commented on Some thoughts on LLMs and software development   martinfowler.com/articles... · Posted by u/floverfelt
gck1 · 6 days ago
LLMs (Sonnet, Gemini from what I tested) tend to “fix” failing tests by either removing them outright or tweaking the assertions just enough to make them pass. The opposite happens too - sometimes they change the actual logic when what really needs updating is the test.

In short, LLMs often get confused about where the problem lies: the code under test or the test itself. And no amount of context engineering seems to solve that.

Zerot · 6 days ago
I think part of the issue is that the LLM does not have enough context. Whether the bug is in the test or in the implementation depends purely on the requirements, which are often not in the source code but stored somewhere else (ticket system, documentation platform).

Without the actual feature requirements, it is impossible for the LLM (or a developer) to determine which one is wrong.

Which is why I think it is also sort of stupid to have the LLM generate tests by just giving it access to the implementation. At best that tests the implementation as it is, but tests should be based on the requirements.
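
To make that concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the requirement ("orders of 100 or more ship free"), the `shipping_cost` function, and the off-by-one bug are all invented for illustration. It shows a check written by reading the code agreeing with the bug, while a check written from the requirement exposes it.

```python
FREE_SHIPPING_THRESHOLD = 100  # hypothetical requirement: orders of 100 or more ship free
FLAT_RATE = 5                  # hypothetical flat shipping fee


def shipping_cost(order_total):
    # Hypothetical bug: the requirement says "100 or more",
    # but the code implements "more than 100".
    return 0 if order_total > FREE_SHIPPING_THRESHOLD else FLAT_RATE


def check_derived_from_implementation():
    # Written by reading the code: it only records what the code does today.
    return shipping_cost(100) == FLAT_RATE


def check_derived_from_requirement():
    # Written from the requirement: an order of exactly 100 should ship free.
    return shipping_cost(100) == 0


print(check_derived_from_implementation())  # True: agrees with the bug
print(check_derived_from_requirement())     # False: exposes the bug
```

From the source code alone, both checks look equally plausible. Only the requirement says which one is right.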

Zerot commented on AccountingBench: Evaluating LLMs on real long-horizon business tasks   accounting.penrose.com/... · Posted by u/rickcarlino
umanwizard · a month ago
I have never seen someone write $9.09 as $9.9. What country is this common in?
Zerot · a month ago
None. That is the point. 9.9 can be either bigger or smaller than 9.11 depending on context.
Zerot commented on AccountingBench: Evaluating LLMs on real long-horizon business tasks   accounting.penrose.com/... · Posted by u/rickcarlino
multjoy · a month ago
How can "which is the larger number" be an ambiguous question?
Zerot · a month ago
Which is the bigger version number? Version 9.9 or version 9.11? Which is the bigger dollar amount? $9.9 or $9.11?

Periods are not only used as the decimal separator, but also as a separator between multiple semi-independent numbers.
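
A small Python sketch of the two readings. The helper names `as_decimal` and `as_version` are made up for this example, and the version parser assumes purely numeric, dot-separated components.

```python
from decimal import Decimal
from typing import Tuple


def as_decimal(text: str) -> Decimal:
    # Treat the period as a decimal separator: "9.9" means 9.90.
    return Decimal(text)


def as_version(text: str) -> Tuple[int, ...]:
    # Treat the period as a separator between independent integers:
    # "9.11" means major 9, minor 11.
    return tuple(int(part) for part in text.split("."))


a, b = "9.9", "9.11"
print(as_decimal(a) > as_decimal(b))  # True: 9.90 is larger than 9.11
print(as_version(a) > as_version(b))  # False: (9, 9) sorts before (9, 11)
```

Same two strings, opposite answers. Which one is correct depends on what the numbers represent.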

Zerot commented on Why is F# code robust and reliable?   devblogs.microsoft.com/do... · Posted by u/b-man
pjtr · a year ago
Is "let-over-lambda" (variable capture / closure) not possible in F#? Is that not a form of implicit dependency injection?
Zerot · a year ago
Closures are possible, yeah. But F# also has partial application (and currying), so you don't need a closure to do this.
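
A rough analogy in Python rather than F#. Python has no automatic currying, so `functools.partial` stands in for F# partial application here. The `send_greeting`, `make_greeter`, and `fake_sender` names are hypothetical. Both variants fix the `sender` dependency up front and leave a function of `name`.

```python
from functools import partial


def send_greeting(sender, name):
    # The dependency (sender) is an ordinary first parameter.
    return sender(f"Hello, {name}")


def make_greeter(sender):
    # Closure style ("let over lambda"): sender is captured by the inner function.
    def greet(name):
        return send_greeting(sender, name)
    return greet


def fake_sender(message):
    # Hypothetical stand-in for a real dependency such as a mail client.
    return f"sent: {message}"


greet_via_closure = make_greeter(fake_sender)
# Partial application: supply the first argument now, the rest later.
greet_via_partial = partial(send_greeting, fake_sender)

print(greet_via_closure("Ada"))  # sent: Hello, Ada
print(greet_via_partial("Ada"))  # sent: Hello, Ada
```

In F#, functions are curried by default, so applying a function to only its first argument gives the same effect without a helper like `partial`.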
Zerot commented on We fine-tuned Llama 405B on AMD GPUs   publish.obsidian.md/felaf... · Posted by u/felarof
system2 · a year ago
Why is obsidian (a note-taking app) doing this?
Zerot · a year ago
They aren't. This company is using Obsidian Publish to publish documents.

u/Zerot

Karma: 13 · Cake day: September 24, 2024