I keep wondering why. All projects I ever saw need lines of code, nuts and bolts removed instead of added. My best libraries consist of a couple of thousand lines.
Of course, there are many many other kinds of development - when developing novel low-level systems for complicated requirements, you're going to get much poorer results from an LLM, because the project won't as neatly fit in to one of the "templates" that it has memorized, and the LLM's reasoning capabilities are not yet sophisticated enough to handle arbitrary novelty.
Why not just Chrome/Firefox/Safari to open the link instead of the Instagram app?
My understanding is that the former (sucking up) is a personality trait, substantially influenced by the desire to facilitate engagement. The latter (making up facts), I do not think is correct to ascribe to a personality trait (like compulsive liar); instead, it is because the fitness function of LLMs drive them to produce some answer and they do not know what they're talking about, but produce strings of text based on statistics.
In this situation very often there won't be _any_ answer, plenty of difficult questions go unanswered on the internet. Yet the model probably does not interpret this scenario as such
The good ol' 7 out of 10 problem. Where the entire lower half of the rating system is unused.
The way I solve it is to have 0 be the middle, If the transaction exceeded expectations you can give it a +1 if it failed to meet them give it a -1 (thumbs up/down if you prefer) you can even throw in a +-2 if you want to allow for more subtlety of expression