bgwalter · 6 months ago
> What other tasks could be automated today with the current LLMs performance?

CEO speeches and pro-LLM blogs come to mind.

Again, there is a vague focus on "updating dependencies" where allegedly some time was saved. Take that to the extreme and we don't need any new software. Freeze Linux and Windows, do only security updates and fire everyone. Because the ultimate goal of LLM shills or self-hating programmers appears to be to eliminate all redundant work.

Be careful what you wish for. They won't reward you for shilling or automating, they'll just fire you.

msgodel · 6 months ago
The primary use seems to be satisfying administrative demands that were never productive anyway.
Eddy_Viscosity2 · 6 months ago
This. They've been pushing these at my workplace and the only thing I can think to use it for is to have the LLMs generate empty long-winded corporate-speak emails that I can send to managers when they ask for things that seem best answered by an empty long-winded corporate-speak email. Like "How are you using all these AI tools we are forcing on you without asking if you needed or wanted them?"
GardenLetter27 · 6 months ago
This feels a bit too optimistic; in practice it often gets stuck going down a rabbit hole (and burning up your requests / tokens doing it!).

Like even when I tested it on a clean assessment (albeit with Cursor in this case) - https://jamesmcm.github.io/blog/claude-data-engineer/ - it did very well in agent mode, but the questions it got wrong were worrying because they're the sort of things that a human might not notice either.

That said I do think you could get a lot more accuracy between the agent checking and running its own answers, and then also sending its diff to a very strong LLM like o3 or Gemini Pro 2.5 to review it - it's just a bit expensive to do that atm.

The main issue on real projects is that just having enough context to even approach problems, and build and run tests is very difficult when you have 100k+ lines of code and it takes 15 minutes to clean build and run tests. And it feels like we're still years away from having all of the above, plus a large enough context window that this is a non-issue, for a reasonable price.

cyanydeez · 6 months ago
Like, it's a nerd slot machine: shows you small wins, gets you almost big wins and seduces you into thinking "just one more perfect prompt and surely I'll hit the jackpot"
vital_beach · 6 months ago
I really enjoyed Claude Code. I was using it on some side projects for about a month with API credits, and I signed up for the Max subscription shortly after it started working with Code. Overnight, my account was banned, and I have no idea why.

It sucks getting banned from such a cool and helpful tool :(

tbcj · 6 months ago
I had two accounts banned - one for Claude and one for the API. I tried to appeal both, asking for more information. The response from Anthropic was non-specific, saying only that the accounts violated the usage policy. One account had only been minimally used; the other was never used. The accounts used email addresses on a domain I control, e.g., anthropic-claude@domain.xyz. I think that might have something to do with it.

I have a new account now using a Google account and it hasn’t been banned.

bn-l · 6 months ago
Did the program need to kill child processes a lot?
vital_beach · 6 months ago
nope, just running and stopping dev servers. It may have done a pfkill once or twice if something was hanging?

Either way, using it with the API credits was fine for a little over a month, so I don't know if it was that. I got autobanned only a few hours after paying for Max and reauthing the client to use the subscription. My actual usage of it didn't change.

stpedgwdgfhgdd · 6 months ago
The recent developments are impressive. I’m now using my IDE as a diff viewer. Everything goes through the terminal. If there is an error, CC can analyse and fix it.

Still needs a lot of handholding. I do not (yet) think big upfront plans will suddenly start working in the enterprise world. Let it write a failing test first.

_dark_matter_ · 6 months ago
I'm still not convinced. I spent a few hours today trying to get it to add linting to a SQL repository, _given another repository that already had what I wanted_.

At one point it got a linting error and just added that error to the ignore list. I definitely spent more time reviewing this code and prompting than it would have taken for me to do it myself. And it's still not merged!

ed_mercer · 6 months ago
Are you saying TDD works best with CC? Write a failing test first? I read an article about that recently but can't find it...

EDIT: https://www.anthropic.com/engineering/claude-code-best-pract...

revskill · 6 months ago
LLM reflects YOUR intelligence, it's the secret truth.
rvnx · 6 months ago
Many of the complainers don't know how to use them and how to write prompts, and then blame the LLMs.

Or simply use LLMs that struggle at writing good code (GPT, Gemini Pro, etc).

You need to be in the shoes of a product owner, able to express your requirements clearly and drive the LLM in your direction, and this requires learning new skills (like kids learning how to use search engines).

timr · 6 months ago
> Or simply use LLMs that struggle at writing good code (GPT, Gemini Pro, etc).

I love how one side of this debate seems to have embraced "No True Scotsman" as the preferred argument strategy. Anyone who points out that these things have practical limitations gets a litany of "oh you aren't using it right" or "oh, you just aren't using the cool model" in response. It reminds me of the hipsters in SF who always felt your music was a little too last week.

As someone who is currently using these every day, Gemini Pro is right up there with the very best models for writing code -- and "GPT" is not a single thing -- so I have no idea what you're talking about. These things have practical limitations.

WorldMaker · 6 months ago
> Or simply use LLMs that struggle at writing good code (GPT

As it is still the default for GitHub Copilot, GPT doesn't seem to "struggle" at all with writing good code. Anecdotally, Claude seems woefully under-trained in areas such as PowerShell and cross-platform solutions compared to GPT. (Which also seems to show directly in Claude Code's awful cross-platform support. If Claude is so good, why doesn't it fix Claude Code's Windows support? Add more PowerShell support instead of just bashing out bash-isms?)

A lot of impressions of the LLMs are hugely subjective, and I'm inclined toward the above poster's suggestion that a lot of what you get out of an LLM is a reflection of who you are and what you put into the LLM. (They are massively optimized GIGO machines after all.)

thi2 · 6 months ago
Would you mind sharing good and bad examples of prompts? I always read comments like yours and miss examples.
goodpoint · 6 months ago
If anything it reflects the intelligence of the people whose work is being stolen.
rvnx · 6 months ago
Y Combinator is an accomplice of this, and you know, all they will get is billions of tainted money as punishment. But I guess they can live with that.
cainxinth · 6 months ago
Just like all the people who think their LLM is sentient or an alien or a god are really just talking to themselves.
guluarte · 6 months ago
In my experience, using agents has just wasted my time and money. They are good for small things if you are lazy and watching a movie, looking at the results every 10 minutes, reverting and trying again.
arpowers · 6 months ago
Has anyone actually gotten productivity improvements from Claude Code?

What’s the use case?

(I tried some things, and it blew up. Thus far my experience w agents in general)

ryandvm · 6 months ago
I have used it on a fairly simple Kotlin Android application and was blown away. I have previously been using paid ChatGPT, GitHub Copilot, and Gemini. In my opinion, it's the complete access to your repo that really makes it powerful, whereas with the other plugins you kind of have to manually feed it the files in your workspace and keep them in sync.

I asked it to add Google Play subscription support to my application and it did, it required minimal tweaking.

I asked it to add a screen for requesting location permissions from the user and it did it perfectly. No adjustment.

I also asked it to add a query parameter to my API (Go), which should result in a subtle change several layers deep, and it had no problems with that.

None of this is rocket science and I think the key is that it's all been done and documented a million times on the Internet. At this point, Claude Code is at least as effective as a junior developer.

Yes, I understand that this is a Faustian bargain.

jki275 · 6 months ago
FYI -- Windsurf, Cline, Cursor will all do this also, using Claude models if you set them up that way.
anonzzzies · 6 months ago
It gives us great productivity. If you write the tests yourself and insist that it reaches 100% success without touching the tests themselves, only running them, it is very nice. We wrote a little bit of tooling around it so that it instructs and loops until 100% of the tests succeed. Even for stuff that's complex enough for seniors to struggle with (parsers/compilers), it delivers results after hours instead of days or weeks. But if you miss some tests, you can all but guarantee that those things won't work, even in cases where an experienced human would automatically get it right because the alternative is illogical. But we would write tests like this for humans as well, so there is not much difference in our workflow; CC delivers faster and far, far cheaper. And we tried it all; especially NOT having it integrated into an IDE is brilliant. Before, we used aider instead of Cursor etc., as we can control it: we don't want a human sitting there tapping 'yes, please do' or whatnot. We want it to finish, commit a PR, and then we review.
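For anyone wondering what a "loop until the tests pass, but don't touch the tests" wrapper might look like, here is a minimal sketch. It is not the tooling described above, which isn't shown in the comment. The names `loop_until_green`, `run_tests`, `ask_agent` and `fingerprint` are hypothetical, and the agent and test runner are passed in as plain callables, so nothing here assumes a particular CLI or API.

```python
import hashlib
from pathlib import Path
from typing import Callable, Iterable


def fingerprint(paths: Iterable[Path]) -> str:
    """Hash the test files' names and contents so any edit to them is detectable."""
    digest = hashlib.sha256()
    for path in sorted(paths):
        digest.update(str(path).encode())
        digest.update(path.read_bytes())
    return digest.hexdigest()


def loop_until_green(
    run_tests: Callable[[], tuple[bool, str]],  # hypothetical: returns (passed, output)
    ask_agent: Callable[[str], None],           # hypothetical: sends a prompt to the agent
    test_files: list[Path],
    max_rounds: int = 10,
) -> bool:
    """Re-prompt the agent until the tests pass or max_rounds prompts are used up.

    Raises RuntimeError if the agent changes any of the human-written test files.
    """
    baseline = fingerprint(test_files)
    for round_number in range(max_rounds + 1):
        passed, output = run_tests()
        if passed:
            return True
        if round_number == max_rounds:
            break  # out of prompts; the last test run still failed
        ask_agent(
            "The tests are failing. Fix the code; do not edit the tests.\n\n" + output
        )
        if fingerprint(test_files) != baseline:
            raise RuntimeError("agent modified the test files")
    return False
```

In practice `run_tests` would probably wrap the project's test command (for example via `subprocess.run`) and `ask_agent` would invoke whatever agent you use non-interactively; both are left abstract here on purpose.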
octo888 · 6 months ago
It's great at mocking up some HTML pages with eg Tailwind and static site generators. Give it some ideas, a bit of copy, a few colours and it'll create some pages filled with plausible sounding text. I can imagine using it in front of clients to give them an idea of what a new site could look like.

Easily adjusted with things like "the colour palette is a bit bright, use more pastels" or "make it more SEO friendly" and it often easily generates a large todo list/set of changes based on minimal input

My friend was mulling over a product concept and I used it to design a landing page and it helped her see how easily you can create a website to sell the product. It took ~15 minutes and I'm a web dev noob. (Obviously setting up a real ecommerce site is a little bit more work)

It makes sense it's good at HTML because of the huge body of public data available.

atlgator · 6 months ago
I've been very successful pointing it to a backlog of manual test cases, using Playwright MCP to execute the test cases against dev as a black box, and generating the corresponding Playwright scripts to add to our automated test repo.

I had hired an actual automated tester with years of experience to write Playwright scripts for us. After 3 months he had not produced a single passing test. I managed to build the entire scaffolding myself in 2 weeks, having no prior Playwright experience.

memorylane · 6 months ago
I use CC in existing code bases to build out new GUI - VueJS/Quasar and it blows me away! For back end Rust code it excels at boilerplate CRUD handlers back to the db - it copies the style of existing code… I’ll happily pay for it if my boss does not, just work fewer hours…
datpuz · 6 months ago
The productivity gains decrease with user experience. A high-performing senior engineer won't get a lot, but I think they've reached a point now where even seniors will benefit a fair amount. For me it's not really that they increase my productivity directly, but they let me offload a lot of the cognitive load. I'm getting a similar amount of work done and I don't feel as drained at the end of the day.
aussieguy1234 · 6 months ago
Usually, I'll go through my coding like I would have pre-LLMs.

Then, when I see something that looks like it can be reliably automated by an AI agent, I'll open up Cline and put Claude or Gemini Flash to work. This has a 90% success rate so far and has saved me hours of work.