A repeated trend is that Claude Code only gets 70-80% of the way, which is fine and something I wish was emphasized more by people pushing agents.
This bullet point is funny:
> Treat it like a slot machine
> Save your state before letting Claude work, let it run for 30 minutes, then either accept the result or start fresh rather than trying to wrestle with corrections. Starting over often has a higher success rate than trying to fix Claude's mistakes.
That's easy to say when the employee is not personally paying the massive amount of compute running Claude Code for a half-hour.
Thanks for the tip - we employees should run and re-run the code generation hundreds of times even if the changes are pretty good. That way, the brass will see a huge bill without many actual commits.
Sorry boss, it looks like we need to hire more software engineers since the AI route still isn't mathing.
> A repeated trend is that Claude Code only gets 70-80% of the way, which is fine and something I wish was emphasized more by people pushing agents.
I have been pretty successful at using LLMs for code generation.
I have a simple rule: something is either >90% AI or not AI at all (excluding inline completions and very obvious text editing).
The model has an inherent understanding of some problems from its training data (e.g. setting up a web server with little to no deps in Go) that it can handle with almost 100% certainty. Those are really easy to blaze through in a few minutes, and then I can set up the architecture for some very flat code flows. This can genuinely improve my output by 30-50%.
Agree with your experience. I've also found that if I build a lightweight skeleton of the structure of the program, it does a much better job. Also, ensuring that it does a full-fledged planning/non-executing step before starting to change things leads to good results.
I have been using Cline in VSCode, and I've been enjoying it a lot.
> A repeated trend is that Claude Code only gets 70-80% of the way, which is fine and something I wish was emphasized more by people pushing agents.
Recently, I realized that this applies not only to the first 70–80% of a project but sometimes also to the final 70-80%.
I couldn’t make progress with Claude on a major refactoring from scratch, so I started implementing it myself. Once I had shaped the idea clearly enough but in a very early state, I handed it back to Claude to finish and it worked flawlessly, down to the last CHANGELOG entry, without any further input from me.
I saw this as a form of extensive guardrails or prompting-by-example.
I need to try this - started using Claude Code a few days ago and have been struggling to get good implementations for some high-complexity refactors. It keeps over-engineering and creating more problems than it solves. It's getting close though, and I think your approach would work very well for this scenario!
The slot machine thing has a pretty compelling corollary: crank the formal systems rigor up as high as you can.
Vibe coding in Python is seductive but ultimately you end up in a bad place with a big bill to show for it.
Vibe coding in Haskell is a "how much money am I willing to pour in per unit clean, correct, maintainable code" exercise. With GHC cranked up to `-Wall -Werror` and some nasty property tests? Watching Claude Code try to weasel out with a mock goes from infuriating to amusing: bam, unused parameter! Now why would the test suite be demanding that a property holds on an unused parameter...
And Haskell is just an example; TypeScript is in some ways even more powerful in its type system, so lots of projects have scope to dabble with what I'm calling "hyper modern vibe coding": just start putting a bunch of really nasty fast-check properties and generic bounds on stuff and watch Claude Code try to cheat. Your move, Claude Code: I know you want to check off that line on the TODO list like I want to breathe, so what's it gonna be?
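To make that concrete, here's a rough sketch of the kind of fast-check property I have in mind (clamp and everything around it is made up for illustration, not from any real project); pair it with `noUnusedParameters` in tsconfig and a lazy stub that ignores an argument gets flagged almost immediately:

    import fc from "fast-check";

    // Illustrative toy: a function the agent might be asked to implement.
    function clamp(lo: number, hi: number, x: number): number {
      return Math.min(hi, Math.max(lo, x));
    }

    // Property: the result stays within [lo, hi], and equals x whenever x is in range.
    // A stub like `return x` (ignoring lo and hi) fails fast under random inputs.
    fc.assert(
      fc.property(fc.integer(), fc.integer(), fc.integer(), (a, b, x) => {
        const [lo, hi] = a <= b ? [a, b] : [b, a];
        const y = clamp(lo, hi, x);
        return y >= lo && y <= hi && (x < lo || x > hi || y === x);
      })
    );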
I find it usually gives up and does the work you paid for.
Interesting, I wonder if there is a way to quantify the value of this technique. Like give Claude the same task in Haskell vs. Python and see which one converges correctly first.
Not to mention, if an employee could usually write pretty good code but maybe 30% of the time they wrote something so non-functional it had to be entirely scrapped, they'd be fired.
This is an easy calculation for everyone. Think about whether Claude is giving you a sufficient boost in performance, and if not... then it's too expensive. No doubt some people are in some combination of domain, legacy, complexity of codebase, etc., where Claude just doesn't cut it.
$200 per month will get you roughly 4-5 hours of non-stop single-threaded usage per day.
A bigger issue here is that a random process is not a good engineering pattern. It's not repeatable, doesn't drive coherent architecture, and struggles with complex problems. In my experience, problem size correlates inversely with generated code quality. Engineering is a process of divide-and-conquer, and there is a good reason people don't use bogo (random) sort in production.
More specifically, if you only look at the final code, you are either spending a lot of time reviewing it or accepting it with less review scrutiny. Carefully reviewing semi-random diffs seems like a poor use of time... so I suspect the default is less review scrutiny and higher tech debt. Interestingly enough, higher tech debt might be an acceptable tradeoff if you believe that Code Assistants will soon be good enough to burn the tech debt down autonomously or with minimal oversight.
On the other hand, if the code you are writing is not allowed to fail, the stakes change and you can't pick the less review option. I never thought to codify it as a process, but here is what I do to guide the development process:
- Start by stating the problem and asking Claude Code to: analyze the existing code, restate the problem in a structured fashion, scan the codebase for existing patterns solving the problem, brainstorm alternative solutions. An enhancement here could be to have a map / list of the codebase to improve the search.
- Evaluate presented solutions and iterate on the list. Add problem details, provide insight, eliminate the solutions that would not work. A lot of times I have enough context to pick a winner here, but if not, I ask for more details about each solution and their relative pros and cons.
- Ask Claude to provide a detailed plan for the down-selected solution. Carefully review the plan (a significantly faster endeavor compared to reviewing the whole diff). Iterate on the plan as needed; after that, tell Claude to save the plan for comparison after the implementation and then to get cracking.
- Review Claude's report of what was implemented vs. what was initially planned. This step is crucial because Claude will try dumb things to get things working, and I've already done the legwork on making sure we're not doing anything dumb in the previous step. Make changes as needed.
- After implementation, I generally do a pass on the unit tests because Claude is extremely prolific with them. You generally need to let it write unit tests to make sure it is on the right track. Here, I ask it to scan all of the unit tests and identify similar or identical code. After that, I ask for refactor options that most importantly maximize clarity, secondly minimize lines of code, and thirdly minimize diffs. Pick the best ones.
Yes, I accept that the above process takes significantly longer for any single change; however, in my experience, it produces far superior results in a bounded amount of time.
P.S. If you got this far, please leave some feedback on how I can improve the flow.
I agree with that list. I would also add that you should explicitly ask the LLM to read the whole of each file at least once before starting edits, because they often have tunnel vision. The project map is auto-generated with a script to avoid reading too many files, but the files to be edited should be fresh in the context, imo.
Funny thing, their recommendation to save state, when Claude Code still has no ability to restore checkpoints (like Cline has) despite it being requested many times. Who are they kidding?
I’ve implemented and maintained an entire web app with CC, and also used many other tools (and took classes and taught workshops on using AI coding tools).
The most effective way I’ve found to use CC so far is this workflow:
Have a detailed and also compressed spec in an md file. It can be called anything, because you’re going to reference it explicitly in every prompt. (CC usually forgets about CLAUDE.md ime)
Start with the user story, and ask it to write a high-level staged implementation plan with atomic steps. Review this plan and have CC rewrite as necessary. (Another md file results.)
Then, based on this file, ask it to write a detailed implementation plan, also with atomic stages. Then review it together and ask if it’s ready to implement.
Then tell Claude to go ahead and implement it on a branch.
Remember the automated tests and functional testing.
Then merge.
Great advice, matches up to my experience. Personally I go a little cheaper and dirtier on the first prompt, then revise as needed. By the way what classes / workshops did you teach?
Thank you for sharing. I taught some workshops on AI-assisted development using Cursor and Windsurf for MIT students (we built an application and wrote a book) and TAed another similar for-credit course. I’ve also been teaching high schoolers how to code, and we use ChatGPT to help us understand and solve leetcode problems by breaking them down into smaller exercises. There’s also now a Harvard CS course on developing with GenAI which I followed along with. The field is exploding.
This matches my experience as well. But what I also found is that I hate this workflow so much that I would almost always rather write the code by hand. Writing specs and user stories was always my least favorite task.
Claude Code works well for lots of things; for example yesterday I asked it to switch weather APIs backing a weather site and it came very close to one-shotting the whole thing even though the APIs were quite different.
I use it at home via the $20/m subscription and am piloting it at work via AWS Bedrock. When used with the Bedrock APIs, at the end of every session it shows you the dollar amount spent, which is a bit disconcerting. I hope the fine-grained metering of inference is a temporary situation; otherwise I think it will have a chilling/discouraging effect on software developers, leading to less experimentation, fewer rewrites, and overall lower quality.
I imagine Anthropic gets to consume it unmetered internally, so they probably avoid this problem completely.
A couple weekends ago I handed it the basic MLB API and asked it to create some widgets for macOS to show me stuff like league/division/wildcard standings, along with basic settings to pick which should be shown. It cranked out a working widget in like a half hour with minimal input.
I know some Swift so I checked on what it was doing. For a quick hack project it did all the work and easily updated things I saw issues with.
For a one-off like that, not bad at all. Not too dissimilar from your example.
> I use it at home via the $20/m subscription and am piloting it at work via AWS Bedrock. When used with the Bedrock APIs, at the end of every session it shows you the dollar amount spent, which is a bit disconcerting. I hope the fine-grained metering of inference is a temporary situation; otherwise I think it will have a chilling/discouraging effect on software developers, leading to less experimentation, fewer rewrites, and overall lower quality.
I’m legitimately surprised at your feeling on this. I might not want the granular cost put in my face constantly but I do like the ability to see how much my queries cost when I am experimenting with prompt setup for agents. Occasionally I find wording things one way or the other has a significantly cheaper cost.
Why do you think it will lead to a chilling effect instead of the normal effect of engineers ruthlessly innovating costs down now that there is a measurable target?
I’ve seen it firsthand at work, where my developers are shy about spending even a single digit number of dollars on Claude Code, even when it saves them 10 times that much in opportunity cost. It’s got to be some kind of psychological loss aversion effect.
I think it’s easy to spend _time_ when the reward is intangible or unlikely, like an evening writing toy applications to learn something new or prototyping some off-the-wall change in a service that might have an interesting performance impact. If development becomes metered in both time and to-the-penny dollars, I at least will have to fight the attitude that the rewards also need to be more concrete and probable.
Once upon a time, engineers often had to concern themselves with datacenter bills, cloud bills, and eventually SaaS bills. We'll probably have 5-10 years of being concerned about AI bills before the AI expense is trivial compared to the human time.
"once upon a time"? Engineers concern themselves with cloud bills right now, today! It's not a niche thing either, probably the majority of AWS consumers have to think about this, regularly.
> it shows you the dollar amount spent which is a bit disconcerting
I can assure you that I don’t at all care about the MAYBE $10 charge my monster Claude Code session billed the company. They also clearly said “don’t worry about cost, just go figure out how to work with it”
Meanwhile I ask it to write what I think are trivial functions and it gets them subtly wrong, but obvious in testing. I would be more suspicious if I were you.
I've been trying Claude Code for a few weeks after using Gemini Cli.
There's something a little better about the tool-use loop, which is nice.
But Claude seems a little dumber and is aggressive about "getting things done", often ignoring common sense or explicit instructions or design information.
If I tell it to make a test pass, it will sometimes change my database structure to avoid having to debug the test. At least twice it deleted protobufs from my project and replaced them with JSON because it struggled to immediately debug a proto issue.
I’ve seen Claude code get halfway through a small sized refactor (function parameters changed shape or something like that), say something that looks like frustration at the amount of time it’s taking, revert all of the good changes, and start writing a bash script to automate the whole process.
In that case, you have to put a stop to it and point out that it would already be done if it hadn’t decided to blow it all up in an effort to write a one-time-use codemod. Of course it agrees with that point, as it agrees with everything. It’s the epitome of strong opinions loosely held.
Claude trying to cheat its way through tests has been my experience as well. Often it’ll delete or skip them and proudly claim all issues have been fixed. This behavior seems to be intrinsic to it since it happens with both Claude Code and Cursor.
Interestingly, it’s the only LLM I’ve seen behave that way. Others simply acknowledge the failure and, after a few hints, eventually get everything working.
Claude just hopes I won’t notice its tricks. It makes me wonder what else it might try to hide when misalignment has more serious consequences.
I just had the same thing happen. Some comprehensive tests were failing, and it decided to write a simple test instead rather than investigate why the more complicated tests were failing. I wonder if the team is trying to save compute by urging it to complete tasks more quickly! Claude seems to be under a compute crunch, as I often get API timeouts/errors.
The hilarious part I’ve found is that when it runs into the least bit of trouble with a step on one of its plans, it will say it has been “Deferred” and then make up an excuse for why that’s acceptable.
It is sometimes acceptable for humans to use judgment and defer work; the machine doesn’t have judgment so it is not acceptable for it to do so.
Talking about hilarious, we had a Close Encounter of the Hallucinating Kind today. We were having mysterious simultaneous gRPC socket-closed exceptions on the client and server side running in Kubernetes talking to each other through an nginx ingress.
We captured debug logs, described the detailed issue to Gemini 2.5 Flash giving it the nginx logs for the one second before and after an example incident, about 10k log entries.
It came back with a clear verdict, saying
"The smoking gun is here:
2025/07/24 21:39:51 [debug] 32#32: *5902095 rport:443 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 10.233.100.128, server: grpc-ai-test.not-relevant.org, request: POST /org.not-relevant.cloud.api.grpc.CloudEventsService/startStreaming HTTP/2.0, upstream: grpc://10.233.75.54:50051, host: grpc-ai-test.not-relevant.org"
and gave me a detailed action plan.
I was thinking this is cool, don't need to use my head on this, until I realized that the log entry simply did not exist. It was entirely made up.
(And yes I admit, I should know better than to do lousy prompting on a cheap foundation model)
My favorite is when you ask Claude to implement two requirements and it implements the first, gets confused by the second, removes the implementation for the first to “focus” on the second, and then finishes by having implemented nothing.
Oh yeah totally. It feels a bit deceptive sometimes.
Like just now it says "great the tests are consistently passing!" So I ran the same test command and 4 of the 7 tests are so broken they don't even build.
Well, I would say that the machine should not override the human input. But if the machine makes up the plans in the first place, then why should it not be allowed to change the plans? I think the hilarious part about modifying tests to make them pass without understanding why they fail is that it probably comes from training on humans.
Also started to suspect that, but I have a bigger problem with the content than styling:
> "Instead of remembering complex Kubernetes commands, they ask Claude for the correct syntax, like "how to get all pods or deployment status," and receive the exact commands needed for their infrastructure work."
Duh, you can ask an LLM tech questions and stuff. What is the point of putting something like that on the tech blog of a company which is supposed to be working on bleeding-edge tech?
To get more people using it, and to get them using it more. I’ve encountered people who don’t use it because they think that it isn’t something that will help them, even in tech. Showing how different groups find value in it might get people in those same positions using it.
Even people who do use it might be thinking about it narrowly. They use it for code generation, but might not think to use it for simplified man pages.
Of course there are people who are the exact opposite and use it for every last thing they do. And maybe from this they learn how to better approach their prompts.
I don't think the problem is using Claude - in fact some of the writing is quite clumsy and amateurish, suggesting an actual human wrote it. The overall post reads like a collection of survey responses, with no overarching organization, and no filtering of repetitive or empty responses. Nobody was in charge.
The first example was helping debug a k8s issue, which was diagnosed as IP pool exhaustion; Claude helped them fix it without needing a network expert.
But, if they had an expert in networking build it in the first place, would they have not avoided the error entirely up front?
I've been pretty happy with the python package hns for this [1]. You can run it from the terminal with `uvx hns` and it will listen until you press enter and then copy the transcription to the clipboard. It's a simple tool that does one thing well and integrates smoothly with a CLI-based workflow.
[1] - https://github.com/primaprashant/hns
The copy aspect was the main value prop for the app I chose: Voice Type. You can do ctrl-v to start recording, again to stop, and it pastes it in the active text box anywhere on your computer.
I often work on large, complicated projects that span the whole codebase and multiple micro services. So it's often a blend of engineering, architectural, and product priorities. I can end up talking for paragraphs or multiple pages to fully explain the context. Then Claude typically has follow-up questions, things that aren't clear, or issues that I didn't catch.
Honestly, I just get sick of typing out "dissertations" every time. It's easier just to have a conversation, save it to a file, and then use that as context to start a new thread and do the work.
Not only do I type faster than I speak, I'm also able to edit as I go along, correcting any mistakes or things I've stumbled over and making them clearer. Half my experience of using even basic voice assistants is starting to ask for something and then going "ugh, no, cancel" because I stumbled over part of a sentence and I know I'll end up with some utter nonsense in my todo list.
> When Kubernetes clusters went down and weren't scheduling new pods, the team used Claude Code to diagnose the issue. They fed screenshots of dashboards into Claude Code, which guided them through Google Cloud's UI menu by menu until they found a warning indicating pod IP address exhaustion. Claude Code then provided the exact commands to create a new IP pool and add it to the cluster, bypassing the need to involve networking specialists.
This seems rather inefficient, and also surprising that Claude Code was even needed for this.
They're subsidizing a world where we need AI instead of understanding, or at the very least knowing who can help us. Eventually we'll be so dumb that we're the AI's slaves.
Well, Anthropic sure thinks that you should. Number go up!
10% is the time it works 100% of the time.
you can do the same for $200/month
Should be the same party as is getting the rewards of the productivity gains.
I've written a little about some of my findings and workflow in detail here: https://github.com/sutt/agro/blob/master/docs/case-studies/a...
The downside is I don’t have as much of a grasp on what’s actually happening in my project, while with hand-written projects I’d know every detail.
- there's a devlog showing all the prompts and accepted outputs: https://github.com/sutt/agro/blob/master/docs/dev-summary-v1...
- and you can look at the ai-generated tests (as is being discussed above) and see they aren't very well thought out for the behavior, but are syntactically impressive: https://github.com/sutt/agro/tree/master/tests
- check out the case-studies in the docs if you're interested in more ideas.
So I guess the blog team also uses Claude
I can just talk to it like a person and explain the full context / history of things. Way faster than typing it all out.
https://apps.apple.com/us/app/voice-type-local-dictation/id6...
The developer is pretty cool too. I found a few bugs here and there and reported them. He responds pretty much immediately.
I highly recommend getting a good microphone, I use a Rode smartlav. It makes a huge difference.
I type a lot faster than I speak :D
https://handy.computer
Is it really a value add to my life that I know some detail on page A or have some API memorized?
I’d rather we put smart people in charge of using AI to build out great products.
It should make things 10000x more competitive. I for one am excited AF for what the future holds.
If people want to be purists and pat themselves on the back, sure. I mean, people have hobbies, like art.