max_on_hn (u/max_on_hn)

max_on_hn commented on Peasant Railgun knightsdigest.com/what-ex... · Posted by u/cainxinth

hooverd · 2 months ago

Did you use ChatGPT/an LLM for this comment or do you just write Like That?

max_on_hn · 2 months ago

ChatGPT was sticky for me very early because its writing style reminded me of my own ¯\_(ツ)_/¯

max_on_hn commented on Remote MCP Support in Claude Code anthropic.com/news/claude... · Posted by u/surprisetalk

nvahalik · 2 months ago

Used this Friday to have Claude do some stuff by telling it to read a Linear ticket and make appropriate changes. Not perfect, but saved me 15 minutes.

max_on_hn · 2 months ago

If you like that workflow you might love the tool[0] which I built specifically to support it: CheepCode connects to Linear and works on tickets as they roll in, submitting PRs to GitHub.

[0] https://cheepcode.com

max_on_hn commented on Prompt engineering playbook for programmers addyo.substack.com/p/the-... · Posted by u/vinhnx

max_on_hn · 3 months ago

Some solid tips here but I think this bit really misses the point:

> The key is to view the AI as a partner you can coach – progress over perfection on the first try

This is not how to use AI. You cannot scale the ladder of abstraction if you are babysitting a task at one rung.

If you feel that it’s not possible yet, that may be a sign that your test environment is immature. If it is possible to write acceptance tests for your project, then manually coaching the AI is just a cost optimization: you are simply reducing the tokens it takes the AI to reach the answer. Whether that’s worth your time depends on the problem, but in general, if you are manually coaching your AI, you should stop and either:

1. Work on your pipeline for prompt generation. If you write down any relevant project context in a few docs, an AI will happily generate your prompts for you, including examples and nice formatting etc. Getting better at this will actually improve

2. Set up an end-to-end test command (unit/integration tests are fine to add later, but they matter less than e2e)

These processes are how people use headless agents like CheepCode[0] to move faster. Generate prompts with AI and put them in a task management app like Linear; then CheepCode works on the ticket and makes a PR. No more watching a robot work: check the results at the end, and only read the thoughts if you need to debug your prompt.

[0] the one I built - https://cheepcode.com
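A minimal sketch of point 2, assuming a Python harness and a project whose e2e suite can be launched as a single command. The gate treats a zero exit code as "accept" and keeps the tail of the output so you can debug the prompt instead of babysitting the run. `run_acceptance_gate` and `GateResult` are hypothetical names for illustration, not part of CheepCode or any other tool.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class GateResult:
    passed: bool
    output_tail: str  # last lines of stdout+stderr, handy for debugging a prompt


def run_acceptance_gate(cmd: list[str], timeout_s: float = 600.0, tail_lines: int = 20) -> GateResult:
    """Run an end-to-end test command and report whether the change is acceptable.

    Exit code 0 means "accept"; any other exit code, or a timeout, means "reject".
    """
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing is left running.
        return GateResult(passed=False, output_tail=f"timed out after {timeout_s}s")
    combined = (proc.stdout or "") + (proc.stderr or "")
    tail = "\n".join(combined.splitlines()[-tail_lines:])
    return GateResult(passed=proc.returncode == 0, output_tail=tail)


if __name__ == "__main__":
    # Hypothetical stand-in for a real e2e command (e.g. your Playwright suite).
    result = run_acceptance_gate([sys.executable, "-c", "print('all e2e tests passed')"])
    print(result.passed, result.output_tail)
```

The point of keeping it this dumb is that the agent's work is judged only by the test command, so the human effort goes into the tests and the prompt rather than into watching each step.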

max_on_hn commented on Cursor 1.0 cursor.com/en/changelog/1... · Posted by u/ecz

davedx · 3 months ago

To try and give examples: an autonomous agent that can integrate with github, read issues, then make pull requests against those issues is a step (or maybe two) above an LLM API (cucumber seller).

It doesn't take much more of a stretch to imagine teams of agents, coordinated by a "programme manager" agent, with "QA agents" working to defined quality metrics, "architect" agents that take initial requirements and break them down into system designs and github issues, and of course the super important "product owner" agent who talks to actual humans and writes initial requirements. Such a "software team system" would be another abstraction level above individual agents like Codex.

max_on_hn · 3 months ago

This exactly. I built CheepCode to do the first part already, so it can accept tasks through Linear, etc., and submit PRs in GitHub. It already tests its work headlessly (including with Playwright if it’s web code), and I am almost done with the QA agent :-)

max_on_hn commented on Mary Meeker's first Trends report since 2019, focused on AI bondcap.com/reports/tai... · Posted by u/kjhughes

CompoundEyes · 3 months ago

I’d like to hear more discussion of AI being applied in ways that are “good enough”. There’s so much focus on it having to be 100% or it sucks. There are use cases where it provides a lot of value and doesn’t have to be perfect to replace tasks done by an imperfect employee who sometimes misses details too. Audit the output with a human, like Taco Bell (Yum) is doing with AI drive-through orders. Are most of the day-to-day questions a person asks so critical that hallucinations cause any more issues than bad advice from a person, an inaccurate Wikipedia or news article, or mishearing? Required correctness proportional to the importance of the task, I guess. I wouldn’t publish government health policy citing hallucinated research or devise tariff algorithms, but I’m cool with my generative pumpkin bars recipe accidentally having a tbsp/tsp error I’d notice while making them.

max_on_hn · 3 months ago

I think we see this a lot with software development AI; the tab complete only has to be “good enough” to be worth tweaking. Often “good enough” first pass from the AI is a few motions on the keyboard away from shippable.

Now with headless agents (like CheepCode[0], the one I built) that connect directly to the same task management apps that we use as human programmers, you can get “good enough” PRs out of a single Linear ticket with no need to touch an IDE. For copy changes and other easy-to-verify tweaks, this saves developers a lot of overhead checking out branches, making PRs, etc., so they can stay focused on the more interesting/valuable work. At $1/task, a “good enough” result is well worth it compared to the cost of human time.

[0] https://cheepcode.com

max_on_hn commented on Human coders are still better than LLMs antirez.com/news/153... · Posted by u/longwave

dinfinity · 3 months ago

This. MCP/tool usage in agentic mode is insanely powerful. Let the agent ingest a GitLab issue, tell it how it can run commands, tests, etc. in the local environment, and half of the time it can just iterate towards a solution all by itself (but watching and intervening when it starts going the wrong way is still advisable).

Recently I converted all the (Google Docs) documentation of a project to markdown files and added those to the workspace. It now indexes it with RAG and can easily find relevant bits of documentation, especially in agent mode.

It really stresses the importance of getting your documentation and processes in order as well as making sure the tasks at hand are well-specified. It soon might be the main thing that requires human input or action.
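For illustration only, a toy version of "find relevant bits of documentation" over markdown files. It swaps in plain term-overlap scoring for the embedding-based retrieval an IDE agent would typically use, and the helper names (`split_sections`, `tokenize`, `retrieve`) are hypothetical; it only shows the chunk-then-rank shape, and the heading split is crude (a `#` comment inside a code fence would also start a new chunk).

```python
import re
from collections import Counter


def split_sections(markdown: str) -> list[str]:
    """Split a markdown document into chunks, starting a new chunk at each heading line."""
    chunks, current = [], []
    for line in markdown.splitlines():
        if line.startswith("#") and current:
            chunks.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current))
    return chunks


def tokenize(text: str) -> Counter:
    """Lowercase word counts; a stand-in for real embeddings."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, docs: dict[str, str], k: int = 2) -> list[tuple[str, str]]:
    """Return up to k (filename, chunk) pairs that share the most terms with the query."""
    q = tokenize(query)
    scored = []
    for name, text in docs.items():
        for chunk in split_sections(text):
            c = tokenize(chunk)
            score = sum(min(q[t], c[t]) for t in q)  # overlap of term counts
            if score > 0:
                scored.append((score, name, chunk))
    scored.sort(key=lambda s: s[0], reverse=True)
    return [(name, chunk) for _, name, chunk in scored[:k]]
```

Splitting at headings matters more than the scoring: it keeps each retrieved chunk small enough to drop into the agent's context.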

max_on_hn · 3 months ago

I 100% agree that documenting requirements will be the main human input to software development in the near future.

In fact, I built an entirely headless coding agent for that reason: you put tasks in, you get PRs out, and you get journals of each run for debugging, but it discourages micro-management so you stay in planning/documenting/architecting.

max_on_hn commented on Launch HN: Relace (YC W23) – Models for fast and reliable codegen · Posted by u/eborgnia

max_on_hn · 3 months ago

I will have to try out Relace for CheepCode[0], my cloud-based AI coding agent :) Right now I’m using something I hacked together, but this looks quite slick!

[0] https://cheepcode.com

max_on_hn commented on Ask HN: Building LLM apps? How are you handling user context? · Posted by u/marcospassos

max_on_hn · 3 months ago

I don't know of anything off-the-shelf, but you could query analytics tools at runtime (e.g. Mixpanel, PostHog) to gather the raw data, and use a generic summarizer to turn that into behavioral context that's usable downstream.
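A rough sketch of the summarizer step, assuming the events have already been fetched from the analytics tool and normalized into dicts with `event` and ISO-8601 `timestamp` keys (that shape is an assumption, not the Mixpanel or PostHog export format). A deterministic summary stands in here for the generic LLM summarizer; `summarize_behavior` is a hypothetical name.

```python
from collections import Counter
from datetime import datetime


def summarize_behavior(events: list[dict], top_n: int = 3) -> str:
    """Turn raw analytics events into a short behavioral-context string for a prompt.

    Each event is assumed to look like {"event": "viewed_pricing", "timestamp": "2024-05-01T12:00:00"}.
    """
    if not events:
        return "No recorded activity."
    counts = Counter(e["event"] for e in events)
    # Parse timestamps so "most recent" is chronological, not lexicographic.
    latest = max(events, key=lambda e: datetime.fromisoformat(e["timestamp"]))
    top = ", ".join(f"{name} ({n}x)" for name, n in counts.most_common(top_n))
    return (
        f"User has {len(events)} recorded events. "
        f"Most frequent: {top}. "
        f"Most recent: {latest['event']} at {latest['timestamp']}."
    )
```

The output is a compact string you can prepend to the system prompt at request time; swapping the body for an LLM call keeps the same interface.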

max_on_hn commented on Ask HN: Anyone struggling to get value out of coding LLMs? · Posted by u/bjackman

gyomu · 3 months ago

There are two kinds of engineers.

Those who can’t stop raving about how much of a superpower LLMs are for coding, how they’ve made them 100x more productive, and how they’re unlocking things they could never have done before.

And those who, like you, find it to be an extremely finicky process that requires an extreme amount of coddling to get average results at best.

The only thing I don’t understand is why people from the former group aren’t all utterly dominating the market and obliterating their competitors with their revolutionary products and blazing fast iteration speed.

max_on_hn · 3 months ago

(disclaimer: I have a vested interest in the space as the purveyor of an AI software development agent)

The result you described is coming soon. CheepCode[0] agents already produce working code in a satisfying percentage of cases, and I am at most 3 months away from it producing end-to-end apps and complex changes that are at least human-quality. It would take way less if I got funded to work on it full time.

Given that I'm this close as a solo founder with no employees, you can imagine what's cooking inside large companies.

[0] My product, cloud-based headless coding agents that connect directly to Linear, accept tickets, and submit GitHub PRs