I did some napkin math on this.
32x H100s rent at 'retail' prices of about $2/hr per GPU. I would hope the big AI companies get it cheaper than that at their scale.
These 32 H100s can probably sustain something north of 40,000 tok/s on a frontier-scale model (~700B params) with proper batching. Potentially a lot more (I'd love to know if someone has real numbers on this).
So that's $64/hr, or just under $50k/month (~$46k).
40k tok/s is a lot of usage, at least for non-agentic use cases (it works out to over 100 billion tokens a month). There is no way you are losing money on paid ChatGPT users at $20/month on these.
You'd still roughly break even supporting ~200 Claude Code-esque agentic users who were using it at full tilt 40% of the day at $200/month.
Now, this doesn't include training costs or staff costs, but on a pure 'opex' basis I don't think inference is anywhere near as unprofitable as people make out.
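For anyone who wants to poke at the napkin math, here's the arithmetic spelled out. All the inputs (32 GPUs, $2/hr, 40k tok/s, 200 users at a 40% duty cycle) are the rough assumptions from above, not measured figures:

```python
# Napkin math on inference opex. Every constant here is an assumption
# from the discussion above, not a measured number.
GPUS = 32
PRICE_PER_GPU_HR = 2.00            # assumed 'retail' H100 rental, $/hr
THROUGHPUT_TOK_S = 40_000          # assumed cluster-wide throughput

cluster_cost_hr = GPUS * PRICE_PER_GPU_HR             # $/hr for the cluster
cluster_cost_month = cluster_cost_hr * 24 * 30        # ~30-day month

tokens_per_month = THROUGHPUT_TOK_S * 3600 * 24 * 30  # total monthly tokens

# Agentic break-even sketch: 200 users active 40% of the day implies
# each active session can draw THROUGHPUT / (200 * 0.4) tok/s.
users = 200
duty_cycle = 0.4
tok_s_per_active_user = THROUGHPUT_TOK_S / (users * duty_cycle)
revenue_month = users * 200                           # $200/month plans

print(f"cluster cost:  ${cluster_cost_hr:.0f}/hr, ${cluster_cost_month:,.0f}/month")
print(f"tokens/month:  {tokens_per_month:,.0f}")
print(f"per active user: {tok_s_per_active_user:.0f} tok/s")
print(f"agentic revenue: ${revenue_month:,}/month")
```

Which gives $64/hr, ~$46k/month in compute, ~104B tokens served, and ~$40k/month in revenue from the agentic cohort, i.e. roughly break-even before training and staff costs.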
It can plan and take actions towards arbitrary goals in a wide variety of mostly text-based domains. It can maintain basic "memory" in text files. It's not smart enough to work on a long time horizon yet, it's not embodied, and it has big gaps in understanding.
But this is basically what I would have expected v1 to look like.
What really occurs to me is that there is still so much that can be done to leverage LLMs with tooling. Small things in Claude Code (plan mode, for example) improve the system more, in my eyes, than (e.g.) the update from Sonnet 3.5 to 4.0 did.