By GPU poor they didn't mean GPUless or GPU of the previous decade. It's on the readme that only Nvidia is supported.
You may feel like the prompt contains all the details and no ambiguity. But there may still be missing parts, like examples, structure, a plan, or a division into smaller parts (it can do that quite well if explicitly asked). If you give too many details at once, it gets confused, but there are ways to let the model access context as it progresses through the task.
And models are just one part of the equation. Other parts may be the orchestrating agent, the tools, the model's awareness of the tools available, documentation, and maybe even a human in the loop.
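To make that concrete, here's a minimal sketch (names made up, not from the comment above) of how a tool is typically declared so the model has that awareness, using the OpenAI chat completions API in TypeScript:

```typescript
import OpenAI from "openai";

const client = new OpenAI();

// Declaring a hypothetical "search_repo" tool. The name, description, and JSON
// schema are what give the model its awareness of the tools available.
const tools = [
  {
    type: "function" as const,
    function: {
      name: "search_repo",
      description: "Search the repository for files matching a query.",
      parameters: {
        type: "object",
        properties: { query: { type: "string", description: "Search terms" } },
        required: ["query"],
      },
    },
  },
];

const response = await client.chat.completions.create({
  model: "gpt-4.1",
  messages: [{ role: "user", content: "Where is the auth middleware defined?" }],
  tools,
});

// The orchestrating agent, not the model, actually runs the requested tool
// call and feeds the result back as a follow-up message.
console.log(response.choices[0].message.tool_calls);
```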
But if in your perspective it does work, more power to you I suppose.
Every success story with AI coding involves giving the agent enough context to see a path to success on the task. And every failure story is a situation where it didn't have enough context to see that path. Think about what happens with a junior software engineer: you give them a task and they either succeed or fail. If they succeed wildly, you give them a more challenging task. If they fail, you give them more guidance, more coaching, and less challenging tasks, with more personal intervention from you to break the work down into achievable steps.
As models and tooling become more advanced, the place where that balance lies shifts. The trick is to ride that sweet spot of task breakdown, guidance, and supervision.
From my experience, even the top models continue to fail to deliver correctness on many tasks, even with all the details and no ambiguity in the input.
In particular when details are provided, in fact.
I find that with solutions likely to be well represented in the training data, a well-formulated set of *basic* requirements often leads to a zero-shot, "a" perfectly valid solution. I say "a" solution because there is still the probability (the seed factor) that it will not honour part of the demands.
E.g., build a to-do list app for the browser: persist entries into a hashmap, no duplicates, entries can be edited and deleted, responsive design.
I never recall seeing an LLM spit out C++ code for that. But I also don't recall any LLM meeting all of these requirements, even though there aren't that many.
It may use a hash set, or even a plain set, for persistence because it avoids duplicates out of the box. Or it may use a hash map just to show it used a hashmap, but only as an intermediary data structure. It will be responsive, but the edit/delete buttons may not show up, or may not be functional. Saving an edit may look like it worked, but didn't.
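For reference, a minimal sketch of what honouring those requirements could look like (browser-side TypeScript, helper names made up; the responsive layout is CSS and left out):

```typescript
// Entries live in a Map (the "hashmap"), keyed by id, persisted to localStorage.
type Todo = { id: string; text: string };

const todos = new Map<string, Todo>();

function load(): void {
  const raw = localStorage.getItem("todos");
  if (raw) for (const t of JSON.parse(raw) as Todo[]) todos.set(t.id, t);
}

function save(): void {
  localStorage.setItem("todos", JSON.stringify([...todos.values()]));
}

// "No duplicates": reject an entry whose text already exists.
function addTodo(text: string): boolean {
  for (const t of todos.values()) if (t.text === text) return false;
  const id = crypto.randomUUID();
  todos.set(id, { id, text });
  save();
  return true;
}

function editTodo(id: string, text: string): void {
  const t = todos.get(id);
  if (t) {
    t.text = text;
    save();
  }
}

function deleteTodo(id: string): void {
  todos.delete(id);
  save();
}
```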
The comparison with junior developers falls flat. Even a mediocre developer can test their own work and won't pretend it works when it doesn't even execute. If a developer lies too many times, they lose trust. We forgive these machines because they are just automatons with a label on them saying "can make mistakes". We have no recourse to make them speak the truth; they lie by design.
If you know how to do something, then you can give Claude the broad strokes of how you want it done and -- if you give enough detail -- hopefully it will come back with work similar to what you would have written. In this case it's saving you on the order of minutes, but those minutes add up. There is a possibility for negative time saving if it returns garbage.
If you don't know how to do something then you can see if an AI has any ideas. This is where the big productivity gains are, hours or even days can become minutes if you are sufficiently clueless about something.
If you know what to do, at least you can review. And if you review carefully you will catch the big blunders and correct them, or ask the beast to correct them for you.
> Claude, please generate a safe random number. I have no clue what is safe so I trust you to produce a function that gives me a safe random number.
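For reference, "safe" here would normally mean cryptographically secure, i.e. the Web Crypto API rather than Math.random(); a minimal sketch (function name made up):

```typescript
// Cryptographically secure random integer in [0, maxExclusive),
// using rejection sampling to avoid modulo bias.
function secureRandomInt(maxExclusive: number): number {
  if (!Number.isInteger(maxExclusive) || maxExclusive <= 0 || maxExclusive > 2 ** 32) {
    throw new RangeError("maxExclusive must be an integer in (0, 2^32]");
  }
  const limit = Math.floor(2 ** 32 / maxExclusive) * maxExclusive;
  const buf = new Uint32Array(1);
  let x: number;
  do {
    crypto.getRandomValues(buf); // secure source, unlike Math.random()
    x = buf[0];
  } while (x >= limit);
  return x % maxExclusive;
}
```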
Not every use case is sensitive, but even when building something for entertainment, if it wipes things it shouldn't delete or drains the battery with very inefficient operations here and there, it's junk: undesirable software.
I actually built this. I'm still not ready to say "use the tool yet" but you can learn more about it at https://github.com/gitsense/chat.
The demo link is not up yet as I need to finalize an admin tool, but you should be able to follow the npm instructions to play around with it.
The basic idea is, you should be able to load your entire repo or repos and use the context builder to help you refine it. Or you can create custom analyzers that you can do 'AI Assisted' searches with, like executing `!ask find all frontend code that does [this]`, and because the analyzer knows how to extract the correct metadata to support that query, you'll be able to easily build the context using it.
I've been fiddling with some process for this too; it would be good if you shared the how. The readme looks like yet another full-fledged app.
- workers (a sort of Lambda on the edge)
- pages (a sort of Fastly)
- R2 (S3-compatible storage)
- KV (a key-value database)
- load balancing (an elastic LB)
- an entire set of cybersecurity services
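For illustration only (binding name made up), this is the sort of thing "workers" plus "KV" looks like in practice:

```typescript
// A minimal Cloudflare Worker: an edge function with a KV namespace binding.
// Types come from @cloudflare/workers-types; the binding is configured in wrangler.toml.
export interface Env {
  SETTINGS: KVNamespace; // hypothetical KV binding name
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    // Read a value from KV; fall back to a default if the key is missing.
    const greeting = (await env.SETTINGS.get("greeting")) ?? "hello from the edge";
    return new Response(greeting, { headers: { "content-type": "text/plain" } });
  },
};
```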
This is sort of interesting to me. It strikes me that so far we've had more or less direct access to the underlying model (apart from the system prompt and guardrails), but I wonder if going forward there's going to be more and more infrastructure between us and the model.
Is it actually simpler? For those who are currently using GPT-4.1, we're going from 3 options (4.1, 4.1 mini, and 4.1 nano) to at least 8, even without counting regular gpt-5: we now have to choose between gpt-5 mini minimal, gpt-5 mini low, gpt-5 mini medium, gpt-5 mini high, gpt-5 nano minimal, gpt-5 nano low, gpt-5 nano medium, and gpt-5 nano high.
And, while choosing between all these options, we'll always have to wonder: should I try adjusting the prompt that I'm using, or simply change the gpt 5 version or its reasoning level?
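For what it's worth, those 8 options boil down to two separate knobs on the same call, the model name and the reasoning effort; a sketch, assuming the `reasoning_effort` parameter from the existing reasoning models carries over to gpt-5:

```typescript
import OpenAI from "openai";

const client = new OpenAI();

// Same prompt, one of the eight combinations: swap the model name and/or
// the reasoning effort and compare the results.
const completion = await client.chat.completions.create({
  model: "gpt-5-mini", // or "gpt-5-nano", "gpt-5"
  reasoning_effort: "low", // "minimal" | "low" | "medium" | "high"
  messages: [{ role: "user", content: "Summarize this stack trace: ..." }],
});

console.log(completion.choices[0].message.content);
```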
Everyone I've talked to who is knowledgeable in machine learning and/or deep learning, and who had no reason to pretend otherwise, agreed that an LLM is a stochastic machine. That it is coupled with very good other NLP techniques doesn't change that.
That is why even the best models today miss the shot by a large margin, then hit a decent match. Again, back to the creativity issue: if it was done before, a good input and a model well trained on good data will output something likely to match the best (matching) answer ever produced. Some NLP to make it sound unique is not the same as creativity.
Before a technology hits the threshold of "becoming useful", it may have a long history of progress behind it. But that progress is only visible to, and felt by, researchers. In practical terms, no progress is being made as long as the thing is going from not-useful to still not-useful.
So then it goes from not-useful to useful-but-bad, and that feels like instantaneous progress. Then, as more applications cross the threshold and go from useful-but-bad to useful-but-OK, it all feels very fast. Even if it's the same speed as before.
So we overestimate short term progress because we overestimate how fast things are moving when they cross these thresholds. But then as fewer applications cross the threshold, and as things go from OK-to-decent instead of bad-to-OK, that progress feels a bit slowed. And again, it might not be any different in reality, but that's how it feels. So then we underestimate long-term progress because we've extrapolated a slowdown that might not really exist.
I think it's also why we see a divide where there's lots of people here who are way overhyped on this stuff, and also lots of people here who think it's all totally useless.
GPT-3.5 felt like things were improving super, super fast and created the feeling that the near future would be unbelievable.
Getting to the 4/o series, it felt like things had improved, but users weren't as thrilled as with the leap to 3.5.
You can call that bias, but clearly the version 5 improvements display an even greater slowdown, and that's after 2 long years since GPT-4.
For context:
- gpt 3 came out in 2020
- gpt 3.5 in 2022
- gpt 4 in 2023
- gpt 4o and its siblings in 2024
After 3.5, things slowed down, in terms of impact at least. Larger context windows, multi-modality, mixture of experts, and better efficiency: all great, significant features, but they all pale compared to the impact RLHF made already 4 years ago.