anotherpaulg · a year ago
> 99% of the code in this PR [for llama.cpp] is written by DeepSeek-R1

It's definitely possible for AI to do a large fraction of your coding, and for it to contribute significantly to "improving itself". As an example, aider currently writes about 70% of the new code in each of its releases.

I automatically track and share this stat as a graph [0] with aider's release notes.

Before Sonnet, most releases were less than 20% AI-generated code. With Sonnet, that jumped to >50%. For the last few months, about 70% of the new code in each release has been written by aider. The record is 82%.

Folks often ask which models I use to code aider, so I automatically publish those stats too [1]. I've been shifting more and more of my coding from Sonnet to DeepSeek V3 in recent weeks. I've been experimenting with R1, but the recent API outages have made that difficult.

[0] https://aider.chat/HISTORY.html

[1] https://aider.chat/docs/faq.html#what-llms-do-you-use-to-bui...

joshstrange · a year ago
First off I want to thank you for Aider. I’ve had so much fun playing with it and using it for real work. It’s an amazing tool.

How do you determine how much was written by you vs the LLM? I assume it consists of parsing the git log and getting LoC from that or similar?

If the scripts are public could you point me at them? I’d love to run it on a recent project I did using aider.

anotherpaulg · a year ago
Glad to hear you’re finding aider useful!

There’s an FAQ entry about how these stats are computed [0]. Basically using git blame, since aider is tightly integrated with git.

The FAQ links to the script that computes the stats. It’s not designed to be used on any repo, but you (or aider) could adapt it.

You’re not the first to ask for these stats about your own repo, so I may generalize it at some point.

[0] https://aider.chat/docs/faq.html#how-are-the-aider-wrote-xx-...
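For anyone curious, the core of the approach fits in a few lines. This is just a minimal sketch of the git-blame idea, not the actual script; the `paths` argument and the author-name matching are illustrative assumptions:

    import subprocess
    from collections import Counter

    def blame_line_counts(paths):
        # Tally who wrote each surviving line, according to git blame.
        counts = Counter()
        for path in paths:
            out = subprocess.check_output(
                ["git", "blame", "--line-porcelain", path], text=True
            )
            for line in out.splitlines():
                if line.startswith("author "):
                    counts[line.removeprefix("author ")] += 1
        return counts

Dividing the line count attributed to the aider author by the total gives the percentage.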

fsndz · a year ago
I think the secret of DeepSeek is basically using RL to train a model that will generate high quality synthetic data. You then use the synthetic dataset to fine-tune a pretrained model and the result is just amazing: https://open.substack.com/pub/transitions/p/the-laymans-intr...
yoyohello13 · a year ago
Maybe this is answered, but I didn't see it. How does aider deal with secrets in a git repo? Like if I have passwords in a `.env`?

Edit: I think I see. It only adds files you specify.

FeepingCreature · a year ago
Aider has a command to add files to the prompt. For files that are not added, it uses tree-sitter to extract a high-level summary. So for a `.env`, it will mention to the LLM the fact that the file exists, but not what is in it. If the model thinks it needs to see that file, it can request it, at which point you receive a prompt asking whether it's okay to make that file available.

It's a very slick workflow.
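The repo-map idea is simple to sketch. Aider actually uses tree-sitter across many languages; as a toy stand-in, here is the same idea for Python files using the standard ast module (the `outline` helper is an invented name):

    import ast

    def outline(path):
        # Summarize a file by its top-level definitions,
        # without exposing the file's contents.
        tree = ast.parse(open(path, encoding="utf-8").read())
        return [
            node.name
            for node in tree.body
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
        ]

The LLM sees that a file like `.env` exists, but only files you add (or approve) have their contents placed in the prompt.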

anotherpaulg · a year ago
You can use an .aiderignore file to ensure aider doesn't use certain files/dirs/etc. It conforms to the .gitignore spec.
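For example, a minimal .aiderignore to keep secrets out (hypothetical contents, standard .gitignore syntax):

    .env
    secrets/
    *.pem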

almostgotcaught · a year ago
> 99% of the code in this PR [for llama.cpp] is written by DeepSeek-R1

you're assuming the PR will land:

> Small thing to note here, for this q6_K_q8_K, it is very difficult to get the correct result. To make it work, I asked deepseek to invent a new approach without giving it prior examples. That's why the structure of this function is different from the rest.

This certainly wouldn't fly in my org (even with test coverage/passes).

Jimmc414 · a year ago
>> Small thing to note here, for this q6_K_q8_K, it is very difficult to get the correct result. To make it work, I asked deepseek to invent a new approach without giving it prior examples. That's why the structure of this function is different from the rest.

> This certainly wouldn't fly in my org (even with test coverage/passes).

To be fair, this seems expected. A distilled model might struggle more with aggressive quantization (like q6) since you're stacking two forms of quality loss: the distillation loss and the quantization loss. I think the answer would be to just use the higher cost full precision model.

Philpax · a year ago
llama.cpp optimises for hackability, not necessarily maintainability or cleanliness. You can look around the repository to get a feel for what I mean.
brianstrimp · a year ago
> It's definitely possible for AI to do a large fraction of your coding, and for it to contribute significantly to "improving itself". As an example, aider currently writes about 70% of the new code in each of its releases.

That number itself is not saying much.

Let's say I have an academic article written in Word (yeah, I hear some fields do it like that). I get feedback, change 5 sentences, save the file. Then 20 kB of the new file differs from the old file. But the change I made was only 30 words, so maybe 200 bytes. Does that mean that Word wrote 99% of that update? Hardly.

Or in C: I write a few functions in which my old-school IDE did the indentation and automatic insertion of closing curly braces. Would I say that the IDE wrote part of the code?

Of course the AI-supplied code is more than my two examples, but claiming that some tool wrote 70% "of the code" suggests a linear utility of the code, which just doesn't represent reality very well.

anotherpaulg · a year ago
Every metric has limitations, but git blame line counts seem pretty uncontroversial.

Typical aider changes are not like autocompleting braces or reformatting code. You tell aider what to do in natural language, like a pair programmer. It then modifies one or more files to accomplish that task.

Here's a recent small aider commit, for flavor.

  -# load these from aider/resources/model-settings.yml
  -# use the proper packaging way to locate that file
  -# ai!
  +import importlib.resources
  +
  +# Load model settings from package resource
  MODEL_SETTINGS = []
  +with importlib.resources.open_text("aider.resources", "model-settings.yml") as f:
  +    model_settings_list = yaml.safe_load(f)
  +    for model_settings_dict in model_settings_list:
  +        MODEL_SETTINGS.append(ModelSettings(**model_settings_dict))
  
https://github.com/Aider-AI/aider/commit/5095a9e1c3f82303f0b...

stavros · a year ago
That's a pretty big reach though, comparing an AI to a formatter. Presumably 70% of a new Aider release isn't formatting.
simonw · a year ago
"The stats are computed by doing something like git blame on the repo, and counting up who wrote all the new lines of code in each release. Only lines in source code files are counted, not documentation or prompt files."
reitzensteinm · a year ago
R1 is available on both together.ai and fireworks.ai; it should be a drop-in replacement using the OpenAI API.
SkyPuncher · a year ago
The problem is it's very expensive. More expensive than Claude.
htrp · a year ago
Run your deepseek R1 model on your own hardware.
girvo · a year ago
Only various distillations are available for most people’s hardware, and they’re quite obviously not as good as actual R1 in my testing.
hammock · a year ago
That's amazing data. How representative do you think your Aider data is of all coding done?
maeil · a year ago
> I've been shifting more and more of my coding from Sonnet to DeepSeek V3 in recent weeks.

For what purpose, considering Sonnet 3.5 still outperforms V3 on your own benchmarks (which also tracks with my personal experience comparing them)?

carpo · a year ago
aider looks amazing - I'm going to give it a try soon. Just had a question on API costs to see if I can afford it. Your FAQ says you used about 850k tokens for Claude, and their API pricing says output tokens are $15/MTok. Does that mean it cost you under $15 for your Claude 3.5 usage, or am I totally off-base? (Sorry if this has an obvious answer... I don't know much about LLM API pricing.)
simonw · a year ago
I built a calculator for that here: https://tools.simonwillison.net/llm-prices

It says that for 850,000 Claude 3.5 output tokens the cost would be $12.75.

But... it's not 100% clear to me if the Aider FAQ numbers are for input or output tokens.
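The arithmetic itself is straightforward, assuming all 850k were output tokens at Sonnet's $15/MTok (input tokens are billed at a lower rate, so the true cost could be less):

    tokens = 850_000
    usd_per_million = 15.00  # Claude 3.5 Sonnet output-token price
    cost = tokens / 1_000_000 * usd_per_million  # = $12.75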

anotherpaulg · a year ago
When I was mostly just using Sonnet I was spending ~$100/month on their API. That included some amount of bulk API use for benchmarking, not just my interactive AI coding.
jsnell · a year ago
If you're concerned about API costs, the experimental Gemini models with API keys from AI Studio tend to have very generous free quota. The quality of e.g. Flash 2.0 Experimental is definitely good enough to try out Aider and see if the workflow clicks. (For me, the quality has been good enough that I just stuck with it, and didn't get around to experimenting with any of the paid models yet.)
nprateem · a year ago
> As an example, aider currently writes about 70% of the new code in each of its releases.

Yeah but part of that is because it's physically impossible to stop it making random edits for the sake of it.

Imanari · a year ago
Love aider, thank you for your work! Out of curiosity, what are your future plans and ideas for aider in terms of features and workflow?
realo · a year ago
Hello...

Is it possible to use aider with a local model running in LMStudio (or ollama)?

From a quick glance I did not see an obvious way to do that...

Hopefully I am totally wrong!

anotherpaulg · a year ago
Thanks for your interest in aider.

Yes, absolutely you can work with local models. Here are the docs for working with lmstudio and ollama:

https://aider.chat/docs/llms/lm-studio.html

https://aider.chat/docs/llms/ollama.html

leetharris · a year ago
Yes absolutely

In the left bar there's a "connecting to LLMs" section

Check out ollama as an example

m3kw9 · a year ago
Yes, and it's easy.
fragmede · a year ago
yeah:

    aider --model ollama_chat/deepseek-r1:32b
(or whatever)

rahimnathwani · a year ago
When a log line contains {main_model, weak_model, editor_model}, does the existence of main_model mean the person was using Aider in Architect/Editor mode?

Do you usually use that mode and, if so, with which architect?

Thank you!

wvlia5 · a year ago
Can you make a plot like HISTORY but with the axes changed? X: date, Y: work leverage (i.e. 50% = 2x, 90% = 10x, 95% = 20x; leverage = 1/(1-pct))
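The transform itself is a one-liner; for example (illustrative percentages):

    pcts = [0.20, 0.50, 0.70, 0.82]
    leverage = [1 / (1 - p) for p in pcts]
    # 20% -> 1.25x, 50% -> 2x, 70% -> ~3.3x, 82% -> ~5.6x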
aledalgrande · a year ago
Could you share how you track AI vs human LoC?
simonw · a year ago
That's covered here, including a link to the script: https://aider.chat/docs/faq.html#how-are-the-aider-wrote-xx-...
simonw · a year ago
Given these initial results, I'm now experimenting with running DeepSeek-R1-Distill-Qwen-32B for some coding tasks on my laptop via Ollama - their version of that needs about 20GB of RAM on my M2. https://www.ollama.com/library/deepseek-r1:32b

It's impressive!

I'm finding myself running it against a few hundred lines of code mainly to read its chain of thought - it's good for things like refactoring where it will think through everything that needs to be updated.

Even if the code it writes has mistakes, the thinking helps spot bits of the code I may have otherwise forgotten to look at.

lacedeconstruct · a year ago
The chain of thought is incredibly useful. I almost don't care about the answer now; I just follow what I find interesting in the way it broke the problem down. I tend to get tunnel vision when working on something for a long time, so it's a great way to revise my work and make sure I am not misunderstanding something.
rtsil · a year ago
Yesterday, I had it think for 194 seconds. At some point near the end, it said "This is getting frustrating!"
miohtama · a year ago
Also even if the answer is incorrect, you can still cook the eggs on the laptop :)
lawlessone · a year ago
I spent a month's salary on these eggs and can no longer afford to cook them :(
belter · a year ago
The eggs cost more than the laptop...
brandall10 · a year ago
If you have a bit more memory, use the 6-bit quant; it takes up about 26 GB and has been shown to be only minimally lossy, as opposed to 4-bit.

Also, serve it as MLX from LM Studio, which will speed things up 30% or so, so your 6-bit will have similar perf to the 4-bit.

Getting about 12-13 tok/sec on my M3 Max 48 GB.

thomasskis · a year ago
EXO is also great for running the 6bit deepseek, plus it’s super handy to serve from all your devices simultaneously. If your dev team all has M3 Max 48gb machines, sharing the compute lets you all run bigger models and your tools can point at your local API endpoint to keep configs simple.

Our enterprise internal IT has a low friction way to request a Mac Studio (192GB) for our team and it’s a wonderful central EXO endpoint. (Life saver when we’re generally GPU poor)

matwood · a year ago
Can you link to the model you’re talking about? I can’t find the exact one using your description. Thanks!
mike31fr · a year ago
Noob question (I only learned how to use ollama a few days ago): what is the easiest way to run this DeepSeek-R1-Distill-Qwen-32B model that is not listed on ollama (or any other non-listed model) on my computer?
codingdave · a year ago
If you are specifically running it for coding, I'm satisfied with using it via continue.dev in VS Code. You can download a bunch of models with ollama, configure them into continue, and then there is a drop-down to switch models. I find myself swapping to smaller models for syntax reminders, and larger models for beefier questions.

I only use it for chatting about the code - while this setup also lets the AI edit your code, I don't find the code good enough to risk it. I get more value from reading the thought process, evaluating it, and then cherry-picking which bits of its code I really want.

In any case, if that sounds like the experience you want and you already run ollama, you would just need to install the continue.dev VS Code extension, and then go to its settings to configure which models you want in the drop-down.

rahimnathwani · a year ago
This model is listed on ollama. The 20GB one is this one: https://ollama.com/library/deepseek-r1:32b-qwen-distill-q4_K...
simonw · a year ago
Search for a GGUF on Hugging Face and look for a "use this model" menu, then click the Ollama option and it should give you something to copy and paste that looks like this:

  ollama run hf.co/MaziyarPanahi/Mistral-7B-Instruct-v0.3-GGUF:IQ1_M

nyrikki · a year ago

   ollama run deepseek-r1:32b

They dropped the Qwen/Llama terms from the string

https://ollama.com/library/deepseek-r1

marpstar · a year ago
I'm using it inside of LM Studio (https://lmstudio.ai), which has a "Discovery" tab where you can download models.
blakesterz · a year ago
Is DeepSeek really that big of a deal that everyone else should worry?
m11a · a year ago
A lot of the niceness about DeepSeek-R1's usage in coding is that you can see the thought process, which (IME) has been more useful than the final answer.

It may well be that o1's chain of thought reasoning trace is also quite good. But they hide it as a trade secret and supposedly ban users for trying to access it, so it's hard to know.

horsawlarway · a year ago
I would say worry? Yes. Panic? No.

It's... good. Even the qwen/llama distills are good. I've been running the Llama-70b-distill and it's good enough that it mostly replaces my chatgpt plus plan (not pro - plus).

I think if anything - one of my big takeaways is that OpenAI shot themselves in the foot, big time, by not exposing the CoT for the o1 Pro models. I find the <think></think> section of the DeepSeek models to often be more helpful than the actual answer.

For work that treats the AI as collaborative rather than "employee replacement", the CoT output is really valuable. It was a bad move for them to completely hide it from users, especially because they make the user sit there waiting while it generates anyway.

pavitheran · a year ago
Deepseek is a big deal but we should be happy not worried that our tools are improving.
flmontpetit · a year ago
As far as realizing the prophecy of AI as told by its proponents and investors goes, probably not. LLMs still have not magically transcended their obvious limitations.

However, this has huge implications when it comes to the feasibility and spread of the technology, and further implications for the economy and geopolitics, now that confidence in the American AI sector has taken a hit and people and organizations internationally have somewhere else to look.

edit: That being said, this is the first time I've seen an LLM do a better job than even a senior expert could do, and even if it's on a small scope/in a limited context, it's becoming clear that developers are going to have to adopt this tech in order to stay competitive.

buyucu · a year ago
There are two things. First, deepseek v3 and r1 are both amazing models.

Second, the fact that deepseek was able to pull this off with such modest resources is an indication that there is no moat, and you might wake up tomorrow and find an even better model from a company you have never heard of.

simonw · a year ago
Yeah, it is definitely a big deal.

I expect it will be a net positive: they proved that you can both train and run inference against powerful models for way less compute than people had previously expected - and they published enough details that other AI labs are already starting to replicate their results.

I think this will mean cheaper, faster, and better models.

This FAQ about it is very good: https://stratechery.com/2025/deepseek-faq/

GaggiX · a year ago
DeepSeek R1 is o1 but free to use, open source, and also distilled onto different models, even ones that could run on your phone, so yeah.
llm_trw · a year ago
The end result is on par with o1-preview, which is ironically more intelligent than o1, but the intermediate tokens are actually useful. I got it running locally last night, and out of 50 questions so far, I've gotten the answer in the chain of thought in more than half.
steeeeeve · a year ago
Today it is. Tomorrow everyone will look at it like Wish or Temu.
coliveira · a year ago
It depends on the problem type. If your problem requires math reasoning, DeepSeek's response is quite impressive and surpasses what most people can do in a single session.
csomar · a year ago
Everyone else should rejoice. OpenAI is probably cooked, however. Nvidia might be cooked too.
whitehexagon · a year ago
Agreed. I switched from qwq to this same model. I'm running it under ollama on an M1 with Asahi Linux, and it seems maybe twice the speed (not very scientific, but I'm not sure how to time the token generation), and, dare I say, smarter than qwq, with maybe a tad less RAM. It still over-ponders, but not as badly as qwq's pages and pages of "that looks wrong, maybe I should try..." circles - and qwq was already so impressive.

I'm quite new to this: how are you feeding in so much text? Just copy/paste? I'd love to be able to run some of my Zig code through it, but I haven't managed to get Zig running under Asahi so far.

buyucu · a year ago
DeepSeek-R1-Distill-Qwen-32B is my new default model on my home server. Previously it was aya-32b.
xenospn · a year ago
What do you use it at home for?
m3kw9 · a year ago
What does "distill Qwen 32B" mean? It uses Qwen for what?
buyucu · a year ago
DeepSeek fine-tuned Qwen-32B on data generated by the 671B-parameter DeepSeek-R1.
amarcheschi · a year ago
From what I can understand, he asked DeepSeek to convert ARM SIMD code to WASM SIMD code.

In the GitHub issue he links, he gives an example of a prompt: "Your task is to convert a given C++ ARM NEON SIMD to WASM SIMD. Here is an example of another function:" (followed by an example block and a block with the instructions to convert).

https://gist.github.com/ngxson/307140d24d80748bd683b396ba13b...

I might be wrong of course, but asking to optimize code is something that helped me quite a bit when I first started learning PyTorch. I feel like "99% of this code blabla" is useful in that it lets you understand it was AI-written, but it shouldn't be a brag. Then again, I know nothing about SIMD instructions, but I don't see why it should be different for a capable LLM to do SIMD instructions versus optimized high-level code (which is much harder than just working high-level code - I'm glad I can do the latter, lol).

thorum · a year ago
Yes, “take this clever code written by a smart human and convert it for WASM” is certainly less impressive than “write clever code from scratch” (and reassuring if you’re worried about losing your job to this thing).

That said, translating good code to another language or environment is extremely useful. There's a lot of low-hanging fruit where, for example, an existing high-quality library is written for Python or C# or something, and an LLM can automatically convert it to optimized Rust / TypeScript / your language of choice.

HanClinto · a year ago
Keep in mind, two of the functions were translated, and the third was created from scratch. Quoting from the FAQ on the Gist [1]:

Q: "It only does conversion ARM NEON --> WASM SIMD, or it can invent new WASM SIMD code from scratch?"

A: "It can do both. For qX_0 I asked it to convert, and for qX_K I asked it to invent new code."

* [1]: https://gist.github.com/ngxson/307140d24d80748bd683b396ba13b...

th0ma5 · a year ago
Porting well-written code, if you know the target language well, is pretty fun and fast in my experience. Often, though, when there are library, API, or language-feature differences, those are better worked out by a human; in my experience, the work it would take to fully describe the entire context to a model outweighs the benefit.
freshtake · a year ago
This. For folks who regularly write simd/vmx/etc, this is a fairly straightforward PR, and one that uses very common patterns to achieve better parallelism.

It's still cool nonetheless, but not a particularly great test of DeepSeek vs. alternatives.

gauge_field · a year ago
That is what I am struggling to understand about the hype. I regularly use them to generate new SIMD code. Other than a few edge cases (issues around handling of NaN values, argument order for corresponding ops, availability of new AVX-512 intrinsics), they are pretty good at converting. The intrinsic names are very similar from one SIMD ISA to another. The self-explanatory nature of the intrinsic names, and the similar APIs from one SIMD ISA to another, make this a somewhat expected result given what they can already accomplish.
csomar · a year ago
DeepSeek R1 is not exactly better than the alternatives. It is, however, open, as in open-weight, and requires far fewer resources. This is what’s disruptive about it.
softwaredoug · a year ago
LLMs are great at converting code. I've taken entire functions and converted them before, and been really impressed.
CharlesW · a year ago
For those who aren't tempted to click through, the buried lede for this (and why I'm glad it's being linked to again today) is that "99% of the code in this PR [for llama.cpp] is written by DeepSeek-R1", as conducted by Xuan-Son Nguyen.

That seems like a notable milestone.

drysine · a year ago
>99% of the code in this PR [for llama.cpp] is written by DeepSeek-R1

Yes, but:

"For the qX_K it's more complicated, I would say most of the time I need to re-prompt it 4 to 8 more times.

The most difficult was q6_K, the code never works until I ask it to only optimize one specific part, while leaving the rest intact (so it does not mess up everything)" [0]

And also there:

"You must start your code with #elif defined(__wasm_simd128__)

To think about it, you need to take into account both the reference code from ARM NEON and AVX implementation."

[0] https://gist.github.com/ngxson/307140d24d80748bd683b396ba13b...

janwas · a year ago
Interesting that both de novo and porting seem to have worked.

I do not understand why GGML is written this way, though. So much duplication, one variant per instruction set. Our Gemma.cpp only requires a single backend written using Highway's portable intrinsics, and last I checked, is also faster for decode on SKX+Zen4.

aithrowawaycomm · a year ago
Reading through the PR makes me glad I got off GitHub - not for anything AI-related, but because it has become a social media platform, where what should be a focused and technical discussion gets derailed by strangers waging the same flame wars you can find anywhere else.
skeaker · a year ago
This depends pretty heavily on the repo.
jeswin · a year ago
> 99% of the code in this PR [for llama.cpp] is written by DeepSeek-R1

I hope we can put to rest the argument that LLMs are only marginally useful in coding - which is often among the top comments on many threads. I suppose these arguments arise from (a) having used only GH Copilot, which is the worst tool, (b) not having spent enough time with the tool/LLM, or (c) apprehension. I've given up responding to these.

Our trade has changed forever, and there's no going back. When companies claim that AI will replace developers, it isn't entirely bluster. Jobs are going to be lost unless there's somehow a demand for more applications.

simonw · a year ago
"Jobs are going to be lost unless there's somehow a demand for more applications."

That's why I'm not worried. There is already SO MUCH more demand for code than we're able to keep up with. Show me a company that doesn't have a backlog a mile long where most of the internal conversations are about how to prioritize what to build next.

I think LLM assistance makes programmers significantly more productive, which makes us MORE valuable because we can deliver more business value in the same amount of time.

Companies that would never have considered building custom software because they'd need a team of 6 working for 12 months may now hire developers if they only need 2 working for 3 months to get something useful.

jeswin · a year ago
> That's why I'm not worried. There is already SO MUCH more demand for code than we're able to keep up with. Show me a company that doesn't have a backlog a mile long where most of the internal conversations are about how to prioritize what to build next.

I worry about junior developers. It will be a while before vocational programming courses retool to teach this new way of writing code, and these are going to be testing times for so many of them. If you ask me why this will take time, my argument is that effectively wielding an LLM for coding requires broad knowledge. For example, if you're writing web apps, you need to be able to spot, say, security issues. And various other best practices, depending on what you're making.

It's a difficult problem to solve, requiring new sets of books, courses etc.

sitkack · a year ago
We have already entered a new paradigm of software development, where small teams build software for themselves to solve their own problems rather than making software to sell to people. I think selling software will get harder in the future unless it comes with special affordances.
rybosworld · a year ago
> There is already SO MUCH more demand for code than we're able to keep up with. Show me a company that doesn't have a backlog a mile long where most of the internal conversations are about how to prioritize what to build next.

This is viewing things too narrowly I think. Why do we even need most of our current software tools aside from allowing people to execute a specific task? AI won't need VSCode. If AI can short circuit the need for most, if not nearly all enterprise software, then I wouldn't expect software demand to increase.

Demand for intelligent systems will certainly increase. And I think many people are hopeful that you'll still need humans to manage them, but I think that hope is misplaced. These things are already approaching human-level intellect, if not exceeding it, in most domains. Viewed through that lens, human intervention will hamper these systems and make them less effective. The rise of chess engines is the perfect example of this. Allow a human to pair with Stockfish and override Stockfish's favored move at will. This combination will lose every single game to a Stockfish-only opponent.

aibot923 · a year ago
It's interesting. Maybe I'm in the bigtech bubble, but to me it looks like there isn't enough work for everyone already. Good projects are few and far between. Most of our effort is keeping the lights on for the stuff built over the last 15-20 years. We're really out of big product ideas.
SecretDreams · a year ago
The big fear shouldn't be loss of jobs; it should be the inevitable attack on wages. Wages will track inversely with the work's proximity to commodity status.

Even the discussion around AI partially replacing coders is a step towards commoditization.

paulryanrogers · a year ago
Dev effort isn't always the bottleneck. It's often stakeholders ironing out the ambiguities, conflicting requirements, QA, ops, troubleshooting, etc.

Maybe devs will be replaced with QA, or become glorified QA themselves.

n144q · a year ago
That's the naivety of software engineers. They can't see their limitations and think everything is just a technical problem.

No, work is never the core problem. A backlog of bug fixes/enhancements is rarely what determines headcount. What matters is the business need. If the product sells and there is little or no competition, the company has very little incentive to improve its products, especially by hiring people to do the work. You'd be thankful if a company does not lay off people in teams working on mature products. In fact, the opposite has been happening, for quite a while. There are so many examples out there that I don't need to name them.

gamblor956 · a year ago
> Show me a company that doesn't have a backlog a mile long where most of the internal conversations are about how to prioritize what to build next.

Most companies don't have a mile-long backlog of coding projects. That's a tech-industry-specific issue, and a lot of it is driven by the tech industry's obsessive compulsion to perpetually reinvent wheels.

> Companies that would never have considered building custom software because they'd need a team of 6 working for 12 months may now hire developers if they only need 2 working for 3 months to get something useful.

No, because most companies that can afford custom software want reliable software. Downtime is money. Getting unreliable custom software means that the next time around they'll just adapt their business processes to software that's already available on the market.

aomix · a year ago
I’m more bearish about LLMs, but even in the extreme optimist case this is why I’m not that concerned. Every project I’m on is triaged as the one that needs the most help right now. A world where a dozen projects don’t need to be left on the cutting-room floor so one can live is a very exciting place.
ksec · a year ago
>There is already SO MUCH more demand for code than we're able to keep up with. Show me a company that doesn't have a backlog a mile long where most of the internal conversations are about how to prioritize what to build next.

We really are in AI's iPhone moment. I never thought I would witness something bigger than the impact of the smartphone. There is an insane amount of value that we could extract; likely tens of trillions, from big businesses to small.

We keep asking how Low Code or No Code "tools" could achieve custom apps. Turns out we are here via a different route.

>custom software because they'd need a team of 6 working for 12 months may now hire developers if they only need 2 working for 3 months to get something useful.

I am wondering if it will be more like 2 working for 1 month?

AlwaysRock · a year ago
Yup. People who know how to use it, and who work on tasks where LLM code is generally functional, are getting more done in less time.

I don't trust companies to translate that to, "We can do more now" rather than, "We can do more with less people now" though.

littlestymaar · a year ago
I couldn't agree more.

And this kind of fear mongering is particularly irritating when you see that our industry already faced a similar productivity shock less than twenty years ago: before open source went mainstream with GitHub and library hubs like npm, we used to code the same things over and over again, most of the time in a half-baked fashion, because nobody had time to polish stuff that was needed but only tangentially related to the core business. Then came the open-source tsunami, and suddenly there was a high-quality library for solving your particular problem, and the productivity gain was insane.

Fast forward a few years: does it look like these productivity gains took any of our jobs? Quite the opposite, actually; there have never been as many developers as today.

(Don't get me wrong, this is massively changing how we work, like the previous revolution did, and our job is never going to be the same again.)

whiplash451 · a year ago
The main problem is that engineers in the Western world won't get to see the benefits themselves, because a lot of Western companies will outsource the work to AI-enabled, much more effective developers in India.

India and Eastern EU will win far more (relatively) than expensive devs in the US or Western EU.

UncleOxidant · a year ago
> That's why I'm not worried. There is already SO MUCH more demand for code than we're able to keep up with. Show me a company that doesn't have a backlog a mile long where most of the internal conversations are about how to prioritize what to build next.

And yet many companies aren't hiring developers right now - folks in the C-suite are thinking AI is going to eliminate their need to hire engineers. Also, "demand" doesn't necessarily mean that there's money available to develop this code. And remember that when code is created, it needs to be maintained, and there are costs for doing that as well.

svilen_dobrev · a year ago
Maybe shifting the jobs' target... to a higher level (finally!)? Reminds me of:

https://chris-granger.com/2015/01/26/coding-is-not-the-new-l...

Modelling has been, is, and will be the needed literacy.

deadbabe · a year ago
Too much productivity can be a bad thing.

If you’re infinitely productive, then the solution to every problem is to just keep producing stuff, instead of learning to say no.

This means a lot of companies will overbuild, and then drown in maintenance problems and fail catastrophically when they can’t keep up.

Scipio_Afri · a year ago
100% agree with this take. People are spouting economic fallacies, in part because CEOs don't want stock prices to fall too fast. Eventually people will widely realize this, and by then the economic payoffs will still be immense.
reitzensteinm · a year ago
When GPT-4 came out, I worked on a project called Duopoly [1], which was a coding bot that aimed to develop itself as much as possible.

The first commit was half a page of code that read itself in, asked the user what change they'd like to make, sent that to GPT-4, and overwrote itself with the result. The second commit was GPT-4 adding docstrings and type hints.
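For flavor, the whole bootstrap fits in about a dozen lines. This is not the actual first commit, just a sketch of the idea, written against today's openai SDK rather than the 2023-era API:

    from openai import OpenAI

    # Read this file, ask for a change, have GPT-4 rewrite it, overwrite self.
    source = open(__file__).read()
    change = input("What change would you like to make? ")
    reply = OpenAI().chat.completions.create(
        model="gpt-4",
        messages=[{
            "role": "user",
            "content": f"Apply this change and output only the full program:\n"
                       f"{change}\n\n{source}",
        }],
    )
    open(__file__, "w").write(reply.choices[0].message.content)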

Over 80% of the code was written by AI in this manner, and at some point, I pulled the plug on humans, and the last couple hundred commits were entirely written by AI.

It was a huge pain to develop with how slow and expensive and flaky the GPT-4 API was at the time. There was a lot of dancing around the tiny 8k context window. After spending thousands in GPT-4 credits, I decided to mark it as proof of concept complete and move on developing other tech with LLMs.

Today, with Sonnet and R1, I don't think it would be difficult or expensive to bootstrap the thing entirely with AI, never writing a line of code. Aider, a fantastic similar tool written by HN user anotherpaulg, wasn't writing large amounts of its own code in the GPT-4 days. But today it's above 80% in some releases [2].

Even if the models froze to what we have today, I don't think we've scratched the surface on what sophisticated tooling could get out of them.

[1]: https://github.com/reitzensteinm/duopoly [2]: https://aider.chat/HISTORY.html

matsemann · a year ago
I read that Meta is tasking all engineers with figuring out how they got owned by DeepSeek. Couldn't they just have asked an LLM instead? After their claim of replacing all of us...

I'm not too worried. If anything we're the last generation that knows how to debug and work through issues.

nkozyra · a year ago
> If anything we're the last generation that knows how to debug and work through issues.

I suspect that comment might soon feel like saying "not too worried about assembly line robots, we're the only ones who know how to screw on the lug nuts when they pop off"

dumbfounder · a year ago
Yep, and we still need COBOL programmers too. Your job as a technologist is to keep up with technology and use the best tools for the job to increase efficiency. If you don’t do this you will be left behind or you will be relegated to an esoteric job no one wants.
hnthrow90348765 · a year ago
A fair amount has been written on how to debug things, so it's not like the next generation can't learn it by also asking the AI (maybe learn it more slowly if 'learning with AI' is found to be slower)
spease · a year ago
The nature of this PR looks like it’s very LLM-friendly - it’s essentially translating existing code into SIMD.

LLMs seem to do well at any kind of mapping / translating task, but they seem to have a harder time when you give them either a broader or less deterministic task, or when they don’t have the knowledge to complete the task and start hallucinating.

It’s not a great metric to benchmark their ability to write typical code.

kridsdale3 · a year ago
Sure, but let's still appreciate how awesome it is that this very difficult (for a human) PR is now essentially self-serve.

How much hardware efficiency have we left on the table all these years because people don't like to think about optimal use of cache lines, array alignment, SIMD, etc.? I bet we could double or triple the speeds of all our computers.

kemiller · a year ago
My observation in my years running a dev shop was that there are two classes of applications that could get built. One was the high-end, full-bore model requiring a team of engineers and hundreds of thousands of dollars to get to a basic MVP, which thus required an economic opportunity in at least the tens of millions. The other: very niche or geographically local businesses that can get their needs met with a self-service tool, max budget maybe $5k or so. You could stretch that to $25k if you use an offshore team to customize. But 9/10 incoming leads had budgets between $25k and $100k. We just had to turn them away. There's nothing meaningful you can do with that range of budget. I haven't seen anything particularly change that. Self-service tools get gradually better, but not enough to make a huge difference. The high end, if anything, has receded even faster as dev salaries have soared.

AI coding, for all its flaws now, is the first thing that takes a chunk out of this, and there is a HUGE backlog of good-but-not-great ideas that are now viable.

That said, this particular story is bogus. He "just wrote the tests" but that's a spec — implementing from a quality executable spec is much more straightforward. Deepseek isn't doing the design, he is. Still a massive accelerant.

xd · a year ago
The thing with programming is that to do it well, you need to fully understand the problem and then implement the solution, expressing it in code. AI will be used to create code based on a deficit of clear understanding, and we will end up with a hell of a lot of garbage code. I foresee industry demand for programmers skyrocketing in the future, as companies scramble to unfuck the mountains of shit code they lash up over the coming years. It's just a new age of copy-paste coders.
Waterluvian · a year ago
I want this to be true. Actually writing the code is the least creative, least interesting part of my job.

But I think it’s still much too early for any form of “can we all just call it settled now?” In this case, as we all know, lines of code is not a useful metric. How many person-hours were spent doing anything associated with this PR’s generation, how does that compare to not using AI tools, and how does the result compare in terms of the various forms of quality? That’s the rubric I’d like to see us use in a more consistent manner.

woah · a year ago
LLMs excel at tasks with very clear instructions and parameters. Porting from one language to another is something that is one step away from being done by a compiler. Another place that I've used them is for initial scaffolding of React components.
lukan · a year ago
"I hope we can put to rest the argument that LLMs are only marginally useful in coding"

I more often hear the argument that they are not useful for them, and I agree. If an LLM were trained on my codebase and the exact libraries and APIs I use, I would use them daily, I guess. But currently they still make too many mistakes and mix up different APIs, for example, so they're not useful to me except for small experiments.

But if I could train DeepSeek on my codebase for a reasonable amount (and they seem to have improved on the training?) and run it locally on my workstation, then I am likely in as well.

AlwaysRock · a year ago
We are getting closer and closer to that. For a while, LLM assistants were not all that useful on larger projects because they had limited context. That context has increased a lot over the last 6 months. Some tools will even analyze your entire codebase and use that in responses.

It is frustrating that smaller tools or APIs seem to stump LLMs currently, but context seems to be the main thing that is missing, and that is increasing more and more.

sureglymop · a year ago
I am working on something even deeper. I have been working on a platform for personal data collection. Basically a server and an agent on your devices that records keystrokes, websites visited, active windows etc.

The idea is that I gather this data now and it may become useful in the future. Imagine getting a "helper AI" that still keeps your essence, opinions and behavior. That's what I'm hoping for with this.

rane · a year ago
The idea is that you give the libraries and APIs as context with your prompt.
submeta · a year ago
The dev jobs won't go away, but they will change. Devs will be more and more like requirements engineers who need to understand the problem and then write prompts with the proper context so that the LLM can produce valuable and working code. And the next level will be to prompt LLMs to generate prompts for LLMs to produce code and solutions.

But already I hire fewer and fewer developers for smaller tasks. The things that I'd assign to a dev in Ukraine - explore an idea, do a data transformation, make a UI for an internal company tool - I can now do quicker with an LLM than by trying to find a dev and explain the task.

WXLCKNO · a year ago
I think what you're describing is going to be a very short transitional period.

Once current AI gets good enough, the people micromanaging parts of it will do more to hinder the process than to help it.

One person setting the objectives and the AI handling literally everything else including brainstorming issues etc, is going to be all that's needed.

headcanon · a year ago
Agreed, though to your point I think we'll end up seeing more induced demand long-term

- This will enable more software to be built and maintained by same or fewer people (initially). Things that we wouldn't previously bother to do are now possible.

- More software means more problems (not just LLM-generated bugs which can be handled by test suites and canary deploys, but overall features and domains of what software does)

- This means skilled SWEs will still be in demand, but we need to figure out how to leverage them better.

- Many codebases will be managed almost entirely by agents, effectively turning it into the new "build target". This means we need to build more tooling to manage these agents and keep them aligned on the goal, which will be a related but new discipline.

SWEs would need to evolve skillsets but wasn't that always the deal?

jasonthorsness · a year ago
I think quality is going to go up - I have so much code I wish I could go back and optimize for better performance, or add more comprehensive tests for, and LLMs are getting great at both of those as they work really well off of things that already exist. There has never been enough time/resources to apply towards even the current software demand, let alone future needs.
nh2 · a year ago
Challenge: I would really like somebody that has experience in LLM based coding tools to try and fix gnome-terminal:

https://news.ycombinator.com/item?id=42814509

smokel · a year ago
I really like this idea.

However, it also highlights a key problem that LLMs don’t solve: while they’re great at generating code, that’s only a small part of real-world software development. Setting up a GitHub account, establishing credibility within a community, and handling PR feedback all require significant effort.

In my view, lowering the barriers to open-source participation could have a bigger impact than these AI models alone. Some software already gathers telemetry and allows sharing bug reports, but why not allow the system to drop down to a debugger in an IDE? And why can’t code be shared as easily as in Google Docs, rather than relying on text-based files and Git?

Even if someone has the skills to fix bugs, the learning curve for compilers, build tools, and Git often dilutes their motivation to contribute anything.

20k · a year ago
Eh, it performed a 1:1 conversion of ARM NEON to WASM SIMD, which with the greatest will in the world is pretty trivial work. It's something that ML is good at, because it's the same problem area as "translate this from English to French", but more mechanistic.

This is a task that would likely have taken as long to write by hand as the AI took to do it, given how long the actual task took to execute. 98% of the work is find and replace.
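To give a flavor of how mechanical much of it is, many NEON intrinsics have one-to-one WASM SIMD counterparts. A toy sketch (the intrinsic names below are real, but a real port also has to handle types, loads/stores, and argument order, which is exactly where the re-prompting came in):

    import re

    # A few one-to-one NEON -> WASM SIMD renames.
    NEON_TO_WASM = {
        "vaddq_f32": "wasm_f32x4_add",
        "vmulq_f32": "wasm_f32x4_mul",
        "vld1q_f32": "wasm_v128_load",
    }

    def port(src: str) -> str:
        pattern = re.compile("|".join(map(re.escape, NEON_TO_WASM)))
        return pattern.sub(lambda m: NEON_TO_WASM[m.group(0)], src)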

Don't get me wrong - this kind of thing is useful and cool, but you're mixing up the easy coding donkey work with the stuff that takes up time

If you look at the actual prompt-engineering part, it's clear that this prompting produced extensively wrong results as well, which is tricky. Because it wasn't produced by a human, it requires extensive edge-case testing and review to make sure the AI didn't screw anything up. If you have the knowledge to validate the output, it would have been quicker to write it by hand instead of reverse-engineering the logic. It bumps the work off from writing it by hand to the reviewers, who now have to check your ML code because you didn't want to put in the work by hand.

So overall - while its extremely cool that it was able to do this, it has strong downsides for practical projects as well

WhitneyLand · a year ago
Every time AI achieves something new/productive/interesting, cue the apologists who chime in to say “well yeah but that really just decomposes into this stuff so it doesn’t mean much”.

I don’t get why people don’t understand that everything decomposes into other things.

You can draw the line for when AI will truly blow your mind anywhere you want, the point is the dominoes keep falling relentlessly and there’s no end in sight.

UncleEntity · a year ago
IDK, I was playing with Claude yesterday/this morning and before I hit the free tier context limit it managed to create a speech-to-phoneme VQ-VAE contraption with a sliding window for longer audio clips and some sort of "attention to capture relationships between neighboring windows" that I don't quite understand. That last part was due to a suggestion it provided where I was like "umm, ok..."

Seems pretty useful to me where I've read a bunch of papers on different variational autoencoder but never spent the time to learn the torch API or how to set up a project on the google.

In fact, it was so useful I was looking into paying for a subscription as I have a bunch of half-finished projects that could use some love.

karmasimida · a year ago
I am mixed on this point.

I 100% agree with you our trade is changed forever.

On the other hand, I am writing like 1000+ LOC daily, without much compromise on quality or my mental health, and the thought of writing code that is necessary but feels like a chore is no longer an issue. The boost in output is incredible.

nine_zeros · a year ago
> Our trade has changed forever, and there's no going back. When companies claim that AI will replace developers, it isn't entirely bluster. Jobs are going to be lost unless there's somehow a demand for more applications

This is a key insight - the trade has changed.

For a long time, hoarding talent - people who could conceive and implement such PRs - was a competitive advantage. It no longer is, because companies can hire and get similar outcomes with fewer, more mediocre devs.

But at the same time, these companies have lost their technological moat. The people were the biggest moat. The hoarding of people was the reason SV could stay ahead of other concentrated geographies. This is why SV companies grew larger and larger.

But now anyone anywhere can produce anything and literally demolish any competitive advantage of large companies. As an example, literally a single DeepSeek release yesterday destroyed large market cap companies.

It means that the future world is likely to have a large number of geographically distributed developers, always competing, and the large companies will have to shed market cap because their customers will be distributed among this competition.

It's not going to be pleasant. Life and work will change but it is not merely loss of jobs but it is going to be loss of the large corporation paradigm.

happyopossum · a year ago
> literally a single DeepSeek release yesterday destroyed large market cap companies

Nobody was “destroyed” - a handful of companies had their stock price drop, a couple had big drops, but most of those stocks are up today, showing that the market is reactionary.

djmips · a year ago
You write this as if DeepSeek's R1 was conceived and written by AI itself.

Do you have a link to that?

chrisguilbeau · a year ago
I'm a developer that primarily uses gh copilot for python dev. I find it pretty useful as an intelligent auto-completer that understands our project's style, and unusual decorators we use.

What tools would you tell a copilot dev to try? For example, I have a $20/mo ChatGPT account and asking it to write code or even fix things hasn't worked very well. What am I missing?

byteknight · a year ago
While I don't know your scenario, as an avid user of both GPT and Claude I would recommend moving away from Google-style search queries and beginning to converse. The more you give the LLM, the closer you'll get to what you want.
fauigerzigerk · a year ago
A long time ago, I held the grandiose title of software architect. My job was to describe in a mix of diagrams, natural language and method signatures what developers were supposed to do.

The back and forth was agonising. They were all competent software engineers but communicating with them was often far more work than just writing the damn code myself.

So yes, I do believe that our trade has changed forever. But the fact that some of our coworkers will be AIs doesn't mean that communicating with them is suddenly free. Communication comes with costs (and I don't mean tokens). That won't change.

If you know your stuff really well, i.e. you work on a familiar codebase using a familiar toolset, the shortest path from your intentions to finished code will often not include anyone else - no humans and no AI either.

In my opinion, "LLMs are only marginally useful in coding" is not true in general, but it could well be true for a specific person and a specific coding task.

cjbgkagh · a year ago
Who would the new applications be for? I figure that it’ll be far easier to build apps for use by LLMs than building apps for people to use. I don’t think there will be this large increase of induced demand, the whole world just got a lot more efficient and that’s probably a bad thing for the average person.
6510 · a year ago
Oh right, we will have a B2B B2C B2L L2B L2C and ultimately the L2L market.
fragmede · a year ago
Take some process that you, or someone you know, does right now that involves spreadsheets and copy-pasting between various apps. Hiring a software engineer to build an app so it's just a [do-it] button previously didn't make sense because software engineer time was too expensive. Now that app can be made, so the HR or whatever person doesn't need to waste their time on automatable tasks.
gmt2027 · a year ago
If AI increases the productivity of a single engineer between 10-100x over the next decade, there will be a seismic shift in the industry and the tech giants will not walk away unscathed.

There are coordination costs to organising large amounts of labour. Costs that scale non-linearly as massive inefficiencies are introduced. This ability to scale, provide capital and defer profitability is a moat for big tech and the silicon valley model.

If a team of 10 engineers become as productive as a team of 100-1000 today, they will get serious leverage to build products and start companies in domains and niches that are not currently profitable because the middle managers, C-Suite, offices and lawyers are expensive coordination overhead. It is also easier to assemble a team of 10 exceptional and motivated partners than 1000 employees and managers.

Another way to think about it is: what happens when every engineer can marshal the AI equivalent of $10-100m of labour?

My optimistic take is that the profession will reach maturity when we become aware of the shift in the balance of power. There will be more solo engineers and we will see the emergence of software practices like the ones doctors, lawyers and accountants operate.

darkwater · a year ago
This is a really interesting take that I don't see often in the wild. Actually, it's the first time I read someone saying this. But I think you are definitely onto something, especially if costs of AI are going to lower faster than expected even a few weeks ago.
kragen · a year ago
I'm tempted by this vision, though that in itself makes me suspicious that I'm indulging in wishful thinking. Also lutusp wrote a popular article promoting it about 45 years ago, predicting that no companies like today's Microsoft would come to exist.

A thing to point out is that management is itself a skill, and a difficult one, one where some organizations are more institutionally competent than others. It's reasonable to think of large-organization management as the core competency of surviving large organizations. Possibly the hypothetical atomizing force you describe will create an environment where they are poorly adapted for continuing survival.

AznHisoka · a year ago
To play devil's advocate: the main obstacle in launching a product isn't the actual development/coding. Unless you're building something in hard tech, it's relatively easy to build run-of-the-mill software.

The obstacles are in marketing, selling it, building a brand/reputation, integrating it with lots of 3rd party vendors, and supporting it.

So yes, you can build your own Salesforce, or your own Adobe Photoshop, with a one-man crew much faster and easier. But that doesn't mean you, as an engineer, can now build your own business selling it to companies who don't know anything about you.

svilen_dobrev · a year ago
A (tile-placing) guy who was rebuilding my bathrooms told this story:

When he was greener, he happened to work with some old fart... who managed to work 10x faster than the others, with this trick: put all the tiles on the wall very quickly with a diluted cement glue; then moving one tile forces most of the tiles around it to move as well, so he managed to align all the tiles in very little time.

As I never had the luxury of a decent budget, I have long been doing various meta-programming things, then meta-meta-programming... up to the extent of, say, 2 people building, managing, and enjoying a codebase of 100 KLOC of Python + 100 KLOC of JS (roughly 30% statically generated, plus an unknown percentage generated at runtime) without too much fuss or overwork.

But it seems that this road has been a dead end... for decades. Fewer and fewer people use meta-programming; it needs too deep an understanding. Everyone just adds yet another (2-year "senior") junior/wannabe to copy-paste yet another CRUD.

So maybe the number of wannabes will go down. Or "senior" will start meaning something... again. Or idiotically-numbing-stoopid requirements will stop appearing...

WXLCKNO · a year ago
Like darkwater said, this is my first time seeing this take, and I like it a lot.

I hate the idea of building a business to hundreds/thousands of employees, I love startups and small but highly profitable businesses.

Having productivity be unleashed in this way with a small team of people I trust would be amazing.

Aloisius · a year ago
As long as the output of AI is not copyrightable, there will be demand for human engineers.

After all, if your codebase is largely written by AI, it becomes entirely legal to copy it and publish it online, and sell competing clones. That's fine for open source, but not so fine for a whole lot of closed source.

simlevesque · a year ago
I've gotten incredible results asking AIs for SQL queries. I just give it my data and what I want the output to look like. Then I ask it to provide 10 different versions that might be faster. I test them all, tell it which is fastest, and then ask it to make variations on that path. Then I ask it to add comments to the fastest version. I verify the query, do some more tests, and I'm good to go. I understand SQL pretty well, but writing 10 different versions of one query myself would've taken me at least an hour.
nkozyra · a year ago
> it isn't entirely bluster

"Development" is effectively translating abstractions of an intended operation to machine language.

What I find kind of funny about the current state is that we're using large language models to, like, spit out React or Python code. This use case is obviously an optimization to WASM, so a little closer to the metal, but at what point do programs (effectively suites of operations) just cut out the middleman entirely?

dboreham · a year ago
I've wondered about this too. The LLM could just write machine code. But now a human can't easily review it. But perhaps TDD makes that ok. But now the tests need to be written in a human readable language so they can be checked. Or do they? And if the LLM is always right why does the code need to be tested?
fennecfoxy · a year ago
I'm not entirely worried, yet. I don't think that LLMs produce code/architecture that is trustworthy enough for them to operate independently. Of course suits & ties will believe it and kill many dev jobs eventually, but as per usual wtf do they know about how things work on the ground. Executives thrive on NYT articles and hearsay from LinkedIn.
rozap · a year ago
Broadly agree. Whether or not it is useful isn't really an interesting discussion, because it so clearly is useful. The more interesting question is what it does to supply and demand. If the past is any indication, I think we've seen that lowering the barrier to getting software shipped and out the door (whether through higher-level languages or better tooling) has only made demand greater. Maybe this time it's different because it's such a leap vs an incremental gain? I don't know. The cynical part of me thinks that software always begets more software, and systems just become ever more complex. That would suggest that our jobs are safe. But again, I don't say that with confidence.
simonw · a year ago
> If the past is any indication, I think we've seen that lowering the barrier to getting software shipped and out the door (whether through higher-level languages or better tooling) has only made demand greater.

Something I think about a lot is the impact of open source on software development.

25 years ago any time you wanted to build anything you pretty much had to solve the same problems as everyone else. When I went to university it even had a name - the software reusability crisis. At the time people thought the solution was OOP!

Open source solved that. For any basic problem you want to solve there are now dozens of well tested free libraries.

That should have eliminated so many programming jobs. It didn't: it made us more productive and meant we could deliver more value, and demand for programmers went up.

oorza · a year ago
I don't think it's necessarily any larger a leap than any of the other big breakthroughs in the space. Does writing safe C++ with an LLM matter more than choosing Rust? Does writing a jQuery-style Gmail with an LLM matter more than choosing a declarative UI tool? Does adding an LLM to Java 6 matter more than letting the devs switch to Kotlin?

Individual developer productivity will be expected to rise. Timelines will shorten. I don't think we've reached Peak Software, where the limiting factor on software being written is demand for software; I think the bottlenecks are expense and time. AI tools can decrease both of those, which _should_ increase demand. You might be expected to spend a month producing a project that would previously have taken four people that month, but I think we'll have more than enough demand growth to cover the difference. How many business models in the last twenty years that weren't viable would have been, if the engineering department could have floated the company to Series B with only half a dozen employees?

What IS larger than before, IMO, is the talent gap we're creating at the top of the industry funnel. Fewer juniors are getting hired than ever before, so as seniors leave the industry through standard attrition, there will be fewer candidates to replace them. If you're currently a software engineer with 10+ YoE, I don't think there's much to worry about - in fact, I'd be surprised if "was a successful software engineer before the AI revolution" doesn't become a key resume bullet point in the next several years. I also think that if you're in a position of leadership and have the creativity and skill to make it work, juniors and mid-level engineers are going to be incredibly cost-effective, because most middle managers won't have those things. And companies will absolutely succeed or fail on that in the coming years.

redcobra762 · a year ago
When tools increase a worker's efficiency, it's rare that the job is lost. It's much more common that the demand for that job changes to take advantage of the productivity growth.

This is why the concerns from Keynes and Russell about people having nothing to do as machines automated away more work ended up being unfounded.

We fill the time... with more work.

And workers that can't use these tools to increase their productivity will need to be retrained or moved out of the field. That is a genuine concern, but this friction is literally called the "natural rate of unemployment" and happens all the time. The only surprise is we expected knowledge work to be more inoculated from this than it turns out to be.

Myrmornis · a year ago
In my experience a lot of it is (d) defaulting to criticizing new things, especially things that are "trendy" or "hot" and (e) not liking to admit that one's own work can partially be done by such a trendy or hot thing.
nuancebydefault · a year ago
AI will only ever be able to develop what it is asked/prompted for. The question is often ill-formed, resulting in an app that does not do what you want. So the prompt needs to be updated, the result needs to be evaluated, and tweaks need to be made to the code, with or without the help of AI.

In fact, seen from a distance, the software development pattern in the AI era stays the same as it was pre-AI, pre-SO, pre-IDE, and pre-internet.

Just to say, sw developers will still be sw developers.

kragen · a year ago
It's possible that the previous tools just weren't good enough yet. I play with GPT-4 programming a lot, and it usually takes more work than it would take to write the code myself. I keep playing with it because it's so amazing, but it isn't to the point where it's useful to me in practice for that purpose. (If I were an even worse coder than I am, it would be.) DeepSeek looks like it is.
casenmgreen · a year ago
I may be wrong, but I think right now, from reading stories of people trying to use AI and having poor experiences, that AI is useful and effective for some tasks and not for others - and this is an intrinsic property; it won't get better with bigger models. You need a task which fits well with what AI can do, which is basically auto-complete. If you have a task which does not fit well, it's not going to fly.
simonw · a year ago
Right: LLMs have a "jagged frontier". They are really good at some things and terrible at other things, but figuring out WHAT those things are is extremely unintuitive.

You have to spend a lot of time experimenting with them to develop good intuitions for where they make sense to apply.

I expect the people who think LLMs are useless are people who haven't invested that time yet. This happens a lot, because the AI vendors themselves don't exactly advertise their systems as "they're great at some stuff and terrible at other stuff and here's how to figure that out".

plainOldText · a year ago
Indeed, our trade has changed forever, and more specifically, we might have to alter operational workflows across the entire industry as well.

There are so many potential trajectories going forward for things to turn sour, I don't even know where to start the analysis. The level of sophistication an AI can achieve has no upper bound.

I think we've had a good run so far. We've been able to produce software in the open with contributions from any human on the planet, trusting it was them who wrote the code, and with the expectation that they also understand it.

But now things will change. Any developer, irrespective of skill or understanding of the problem and technical domains, can generate sophisticated-looking code.

Unfortunately, we've reached a level of operational complexity in the software industry that, thanks to AI, could be exploited in myriad ways going forward. So perhaps we're going to have to aggressively readjust our ways.

herval · a year ago
I don't think trusting that someone wrote the code was ever a good assurance of anything, and I don't see how that changes with AI. There will always be certain _individuals_ who are more reliable than others, not because they handcraft code, but because they follow through with it (make sure it works, fix bugs after release, keep an eye to make sure it worked, etc).

Yes, AI will enable exponentially more people to write code, but that's not a new phenomenon - bootcamps enabled an order of magnitude more people to become developers. So did higher level languages, IDEs, frameworks, etc. The march of technology has always been about doing more while having to understand less - higher and higher levels of abstraction. Isn't that a good thing?

Vegenoid · a year ago
> Our trade has changed forever, and there's no going back

Forever? Hell, it hasn't even existed for a lifetime yet.

kragen · a year ago
01945 to 02025 is 80 years, longer than human life expectancy at birth. What's your definition of a "lifetime"?
thrance · a year ago
Did you even look at the generated code? DeepSeek simply rewrote part of the inference code to use SIMD instructions on wasm. It literally boils down to inserting `#if defined(__wasm_simd128__)` guards in some places, then rewriting the loops to do floating-point operations two by two instead of one after the other (which is where the 2x claim comes from). This is very standard and mostly boilerplate.
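
The general shape of that transformation, sketched on a toy dot product rather than the actual q6_K kernel (a minimal illustration only; it assumes n is a multiple of 4, since a 128-bit wasm vector holds four f32 lanes):

    #include <wasm_simd128.h>

    /* Scalar baseline: one multiply-accumulate per iteration. */
    float dot_scalar(const float *a, const float *b, int n) {
        float sum = 0.0f;
        for (int i = 0; i < n; i++) sum += a[i] * b[i];
        return sum;
    }

    #if defined(__wasm_simd128__)
    /* Vectorized: four lanes per iteration, then a horizontal sum. */
    float dot_simd(const float *a, const float *b, int n) {
        v128_t acc = wasm_f32x4_splat(0.0f);
        for (int i = 0; i < n; i += 4) {
            v128_t va = wasm_v128_load(a + i);
            v128_t vb = wasm_v128_load(b + i);
            acc = wasm_f32x4_add(acc, wasm_f32x4_mul(va, vb));
        }
        return wasm_f32x4_extract_lane(acc, 0)
             + wasm_f32x4_extract_lane(acc, 1)
             + wasm_f32x4_extract_lane(acc, 2)
             + wasm_f32x4_extract_lane(acc, 3);
    }
    #endif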

Useful, sure, in that it saved some time in this particular case. But most of the AI-generated code I interact with is a hot unmaintainable mess of very verbose code, which I'd argue actually hurts the project in the long term.

simonw · a year ago
"But most of the AI-generated code I interact with is a hot unmaintainable mess of very verbose code"

That sounds like you're working with unskilled developers who are landing bad code.

lenerdenator · a year ago
My greatest problem is duplicating the secret sauce of GHCP: it has access to your project and can use it as context.

Admittedly, I haven't looked too hard, but how could I do that with a model from, say, Ollama and run exclusively on my machine?

simonw · a year ago
There are a bunch of tools that might be able to do that. I'd start by exploring https://aider.chat/
nkozyra · a year ago
Couldn't you load the whole thing into a database or memory and use it as a RAG source? Not sure if that would fully scratch the itch.
unshavedyak · a year ago
I’m still just looking for a good workflow where I can stay in my editor and largely focus on code, rather than trying to explain what I want to an LLM.

I want to stay in Helix and find a workflow that “just works”. I'm not even sure what that looks like yet.

WXLCKNO · a year ago
Just to clarify, something like Cursor doesn't fit your needs right?
chefandy · a year ago
GH Copilot code completion is really the only one I've found to be consistently more of a benefit than a time sink. Even with the spiffy code generators using Claude or whatever, I often find myself spending as much time figuring out where the logical problem is as I would have spent just coding it myself, and you still need to know exactly what needs to be done.

I'd be interested in seeing how much time they spent debugging the generated code, and how long they spent constructing and reconstructing the prompts. I'm not a software developer anymore as my primary career, so if the entire lower half of the software development market went away, cratering wages along the way, it wouldn't directly affect my professional life. (And with the kind of conceited, gleeful techno-libertarian shit I've gotten from the software world at large over the past couple of years as a type of specialized commercial artist, it would be tough to turn that schadenfreude into empathy. But we honestly need to figure out a way to stick together, or else we're speeding towards a less mechanical version of Metropolis.)

TheBigSalad · a year ago
LLMs are only marginally useful for coding. You have simply chosen to dismiss or 'give up' on that fact. You've chosen what you want to believe, in contrast to the reality that we are all experiencing.
simonw · a year ago
LLMs are incredibly useful for coding, if you learn how to apply them effectively. You have simply chosen to dismiss or 'give up' on that fact.
mjr00 · a year ago
> I hope we can put to rest the argument that LLMs are only marginally useful in coding - which are often among the top comments on many threads. I suppose these arguments arise from (a) having used only GH copilot which is the worst tool, or (b) not having spent enough time with the tool/llm, or (c) apprehension. I've given up responding to these.

Look at the code that was changed[0]. It's a single file. From what I can tell, it's almost purely functional with clearly specified inputs and outputs. There's no need to implement half the code, realize the requirements weren't specified properly, and go back and have a conversation with the PM about it. Which is, you know, what developers actually do.

This is the kind of stuff LLMs are great at, but it's not representative of a typical change request by Java Developer #1753 at Fortune 500 Enterprise Company #271.

[0] https://github.com/ggerganov/llama.cpp/pull/11453/files

simonw · a year ago
"Yeah, but LLMs can't handle millions of lines of crufty old Java" is a guaranteed reply any time this topic comes up.

(That's not to say it isn't a valid argument.)

Short answer: LLMs are amazingly useful on large codebases, but they are useful in different ways. They aren't going to bang out a new feature perfectly first time, but in the right hands they can dramatically accelerate all sorts of important activities, such as:

- Understanding code. If code has no documentation, dumping it into an LLM can help a lot.

- Writing individual functions, classes and modules. You have to be good at software architecture and good at prompting to use them in this way - you take on the role of picking out the tasks that can be done independently of the rest of the code.

- Writing tests - again, if you have the skill and experience to prompt them in the right way.

jvanderbot · a year ago
This is great. Really! Buuut...

How do you get these tools to not fall over completely when relying on an existing non-public codebase that isn't visible in just the current file?

Or, how do you get them to use a recent API that doesn't dominate their training data?

Combining both, I just cannot for the life of me get them to be useful beyond the most basic boilerplate.

Arguably, SIMD intrinsics are one-to-one translation boilerplate, and in the case of this PR, it's a leetcode-style, well-defined problem with a correct answer and an extremely well-known API to use.

This is not a dig on LLMs for coding. I'm an adopter - I want them to take my work away. But this is maybe 5% of my use case for an LLM. The other 95% is "Crawl this existing codebase and use my APIs that are not in this file to build a feature that does X". This has never materialized for me -- what tool should I be using?

simonw · a year ago
"Or, how do you get them to use a recent API that doesn't dominate their training data?"

Paste in the documentation or some examples. I do this all the time - "teaching" an LLM about an API it doesn't know yet is trivially easy if you take advantage of the longer context inputs to models these days.

withinboredom · a year ago
Hahaha. My favorite was when we bumped Go up to 1.23 and our AI code review tool flagged it because “1.22 is actually the latest release.” Yesterday.
attractivechaos · a year ago
I wonder what prompt they use. Before asking DeepSeek - is there a good post/video that walks through this procedure?
myrmi · a year ago
I feel uncomfortably called out by all three points. What tools should I be trying, to see what you're seeing?
jeswin · a year ago
I use my own tools and scripts, and those aren't for everyone - so I'm just gonna make some general suggestions.

1. You should try Aider. Even if you don't end up using it, you'll learn a lot from it.

2. Conversations are useful and important. You need to figure out a way to include (efficiently, with a few clicks) the necessary files into the context, and then start a conversation. Refine the output as a part of the conversation - by continuously making suggestions and corrections.

3. Conversational editing as a workflow is important. A better auto-complete is almost useless.

4. GitHub Copilot has several issues - the interface is just one of them. Conversational style was bolted on later, and it shows. It's easier to chat on Claude/Librechat/etc. and copy files back manually. Or use a tool like Aider.

5. While you can apply LLMs to solve a particular lower level detail, it's equally effective (perhaps more effective) to have a higher level conversation. Start your project by having a conversation around features. And then refine the structure/scaffold and drill-down to the details.

6. Gradually, you'll learn how to better organize a project and how to write better prompts. If you are familiar with best practices/design patterns, they're immediately useful for two reasons: (1) LLMs are also familiar with them, which helps with prompt clarity; (2) modular code is easier to extend.

7. Keep an eye on better-performing models. I haven't used GPT-4o in a while; Claude works much, much better. And sometimes you might want to reach for the o1 models. Lower-end models might not offer any time savings, so stick to the top-tier models you can afford. DeepSeek models have brought down the API cost, so it's now affordable for even more people.

8. Finally, it takes time. Just as any other tool.

kikimora · a year ago
I don't understand. When I asked DeepSeek how to find an AWS IoT Thing's creation time, it suggested I use the "version" field and treat it as a Unix timestamp. This is obvious nonsense. How can this tool generate anything useful other than summaries of pre-existing text? My knowledge of the theory behind LLMs also suggests this is all they can do reasonably well.

When I see claims like this, I suspect that either the people around me are somehow 10x better at prompting, or they use different models.

simonw · a year ago
You're making the mistake of treating an LLM like a search engine, and expecting it to be able to answer questions directly from its training data.

Sometimes this works! But it's not guaranteed - this isn't their core strength, especially once you get into really deep knowledge of complex APIs.

They are MUCH more useful when you use them for transformation tasks: feed in examples of the APIs you need to work with, then have them write new code based on that.

Working effectively with LLMs for writing code is an extremely deep topic. Most people who think they aren't useful for code have been misled into believing that the LLMs will just work - and that they don't first need to learn a whole bunch of unintuitive stuff in order to take advantage of the technology.

JKCalhoun · a year ago
> When companies claim that AI will replace developers, it isn't entirely bluster.

I'm not so sure there isn't a bit of bluster in there. Imagine when you hand-coded in either machine code or assembly and then high level languages became a thing. I assume there was some handwringing then as well.

Jerrrry · a year ago
Maybe those software engineers should "lrn2code", just as the journalists, artists, and truck drivers had to.
mclau156 · a year ago
It's cope.
sarasasa28 · a year ago
I mean, I don't know at what age you retire in your countries. Here, it's at 65 (ridiculous).

I am 30 and even before AI, I NEVER thought for a moment I would get to keep coding until I am f*king 65, lol

esafak · a year ago
Why, ageism?
cynicalpeace · a year ago
It's making programming more boring and more of an admin task- which is sure to attract different types of people to the field.
woah · a year ago
Seems like the exact opposite. The very example you are replying to is the mechanistic translation of one low level language to another, maybe one of the most boring tasks imaginable.
mythrwy · a year ago
You are getting downvoted but I agree.

For whatever reason a good part of the joy of day to day coding for me was solving many trivial problems I knew how to solve. Sort of like putting a puzzle together. Now I think higher level and am more productive but it's not as much fun because the little easy problems aren't worth my time anymore.

thefourthchime · a year ago
There is near-infinite demand for more applications. They simply become more specific and more niche. You can imagine a point where everyone has their own set of applications, custom-built for the exact workflow they like.

Just look at the options dialog for Microsoft Word, at least back in the day - it was pretty much an accumulation of everyone's pet features from the preceding 10 years.

mohsen1 · a year ago
I am subscribed to o1 Pro and am working on a little Rust crate.

I asked both o1 Pro and DeepSeek R1 to write e2e tests given all of the code in the repo (using yek[1]).

o1 Pro code: https://github.com/bodo-run/clap-config-file/pull/3

Deepseek R1: https://github.com/bodo-run/clap-config-file/pull/4

My judgement is that DeepSeek wrote better tests. The repo is small enough to make that judgement by reviewing the code.

Neither passes the tests.

[1] https://github.com/bodo-run/yek

terhechte · a year ago
I have a set of tests that I can run against different models, implemented in different languages (e.g. the same tests in Rust, TS, Python, Swift), and of these languages, all models have by far the most difficulty with Rust. The scores are notably higher for the same tests in other languages. I'm currently preparing the whole thing for release to share, but it's not ready yet because some urgent work-work came up.
colonial · a year ago
Can confirm anecdotally. Even R1 (the full, official version with web search enabled) crashes out hard on my personal Rust benchmark - it refers to multiple items (methods, constants) that don't exist and fails to import basic necessary traits like io::Read. Embarrassing, and does little to challenge my belief that these models will never reliably advance beyond boilerplate.

(My particular test is to ask for an ICMP BPF that does some simple constant comparisons. Correctly implemented, this only takes 6 sock_filters.)
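
For reference, one plausible correct answer really is tiny - a sketch of six sock_filters, assuming the filter sees a bare IPv4 header (no options, so IHL == 5) and matches ICMP echo requests; the exact comparisons in my benchmark may differ:

    #include <linux/filter.h>
    #include <netinet/in.h>   /* IPPROTO_ICMP */

    /* Accept ICMP echo requests, drop everything else.
     * Assumes the data starts at the IPv4 header and IHL == 5. */
    static struct sock_filter icmp_echo_filter[] = {
        BPF_STMT(BPF_LD  | BPF_B   | BPF_ABS, 9),                /* IP protocol byte     */
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, IPPROTO_ICMP, 0, 3), /* not ICMP -> drop     */
        BPF_STMT(BPF_LD  | BPF_B   | BPF_ABS, 20),               /* ICMP type (20 = 5*4) */
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, 8, 0, 1),            /* not echo (8) -> drop */
        BPF_STMT(BPF_RET | BPF_K, 0xFFFF),                       /* accept               */
        BPF_STMT(BPF_RET | BPF_K, 0),                            /* drop                 */
    };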

ngxson · a year ago
Hi I'm Xuan-Son,

Small correction: I'm not just asking it to convert ARM NEON to SIMD; for the function handling q6_K_q8_K, I asked it to invent a new approach (without giving it any prior examples). The reason I did that was that it had already failed to write this function 4 times.

And a bit of context here: I was doing this on my Sunday, with a time budget of 2 days to finish.

I wanted to optimize wllama (wasm wrapper for llama.cpp that I maintain) to run deepseek distill 1.5B faster. Wllama is totally a weekend project and I can never spend more than 2 consecutive days on it.

Between 2 choices - (1) take the time to do it myself and maybe give up, or (2) try prompting the LLM to do it and maybe give up (at worst, it just gives me a hallucinated answer) - I chose the second option, since I was quite sleepy.

So yeah, it turned out to be a great success in the given context. It just does its job, and it saved my weekend.

Some of you may ask: why not try ChatGPT or Claude in the first place? Well, the short answer is: my input is too long, and these platforms straight up refuse to give me an answer :)

amarcheschi · a year ago
Aistudio.google.com offers free long-context chats (1-2M tokens); just select the appropriate model, 1206 or 2.0 Flash Thinking.
simonw · a year ago
Thanks very much for sharing your results so far.
resource_waste · a year ago
My number 1 criticism of long-term LLM claims is that we've already hit the limit.

If you see the difference between a 7B model and a 70B model, it's only slightly impressive. Between a 70B and a 400B model, the difference is almost unnoticeable. Does going from 400B to 2T do anything?

Every added layer - like using Python to calculate a result, or using chain of thought - destroys the purity. It works great for counting the r's in "strawberry", but not for developing an aircraft. Aircraft will still need to be developed in parts, even with a 100T model.

When you see things like "By 20xx" - no, we've already hit it. The improvements you see are mere application layers.

zulban · a year ago
When you use words like purity, you're making an ideological value judgment. You're not talking about computer science or results.