Good for them to design and publish this - I doubt you'd see anything like this from the other labs.
The loss of competency seems pretty obvious but it's good to have data. What is also interesting to me is that the AI-assisted group accomplished the task a bit faster, but the difference wasn't statistically significant. That seems to align with other findings that AI can make you 'feel' like you're working faster when that perception isn't always matched by reality. So you're trading learning and eroding competency for a productivity boost that isn't always there.
It's research from a company that profits from selling the very tools it studied. Why does it have to be repeated that this is a massive conflict of interest, and that until this "research" has been verified multiple times by parties with zero conflict of interest, it's best to be highly skeptical of anything it claims?
This is up there with believing tobacco companies health "research" from the 30s, 40s, 50s, 60s, 70s, 80s, and 90s.
I mean, they're literally pointing out the negative effects of AI-assisted coding?
> We found that using AI assistance led to a statistically significant decrease in mastery. On a quiz that covered concepts they’d used just a few minutes before, participants in the AI group scored 17% lower than those who coded by hand, or the equivalent of nearly two letter grades. Using AI sped up the task slightly, but this didn’t reach the threshold of statistical significance.
But people like that they aren't shying away from negative results and that builds some trust. Though let's not ignore that they're still suggesting AI + manual coding.
But honestly, this sample size is so small that we need larger studies. The results around what counts as effective versus ineffective AI usage are a complete wash with n<8.
Also anyone else feel the paper is a bit sloppy?
I mean, there are a bunch of minor things, but Figure 17 (the first figure in the appendix) is just kind of wild, and there are trivial ways to fix the glaring error. The more carefully you look at even just the figures in the paper, the more you say "who the fuck wrote this?" How do you even generate Figure 12? The numbers align with the grid but the boxes are shifted. And Figure 16 has the experience levels shuffled for some reason. And there's a hell of a lot more confusing stuff you'll see if you do more than a glance...
I wish they had attempted to measure product management skill.
My hypothesis is that the AI users gained less in coding skill, but improved in spec/requirement writing skills.
But there’s no data, so it’s just my speculation. Intuitively, I think AI is shifting entry level programmers to focus on expressing requirements clearly, which may not be all that bad of a thing.
> I wish they had attempted to measure product management skill.
We're definitely getting better at writing specs. The issue is that the labor bottleneck is competent senior engineers, not juniors, not PMs, not box-and-arrow staff engineers.
> I think AI is shifting entry level programmers to focus on expressing requirements clearly
This is what the TDD advocates were saying years ago.
What AI development has done for my team is the following:
Dramatically improved Jira usage -- better, more descriptive tickets with actionable user stories and clearly expressed requirements.
Dramatically improved github PRs.
Dramatically improved test coverage.
Dramatically improved documentation, not just in code but in comments.
Basically all _for free_, while at the same time probably doubling or tripling our pace at closing issues, including some issues in our backlog that had lingered for months because they were annoying and nobody felt like working on them, but were easy for claude to knock out.
Interestingly, if you look at the breakdown by years of experience, it shows the 1-3 year junior group being faster, and no difference for the 4+ years group.
I wonder if we're going to have a future where the juniors never gain the skills and experience to work well by themselves, and instead become entirely reliant on AI, assuming that's the only way
I think we're going to see a small minority of juniors who managed to ignore the hype/peer pressure/easy path and actually learned to code have a huge advantage over the others.
I agree with the Ray Dalio perspective on this. AI is not a creative force. It is only a different form of automation. So, the only value to AI is to get to know your habits. As an example have it write test cases in your code style so you don't have to. That is it.
If you sucked before using AI you are going to suck with AI. The compounded problem there is that you won't see just how bad you suck at what you do, because AI will obscure your perspective through its output, like an echo chamber of stupid. You are just going to suck much faster and feel better about it. Think of it as steroids for Dunning-Kruger.
> The loss of competency seems pretty obvious but it's good to have data
That's not what the study says. It says that most users reflect your statement while there is a smaller % that benefits and learns more and faster.
Generalizations are extremely dangerous.
What the article says simply reflects that most people don't care that much and default to the path of least resistance, which is common everyday knowledge, but we know very well this does not apply to everyone.
> Among participants who use AI, we find a stark divide in skill formation outcomes between high-scoring interaction patterns (65%-86% quiz score) vs low-scoring interaction patterns (24%-39% quiz score). The high scorers only asked AI conceptual questions instead of code generation or asked for explanations to accompany generated code; these usage patterns demonstrate a high level of cognitive engagement.
This is very much my experience. AI is incredibly useful as a personal tutor
> there is a smaller % that benefits and learns more and faster
That's not what the study says, nor is it capable of credibly making that claim. You are reasoning about individuals in an RCT where subjects did not serve as their own controls. The high performers in the treatment group may have done even better had they been in the control group, with AI in fact slowing them down.
You don't know which is true because you can't know because of the study design. This is why we have statistics.
This is all wonderful and all but what happens when these tools aren't available - you lose internet connection or the agent is misconfigured or you simply ran out of credits. How would someone support their business / software / livelihood? First, the agents would take our software writing tasks then they encroach on CI/CD and release process and take over from there...
Now, imagine a scenario for a typical SWE today, or in a not-so-distant future: the agents build your software, you're simply a gate-keeper/prompt engineer, all tests pass, and you're doing a production deployment at 12am when something happens but your agents are down. At that point, what do you do if you haven't built or even deployed the system yourself? You're like L1 support at that point: pretty useless and clueless when it comes to fully understanding and supporting the application.
I've had a fairly long career as a web dev. When I started, I used to be finicky about configuring my dev environment so that if the internet went down I could still do some kind of work. But over time, partly as I worked on bigger projects and partly as the industry changed, that became infeasible.
So you know what I do, what I've been doing for about a decade, if the internet goes down? I stop working. And over that time I've worked in many places around the world, developing countries, tropical islands, small huts on remote mountains. And I've lost maybe a day of work because of connectivity issues. I've been deep in a rainforest during a monsoon and still had 4g connection.
If Anthropic goes down I can switch to Gemini. If I run out of credits (people use credits? I only use a monthly subscription) then I can find enough free credits around to get some basic work done. Increasingly, I could run a local model that would be good enough for some things and that'll become even better in the future. So no, I don't think these are any kind of valid arguments. Everyone relies on online services for their work these days, for banking, messaging, office work, etc. If there's some kind of catastrophe that breaks this, we're all screwed, not just the coders who rely on LLMs.
Meanwhile I've lost roughly a month to internet issues. My guess is your experience was unusual enough that you felt the need to comment, where most developers who were less lucky, or who just remember more of their issues, didn't.
And here I am thinking that my life depends too much on the internet and the knowledge you can find on it. So if something big/extreme happens, like nuclear war or a major internet outage, I know nothing. No recipes, no basic medical stuff like how to use antibiotics, no electronics knowledge, whatever. I don't have any books with stuff like that, like my parents used to.
I have seen some examples of backed-up Wikipedia for offline usage, local LLMs, etc., and am thinking of implementing something as a precaution for these extreme events.
> And over that time I've worked in many places around the world, developing countries, tropical islands, small huts on remote mountains. And I've lost maybe a day of work because of connectivity issues. I've been deep in a rainforest during a monsoon and still had 4g connection.
I consider it more or less immoral to be expected to use the Internet for anything other than retrieving information from others or voluntarily sharing information with others. The idea that a dev environment should even require finicky configuration to allow for productive work sans Internet appalls me. I should only have to connect in order to push to / pull from origin, deploy something or acquire build tools / dependencies, which should be cached locally and rarely require any kind of update.
Same thing you do if AWS goes down. Same thing we used to do back in the desktop days when the power went out. Heck one day before WFH was common we all got the afternoon off 'cause the toilets were busted and they couldn't keep 100 people in an office with no toilets. Stuff happens. And if that's really not acceptable, you invest in solutions with the understanding that you're dumping a lot of cash into inefficient solutions for rare problems.
Why wouldn't these tools be available suddenly? Once you answer the question, the challenge then becomes mitigating that situation rather than doing things the old way. Like having backup systems, SLAs from network and other providers, etc.
Actually, the last thing you probably want is somebody reverting back to doing things the way we did them 20 years ago and creating a big mess. Much easier to just declare an outage and deal with it properly according to some emergency plan (you do have one, right?).
CI/CD are relatively new actually. I remember doing that stuff by hand. I.e. I compiled our system on my Desktop system, created a zip file, and then me and our operations department would use an ISDN line to upload the zip file to the server and "deploy" it by unzipping it and restarting the server. That's only 23 years ago. We had a Hudson server somewhere but it had no access to our customer infrastructure. There was no cloud.
I can still do that stuff if I need to (and I sometimes do ;-) ). But I wouldn't dream of messing with a modern production setup like that. We have CI/CD for a reason. What if CI/CD were to break? I'd fix it rather than adding to the problem by panicking and doing things manually.
I am not convinced of the wonderfulness, because the study implies that AI does not improve task completion time but does reduce programmer's comprehension when using a new library.
On-device models (deepseek-coder, etc.) are very good -- better than the old way of using Stack Overflow on the internet. I have been quite productive on long-haul flights without internet!
You're an engineer, your goal is to figure stuff out using the best tools in front of you
Humans are resilient, they reliably perform (and throw great parties) in all sorts of chaotic conditions. Perhaps the thing that separates us most from AI is our ability to bring out our best selves when baseline conditions worsen
I know this gets asked all the time, but what is your preferred workflow when using local models? I was pretty deep into it early on, with Tabby and Continue.dev, but once I started using Claude Code with Opus it was hard to go back. I do the same as you, and still use them on flights and whatnot, but I think my implementation could be improved.
The tools are going to ~zero (~5 years). The open-source LLMs are here. No one can put them back or take them down. No internet, no problem. I don't see a long-term future in frontier LLM companies.
What I don't get is, how are these free LLMs getting funded? Who is paying $20-100 million to create an open weights LLM? Long term why would they keep doing it?
This is the argument that people used to fight against rich customized IDEs like emacs for decades. What if you need to ssh into a machine that only has baseline vi in an emergency?
I'll happily optimize my life for 99.999% of the time.
If the Internet is down for a long time, I've got bigger problems anyway. Like finding food.
Yeah! I use the JetBrains AI Assistant sometimes, and it suddenly started showing only a blank window, nothing else. So I'm not getting anything out of it, but I can see my credits are being spent!
IF I was totally dependent on it, I would be in trouble. Fortunately I am not.
What good would being able to "build my software" without internet access be, unless I'm building software for a disconnected desktop? Exactly what am I going to do with it? How am I going to get to my servers?
Or your business gets flagged by an automated system for dubious reasons with no way to appeal. It's the old story of big tech: they pretend to be on your side first, but their motives are nefarious.
People used to and still say the same thing about GPS. As these systems mature they stay up and become incorporated into our workflows. The implication in the case of GPS was that navigating on your own is not a very critical task anymore. Correspondingly the implication here is that software design and feature design are more important than coding or technical implementation. Similar to Google, it's more important that you know how and what to ask for rather than be able to generate it yourself.
That reminds me of when teachers would say: what if you're without a calculator? And yet we all have smartphones in our pockets today with calculators.
> That reminds me of when teachers would say: what if you're without a calculator? And yet we all have smartphones in our pockets today with calculators.
Your teachers had the right goal, but a bad argument. Learning arithmetic isn't just about being able to do a calculation. It's about getting your brain comfortable with math. If you always have to pull out a goddamn calculator, you'll be extremely limited.
Trust me, elementary-age me was dumb to not listen to those teachers and to become so calculator-dependent.
Having a deep intuition about what the calculator is doing is the skill we were actually being taught. Teachers don't always understand why things are being taught.
And yet calculating your shopping expenses to prevent getting screwed by buggy vending machines, or quickly making rough estimations at your work, is as useful as ever. Tell me how you can learn calculus and group theory, when you skipped primary school math.
> none of us could write a library call without consulting online sources.
I use SO quite often, but it is for questions I would otherwise consult other people, because I can't figure it out short of reverse-engineering something. For actual documentation man pages and info documents are pretty awesome. Honestly I dread leaving the world of libraries shipped with my OS vendor, because the quality of documentation drops fast.
I rely on the internet just as much as the rest of you. When that goes down, I crack out man pages, and the local copy of the documentation I can build from source code comments, and (after a 5-minute delay while I figure out how to do that) I'm back to programming. I'm probably half as quick, but I'm also learning more (speeding me up when the internet does come back on), so overall it's not actually time lost.
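In Python-land, one version of that fallback (purely an illustration, not necessarily what anyone above uses) is the interpreter's own documentation tooling, which works entirely offline:

    import json

    # Docs pulled straight from the docstrings of the installed source --
    # no network required.
    help(json.dumps)

    # Or browse the docs for everything installed locally via a local server:
    #   python -m pydoc -b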
Well, you're supposed to pay for the Platinum Pro Gold Deluxe package which includes priority support with an SLA so that six months down the road you get a one month credit for the outage that destroyed your business.
I invested in a beefy laptop that can run Qwen Coder locally and it works pretty good. I really think local models are the future, you don’t have to worry about credits or internet access so much.
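For anyone curious, a minimal sketch of one way to wire this up, assuming the model is served locally by something like Ollama on its default port (the model tag and prompt here are just illustrative):

    from openai import OpenAI

    # Point the standard OpenAI client at a locally served model.
    # Ollama exposes an OpenAI-compatible endpoint at this address by default;
    # the api_key is required by the client but ignored by the local server.
    client = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")

    reply = client.chat.completions.create(
        model="qwen2.5-coder",  # whatever coder model you've pulled locally
        messages=[{"role": "user", "content": "Write a function that parses an ISO 8601 date."}],
    )
    print(reply.choices[0].message.content)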
I think you laid out why so much money is being poured into this: it's digital crack, and if they can addict enough businesses, they have subscription moats. Oraclification.
> This is all wonderful and all but what happens when these tools aren't available - you lose internet connection or the agent is misconfigured or you simply ran out of credits. How would someone support their business / software / livelihood?
This is why I suggest developers use the free time they gain back writing documentation for their software (preferably in your own words not just AI slop), reading official docs, sharpening your sword, learning design patterns more thoroughly. The more you know about the code / how to code, the more you can guide the model to pick a better route for a solution.
I'm seeing things that are seriously alarming, though. Claude can now write better documentation and get it 95% of the way there (we're building a set of MCP tools and API endpoints for a large enterprise). Claude is already writing code, fixing bugs, or suggesting fixes. We have a PM on our team, with access to both the React and API projects, who saw one of the services return a 500; they used Claude to pinpoint the bug to the exact database call and suggest a fix. So now it's quite common for PMs to post not only bugs but also "suggested fixes" from the agents. In a not-so-distant future, developers here will simply be redundant, since the PM can just use Claude to code and support the entire app. Right now they still rely on us for support and deployments, but that could go away too.
At most places I've worked, we can still get things done when AWS/GCP/Azure/OCI are down. For my own selfhosted work, I'm more self-reliant. But I'm aware there are some companies who do 100% of their work within AWS/GCP/Azure/OCI and are probably 100% down when they go down. That's a consequence of how they decided to architect their apps, services and infrastructure.
How would you answer the same question about water or electricity?
Your pizza restaurant is all wonderful and all but what happens when the continual supply of power to the freezer breaks? How will you run your restaurant then?
> This is all wonderful and all but what happens when these tools aren't available - you lose internet connection or the agent is misconfigured or you simply ran out of credits.
i would work on the hundreds of non-coding tasks that i need to do. or just not work?
I think this is where current senior engineers have an advantage, like I felt when I was a junior that the older guys had an advantage in understanding the low level stuff like assembly and hardware. But software keeps moving forward - my lack of time coding assembly by hand has never hindered my career. People will learn what they need to learn to be productive. When AI stops working in a given situation, people will learn the low level detail as they need to. When I was a junior I learned a couple of languages in depth, but everything since has been top down, learn-as-i-need to. I don't remember everything I've learned over 20 years software engineering, and the forgetting started way before my use of AI. It's true that conceptual understanding is necessary, but everyone's acting like all human coders are better than all AI's, and that is not the case. Poorly architected, spaghetti code existed way before LLM's.
> But software keeps moving forward - my lack of time coding assembly by hand has never hindered my career.
Well, yeah. You were still (presumably) debugging the code you did write in the higher level language.
The linked article makes it very clear that the largest decline was in problem solving (debugging). The juniors starting with AI today are most definitely not going to do that problem-solving on their own.
I want to compliment Anthropic for doing this research and publishing it.
One of my advantages(?) when it comes to using AI is that I've been the "debugger of last resort" for other people's code for over 20 years now. I've found and fixed compiler code generation bugs that were breaking application code. I'm used to working in teams and to delegating lots of code creation to teammates.
And frankly, I've reached a point where I don't want to be an expert in the JavaScript ORM of the month. It will fall out of fashion in 2 years anyway. And if it suddenly breaks in old code, I'll learn what I need to fix it. In the meantime, I need to know enough to code review it, and to thoroughly understand any potential security issues. That's it. Similarly, I just had Claude convert a bunch of Rust projects from anyhow to miette, and I definitely couldn't pass a quiz on miette. I'm OK with this.
I still develop deep expertise in brand new stuff, but I do so strategically. Does it offer a lot of leverage? Will people still be using it on greenfield projects next year? Then I'm going to learn it.
So at the current state of tech, Claude basically allows me to spend my learning strategically. I know the basics cold, and I learn the new stuff that matters.
> my lack of time coding assembly by hand has never hindered my career.
I'd kinda like to see this measured. It's obviously not the assembly that matters for nine-9s of jobs. (I used assembly language exactly one time in my career, and that was three lines of inline in 2003.) But you develop a certain set of problem-solving skills when you code assembly. I speculate, like with most problem-solving skills, it has an impact on your overall ability and performance. Put another way, I assert nobody is worse for having learned it, so the only remaining question is, is it neutral?
> everyone's acting like all human coders are better than all AI's
I feel like the sentiment here on HN is that LLMs are better than all novices. But human coders with actual logical and architectural skills are better than LLMs. Even the super-duper AI enthusiasts talk about controlling hordes of LLMs doing their bidding--not the other way around.
Being able to read assembly has helped me debug. You don't have to write it but you have to be able to write it. The same applies to manual transmissions and pocket calculators.
That's fair enough, but reading assembly is such a pain in the ass... it was exciting for the first 10 minutes of my life, but now, if I ever got to that point, I would 100% copy-paste the listing into ChatGPT with "hey, can you see anything sketchy?"
An important aspect of this for professional programmers is that learning is not something that happens as a beginner, student or "junior" and then stops. The job is learning, and after 25 years of doing it I learn more per day than ever.
How old are you? At 39 (20 years of professional experience) I've forgotten more things in this field than I'm comfortable with today. I find it a bit sad that I've completely lost my Win32 reverse engineering skills I had in my teens, which have been replaced by nonsense like Kubernetes and aligning content with CSS Grid.
And I must admit my appetite in learning new technologies has lessened dramatically in the past decade; to be fair, it gets to a point that most new ideas are just rehashing of older ones. When you know half a dozen programming languages or web frameworks, the next one takes you a couple hours to get comfortable with.
That's one of several possibilities. I've reached a different steady state - one where the velocity of work exceeds the rate at which I can learn enough to fully understand the task at hand.
But just think, there's a whole new framework that isn't better but is trendy. You can recycle a lot of your knowledge and "learn new things" that won't matter in five years. Isn't that great?
I use remnote for that. I write cards and quizzes for all kinds of stuff, and I tend to retain it for years after having practiced it with the low friction of spaced repetition.
I worked as an "advisor" for programmers in a large company. Our mantra there was that programming and development of software is mainly acquiring knowledge (ie learning?).
One take-away for us from that viewpoint was that knowledge in fact is more important than the lines of code in the repo. We'd rather lose the source code than the knowledge of our workers, so to speak.
Another point is that when you use consultants, you get lines of code, whereas the consultancy company ends up with the knowledge!
... And so on.
So, I wholeheartedly agree that programming is learning!
>One take-away for us from that viewpoint was that knowledge in fact is more important than the lines of code in the repo. We'd rather lose the source code than the knowledge of our workers, so to speak.
Isn't this the opposite of how large tech companies operate? They can churn developers in/out very quickly, hire-to-fire, etc., but the code base lives on. There is little incentive to keep institutional knowledge. The incentives are PRs pushed and value landed.
It can be I guess, but I think it's more about solving problems. You can fix a lot of peoples' problems by shipping different flavors of the same stuff that's been done before. It feels more like a trade.
People naturally try to use what they've learned but sometimes end up making things more complicated than they really needed to be. It's a regular problem even excluding the people intentionally over-complicating things for their resume to get higher paying jobs.
Have you been nothing more than a junior contributor all this time? Because as you mature professionally your knowledge of the system should also be growing
One of the nice things about the "dumber" models (like GPT-4) was that they were good enough to get you really far, but never enough to complete the loop. They gave you maybe 90%, 20% of which you had to retrace -- so you had to do 30% of the tough work yourself, which meant manually learning things from scratch.
The models are too good now. One thing I've noticed recently is that I've stopped dreaming about tough problems, be it code or math. The greatest feeling in the world is pounding your head against a problem for a couple of days and waking up the next morning with the solution sketched out in your mind.
I don't think the solution is to be going full natty with things, but to work more alongside the code in an editor, rather than doing things in CLI.
The big issue I see coming is that leadership will care less and less about people, and more about shipping features faster and faster. In other words, those who are still learning their craft are fucked.
The amount of context switching in my day-to-day work has become insane. There's this culture of “everyone should be able to do everything” (within reason, sure), but in practice it means a data scientist is expected to touch infra code if needed.
Underneath it all is an unspoken assumption that people will just lean on LLMs to make this work.
You still have the system design skills, and so far, LLMs are not that good in this field.
They can give plausible architecture but most of the time it’s not usable if you’re starting from scratch.
When you design the system, you’re an architect not a coder, so I see no difference between handing the design to agents or other developers, you’ve done the heavy lifting.
In that perspective, I find LLMs quite useful for learning. But instead of coding, I find myself in long sessions back and forth to ask questions, requesting examples, sequence diagrams .. etc to visualise the final product.
I see this argument all the time, and while it sounds great on paper (you're an architect now, not a developer) people forget (or omit?) that a product needs far fewer architects than developers, meaning the workforce gets in fact trimmed down thanks to AI advancements.
Idk i very much feel like Claude Code only ever gets me really far, but never there. I do use it a fair bit, but i still write a lot myself, and almost never use its output unedited.
For hobby projects though, it's awesome. It just really struggles to do things right in the big codebase at work.
> The greatest feeling in the world is pounding your head against a problem for a couple of days and waking up the next morning with the solution sketched out in your mind.
And then you find out someone else had already solved it. So might as well use the Google 2.0 aka ChatGPT.
Well, this is exactly the problem. This tactic works until you get to a problem that nobody has solved before, even if it's just a relatively minor one that no one has solved because no one has tried to because it's so specific. If you haven't built up the skills and knowledge to solve problems, then you're stuck.
But to understand the solution from someone else, you would have to apply your mind to understand the problem yourself. Transferring the hard work of thinking to GPT will rob you of the attention you will need to understand the subject matter fully. You will be missing insights that would be applicable to your problem. This is the biggest danger of brain rot.
How is that a drawback? You still solved it, you learned a lot, and you can actually discuss approaches with the other one, because you actually understood the problem domain.
This is what I am thinking about this morning. I just woke up, made a cup of coffee, read the financial news, and started exploring the code I wrote yesterday.
My first thought was that I can abstract what I wrote yesterday, which was a variation of what I built over the previous week. My second thought was a physiological response of fear that today is going to be a hard hyper focus day full of frustration, and that the coding agents that built this will not be able to build a modular, clean abstraction. That was followed by weighing whether it is better to have multiple one off solutions, or to manually create the abstraction myself.
I agree with you 100 percent that the poor performance of models like GPT-4 introduced some kind of regularization into the human-in-the-loop coding process.
Nonetheless, we live in a world of competition, and the people who develop techniques that give them an edge will succeed. There is a video about the evolution of technique in the high jump, the Western Roll, the Straddle Technique, and finally the Fosbury Flop. Using coding agents will be like this too.
I am working with 150 GB of time series data. There are certain pain points that need to be mitigated. For example, a different LLM has to be coerced into analyzing or working with the data from a completely different approach in order to validate the results. That means that instead of the whole thing being 4x faster, each iteration is 4x faster but needs to be done twice, so overall it's only 2x faster. I burned $400 in tokens in January. This cannot be good for the environment.
Timezone handling always has to be validated manually. Every exploration of the data is a train and test split. Here is the thing that hurts the most. The AI coding agents always show the top test results, not the test results of the top train results. Rather than tell me a model has no significant results, it will hide that and only present the winning outliers, which is misleading and, like the OP research suggests, very dangerous.
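To make that failure mode concrete, here is a minimal sketch (purely illustrative: random "skill-free" predictors standing in for real models) of the gap between the best test score across candidates and the test score of the model you would actually have picked on the training data:

    import numpy as np

    rng = np.random.default_rng(0)

    # 20 candidate models with no real skill: train and test scores are
    # independent noise, as they would be for badly overfit time series models.
    n_models = 20
    train_scores = rng.normal(0.0, 1.0, n_models)  # in-sample performance
    test_scores = rng.normal(0.0, 1.0, n_models)   # out-of-sample performance

    # Misleading report: cherry-pick the best *test* score across all candidates.
    best_test = test_scores.max()

    # Honest report: select the model on its *train* score, then report its test score.
    honest_test = test_scores[train_scores.argmax()]

    print(f"best test score of any candidate:      {best_test:.2f}")
    print(f"test score of the model you'd select:  {honest_test:.2f}")
    # The first number looks impressive by pure selection bias;
    # the second hovers around zero because the models have no skill.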
A lot of people are going to get burned before the techniques to mitigate this are developed.
Overfitting has always been a problem when working with data. Just because the barrier of entry for time series work is much lower does not mean that people developing the skill, whether using old school tools like ARIMA manually or having AI do the work, escape the problem of overfitting. The models will always show the happy, successful looking results.
Just like calculators are used when teaching higher math at the secondary level so basic arithmetic does not slow the process of learning math skills, AI will be used in teaching too. What we are doing is confusing techniques that have not been developed yet with not being able to acquire skills. I wrack and challenge my brain every day solving these problems. As millions of other software engineers do as well, the patterns will emerge and later become the skills taught in schools.
> We find that AI use impairs conceptual understanding, code reading, and debugging abilities, without delivering significant efficiency gains on average.
> On average, participants in the AI group finished about two minutes faster, although the difference was not statistically significant. There was, however, a significant difference in test scores: the AI group averaged 50% on the quiz, compared to 67% in the hand-coding group
It's good that there's some research into this - to confirm what is generally obvious to anyone who studied anything. You have to think about what you are doing, write things by hand, use the skill to improve and retain it.
Common example here is learning a language. Say, you learn French or Spanish throughout your school years or on Duolingo. But unless you're lucky enough to be amazing with language skills, if you don't actually use it, you will hit a wall eventually. And similarly if you stop using language that you already know - it will slowly degrade over time.
Go Anthropic for transparency and commitment to science.
Personally, I’ve never been learning software development concepts faster—but that’s because I’ve been offloading actual development to other people for years.
This also echoes other research from a few years ago that had similar findings: https://news.ycombinator.com/item?id=46822158
https://www.youtube.com/shorts/0LeJ6xn35gc
https://www.youtube.com/shorts/vXecG_KajLI
This.
I am genuinely curious about your work lifestyle.
The freedom to travel anywhere while working sounds awesome.
The ability to work anywhere while traveling sounds less so.
Those still have limits, no? Or if there's a subscription that provides limitless access, please tell me which one it is.
cries on a Bavarian train
Take a look at how ridiculously much money is invested in these tools and the companies behind them. Those investments expect a return somehow.
https://boto3.amazonaws.com/v1/documentation/api/latest/inde...
I don’t need to comprehend “the library”. I need to know what I need to do and then look up the API call.
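Which in practice looks like this (a hypothetical snippet, assuming AWS credentials are already configured locally):

    import boto3

    # Look up the one call you need, use it, move on --
    # no need to internalize the whole SDK.
    s3 = boto3.client("s3")
    response = s3.list_buckets()

    for bucket in response["Buckets"]:
        print(bucket["Name"])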
I don't know about you, but I don't connect to the internet most of the time, and it makes me more productive, not less.
... Why wouldn't you build software that works there?
As I understand things, the purpose of computers is to run software.
But more importantly, let's suppose your software does require an Internet connection to function.
Why should that imply a requirement for your development environment to have one?
Why should that imply a requirement for a code generation tool to have one?
We just really underestimate sentimentality in our society because it doesn't fit our self conception.
(I jest a bit, actually agree since turning assembly->compiled code is a tighter problem space than requirements in natural language->code)
You are at least a decade late to post fears about developers' reliance on the internet. It was complete well before the LLM era.
What happens when GitHub goes down? You shrug and take a long lunch.
* all services are run at a loss and they increase price to the point the corp doesn’t want to pay for everyone any more.
* it turns out that our chats are used for corporate espionage and the corps get spooked and cut access
* some dispute between EU and US happens and they cut our access.
The solution’s having EU and local models.
what do you do when github actions goes down?
Isn't a large amount of required institutional knowledge typically a problem?
I could have sworn I was meant to be shipping all this time...
I also used to get great pleasure from banging my head against something and then the sudden revelation.
But that takes time. I was valuable when there was no other option. Now? Why would someone wait when an answer is just a prompt away?
Ouch.
See also: https://news.ycombinator.com/item?id=46820924