web3-is-a-scam · 2 years ago
I cancelled my subscription after 2 months because I was spending way too much mental effort going over all of the code vomit, fixing all of the mistakes. And it was basically useless when trying to deal with anything non-trivial or anything to do with SQL (even when I frontloaded it with my entire schema).

It was much less effort to just write everything myself because I actually know what I want to write and fixing my own mistakes was easier than fixing the bot’s.

I weep for the juniors that will be absolutely crushed by this garbage.

ben_w · 2 years ago
> I cancelled my subscription after 2 months because I was spending way too much mental effort going over all of the code vomit, fixing all of the mistakes. And it was basically useless when trying to deal with anything non-trivial or anything to do with SQL (even when I frontloaded it with my entire schema).

Good to know, that means I'm still economically useful.

I'm using ChatGPT rather than Copilot, and I'm surprised by how much it can do, but even so I wouldn't call it "good code" — I use it for JavaScript, because while I can (mostly) read JS code, I've spent the last 14 years doing iOS professionally and therefore don't know what's considered best practice in browser-land. Nevertheless, even though (usually) I get working code, I can also spot it producing bad choices and (what seems like) oddities.

> I weep for the juniors that will be absolutely crushed by this garbage.

Indeed.

You avoid the two usual mistakes I see with current AI: either thinking it's already game over for us, or dismissing it as a nothing-burger.

For the latter, I normally have to roll out a quote I can't remember well enough to google, that's something along the lines of "your dog is juggling, filing taxes, and baking a cake, and rather than be impressed it can do any of those things, you're complaining it drops some balls, misses some figures, and the cake recipe leaves a lot to be desired".

godelski · 2 years ago
> You avoid the two usual mistakes I see with current AI: either thinking it's already game over for us, or dismissing it as a nothing-burger.

It's always really surprising to me that it splits into these two camps. What frustrates me, though, is that if you suggest something in the middle, people usually assume you're in the opposite camp from theirs. It reminds me a lot of politics, and I'm not sure why we're so resistant to nuance when our whole job is typically composed of nuance.

Though I'll point out, I think it is natural to complain about your juggling dog dropping balls or making mistakes on your taxes. That doesn't mean you aren't impressed. I think this response is increasingly common considering these dogs are sold as if they are super-human at these tasks. That's quite disappointing and our satisfaction is generally relative to expectations, not actual utility. If you think something is shit and it turns out to be just okay, you're happy and feel like you got a bargain. If you're expecting something to be great but it turns out to be just okay, you're upset and feel cheated. But these are different than saying the juggling dog is a useless pile of crap and will never be useful. I just want to make that clear, so we avoid my first paragraph.

philwelch · 2 years ago
> For the latter, I normally have to roll out a quote I can't remember well enough to google, that's something along the lines of "your dog is juggling, filing taxes, and baking a cake, and rather than be impressed it can do any of those things, you're complaining it drops some balls, misses some figures, and the cake recipe leaves a lot to be desired".

Something can be very impressive without actually being useful, but that still doesn’t make it useful. There’s no market for a working dog that does a bad job of baking cakes and filing taxes, while dogs that can retrieve game birds or tackle fleeing suspects are in high demand.

kcrwfrd_ · 2 years ago
As a senior frontend/javascript guy, I’m afraid that relying on ChatGPT/copilot for _current best practices_ is probably where it works the worst.

Oftentimes it will produce code that’s outdated. Or, it will output code that seems great, unless you have an advanced understanding of the browser APIs and behaviors or you thoroughly test it and realize it doesn’t work as you hoped.

But it’s pretty good at getting a jumpstart on things. Refining down to best practices is where the engineer comes in, which is what makes it so dicey in the hands of a jr dev.

noogle · 2 years ago
The criticism from the second camp stems from the fact that the WHOLE job is to not drop anything.

A fence with a hole is useless even if it's 99% intact.

A lot of human jobs, especially white collar, are about providing reassurance about the correctness of the results. A system that cannot provide that may be worse than useless since it creates noise, false sense of security and information load.

hinkley · 2 years ago
If there are patterns that are good, idiomatic, and mostly repeatable, we should be putting those into the standard library, not an AI tool.

What we have right now is a system to collect information about the sorts of problems developers want existing code to solve for them. We should embrace it.

agumonkey · 2 years ago
The value I found in my short trial of GPT-3 was as a bidirectional path finder.

Don't have me read 20 pages of docs just to integrate with a browser or a framework... it cuts the legwork, essentially, so I can keep my motivation and inspiration going.

Mawr · 2 years ago
> You avoid the two usual mistakes I see with current AI: either thinking it's already game over for us, or dismissing it as a nothing-burger.

A lot of the latter is caused by the former. It is a nothing burger compared to the shocking amount of hysteria on HN about AI putting programmers out of a job. You'd expect a programmer to know what his job is, but alas, apparently even programmers think of themselves as glorified typewriters.

antod · 2 years ago
>For the latter, I normally have to roll out a quote I can't remember well enough to google, that's something along the lines of "your dog is juggling, filing taxes, and baking a cake, and rather than be impressed it can do any of those things, you're complaining it drops some balls, misses some figures, and the cake recipe leaves a lot to be desired".

Not the quote, but there was a Far Side cartoon along those lines where the dog was being berated for not doing a very good job mowing the lawn:

https://i.pinimg.com/originals/22/22/79/222279ceaa98f293e76e...

firecall · 2 years ago
Right now, for me, it’s fancy Auto Complete at best.

I get a free subscription to it by using my kids' EDU email accounts. Which is handy :)

But I absolutely would not pay for it.

I recall the last time I tried using the chat feature to do something: the code it produced wasn't very useful, and it referenced chapters from a book for further information.

It was clearly just regurgitating code from a book on the subject, and that just feels wrong to me.

At least give credit to the authors and reference the book so I can go read the suggested chapters LOL

blibble · 2 years ago
> and therefore don't know what's considered best practice in browser-land.

wanting this is probably the worst possible use case for LLM code vomit

varispeed · 2 years ago
> I'm surprised by how much it can do, but even so I wouldn't call it "good code"

You can tell it the code is bad and how, and a lot of the time it will correct it. For some BS code that you have to write, it is a great time saver.

jrochkind1 · 2 years ago
> therefore don't know what's considered best practice in browser-land.

I myself also don't know what's considered best practice in Javascript generally (browser or server-side), even though I also have to write it occasionally -- but I wouldn't feel safe trusting that ChatGPT suggestions were likely to be model current best practices either.

On what are you basing your thinking that ChatGPT is more likely than not to be suggesting best practices? (Real question, I'm curious!)

Syntaf · 2 years ago
As with most things in life, moderation is key.

I find co-pilot primarily useful as an auto-complete tool to save keystrokes when writing predictable, context-driven code.

Writing an enum class in one window? Co-pilot can use that context to auto complete usage in other windows. Writing a unit test suite? Co-pilot can scaffold your next test case for you with a simple tab keystroke.

Especially in the case of dynamic languages, co-pilot nicely complements your intellisense.
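
For illustration, the sort of predictable, context-driven completion being described, sketched in Python (the Status enum and its labels are hypothetical):

    from enum import Enum

    class Status(Enum):
        PENDING = "pending"
        ACTIVE = "active"
        CLOSED = "closed"

    # Once the enum is in context, a branch-per-member mapping like this
    # is mechanical enough for an assistant to scaffold with a tab press.
    def label(status: Status) -> str:
        return {
            Status.PENDING: "Awaiting review",
            Status.ACTIVE: "In progress",
            Status.CLOSED: "Done",
        }[status]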

choilive · 2 years ago
Yep, treat copilot like a really good autocomplete system and it's wonderful. It saves a lot of time and typing, even in languages that aren't known for having a lot of boilerplate.

Ask copilot to solve actual problems (y'know, the kind of stuff you are presumably paid to solve), and it falls completely flat.

thehours · 2 years ago
My experience is similar. This week I created a few DBT models and Co-Pilot saved a *ton* of keystrokes on the YML portion.

It needed some hand-holding in the early parts, but it was so satisfying to tab autocomplete entire blocks of descriptions and tests once it picked up context with my preferences.

chrismorgan · 2 years ago
> I weep for the juniors that will be absolutely crushed by this garbage.

This is the real danger of this sort of thing: when Copilot or whatever gets good enough that it replaces what is vastly superior, for purely economic reasons.

I wrote about this trend applied to the unfortunately inevitable doom of the voice acting industry in favour of text-to-speech models a couple of months ago, using my favourite examples of typesetting, book binding and music engraving: https://news.ycombinator.com/item?id=38491203.

But when it’s development itself that gets hollowed out like this, I’m not sure what the end state is, because it’s the developers who led past instances of supplanting. Some form of societal decline and fall doesn’t feel implausible. (That sentence really warrants expansion into multiple paragraphs, but I’m not going to. It’s a big topic.)

m3047 · 2 years ago
Yes, like desktop publishing compared to traditional printing. Its output is comparatively shitty but average people will settle for it on account of its democratization. My context was feature factory web dev versus low / no code rather than autocomplete versus solo author.

By democratization, I mean that it enables one-one where previously there was only one-many: instead of the inversion of experience where the same unique app experience is shared by millions, a technology allows the interface to be tailored for an audience of one or dozens: missing toes and fingers, color blindness, particularly difficult and unique operating conditions, etc. Given those unique constraints the mediocrity provides at least some preferable solution. The downside of this is that it sets the floor a lot lower, and people who would never have even tried or contemplated trying typesetting will dabble with desktop publishing to achieve their ends.

Somebody on here gifted me with the word "procrustean" and I've taken it and put it in my Minsky fish tank. There are many reasons to eschew the trusted-experts model: somebody who has made heads for pins for twenty years is incontrovertibly an expert, but who cares? Our uncanny valley appears to be only a local minimum.

[PS, nobody understood my point but I thank them for the honest feedback.]

thepasswordis · 2 years ago
Oh man this is the opposite of my experience!

Copilot has replaced almost all of the annoying tedious stuff, especially stuff like writing (simple) SQL queries.

“Parse this json and put the fields into the database where they belong” is a fantastic use case for copilot writing SQL.

(Yes, I'm sure there's an ORM plugin or some middleware I could write, but in an MVP, or a mock-up, that's premature optimization)
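
For illustration, a minimal sketch of that JSON-to-database chore (sqlite3 here; the table and field names are invented):

    import json
    import sqlite3

    conn = sqlite3.connect("app.db")
    payload = json.loads('{"id": 7, "email": "a@b.c", "plan": "pro"}')

    # The "where they belong" step is just mapping fields to columns
    # with a parameterized INSERT (hypothetical users table).
    conn.execute(
        "INSERT INTO users (id, email, plan) VALUES (?, ?, ?)",
        (payload["id"], payload["email"], payload["plan"]),
    )
    conn.commit()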

JBorrow · 2 years ago
An ORM is not premature optimisation…
Yodel0914 · 2 years ago
When I've tried Copilot and similar tools, I've found them rather unimpressive. I assumed it was because I hadn't put the time in to learn how to make the best use of them, but maybe it's just that they're not very good.

On the other hand, I use ChatGPT (via the API) quite often, and it's very handy. For example, I wrote a SQL update that needed to touch millions of rows. I asked ChatGPT to alter the statement to batch the updates, and then asked it to log status updates after each batch.
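
The batching pattern in question looks roughly like this (a sketch rather than the generated code; sqlite3 and the table/column names are stand-ins):

    import sqlite3

    conn = sqlite3.connect("app.db")
    batch_size = 10_000
    total = 0
    while True:
        # Update one batch at a time via a LIMITed subselect,
        # committing between batches to keep transactions small.
        cur = conn.execute(
            """UPDATE accounts SET status = 'migrated'
               WHERE id IN (SELECT id FROM accounts
                            WHERE status = 'legacy' LIMIT ?)""",
            (batch_size,),
        )
        conn.commit()
        if cur.rowcount == 0:
            break
        total += cur.rowcount
        print(f"updated {total} rows so far")  # the status-logging step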

As another example, I was getting a 401 accessing a nuget feed from Azure DevOps - I asked ChatGPT what it could be and it not only told me, but gave me the yaml to fix it.

In both cases, this is stuff I could have done myself after a bit of research, but it's really nice to not have to.

l5870uoo9y · 2 years ago
This is the problem with using AI for generating SQL statements: it doesn't know the semantics of your database schema. If you are still open to a solution, I recently deployed one[1] that combines AI, your db schema, and a simple way to train the AI to know your database schema.

Essentially, you "like" correct (or manually corrected) generations and a vectorized version is stored and used in future similar generations. An example could be tell which table or foreign key is preferred for a specific query or that is should wrap columns in quotes.

From my preliminary tests it works well. I was able to consistently make it use the correct tables, foreign keys, and quotes on table/column names for case sensitivity using only a couple of trainings. Will open a public API for that soon too.

[1]: https://www.sqlai.ai/

tilwidnk · 2 years ago
It's "a public API", not "an public API", because of the consonant rule.

I really worry that there are people out there who will anxiously mangle their company's data thinking that what is being called AI, which doesn't exist yet, will save the day.

CrimsonRain · 2 years ago
Just copy-paste the db/table schema into a comment at the start of your file. Nothing else is needed.
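
For illustration, what that looks like in practice (a sketch; the schema and query are invented):

    # -- schema pasted for the assistant's context (illustrative) --
    # CREATE TABLE users  (id INTEGER PRIMARY KEY, email TEXT);
    # CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER,
    #                      total REAL, created_at TEXT);

    # With that comment in view, completions tend to pick the right
    # tables and join keys, e.g.:
    LIFETIME_TOTALS = """
        SELECT users.email, SUM(orders.total) AS lifetime_total
        FROM orders JOIN users ON users.id = orders.user_id
        GROUP BY users.email
    """
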
WithinReason · 2 years ago
I'm not using Copilot to write code, I use it for autocomplete. For that it's great.
zarzavat · 2 years ago
Copilot probably saved my career. I was getting occasional wrist pain due to keyboard overuse, but the autocomplete is so good that my keyboard use is a tiny fraction of what it was previously. Wrist pain gone.

Using Copilot is a skill though, you have to live with it and learn its limits and idiosyncrasies to get the most out of it.

valcron1000 · 2 years ago
100% agree. It usually autocompletes better than the language's LSP since it takes a lot of context into consideration. It's a godsend for repetitive tasks or trivial stuff.
atoav · 2 years ago
I never even started. On a vacation I tried for two hours to get an LLM to write me an interpolation function. I had a test data set and checks laid out. Not a single one of the resulting algorithms passed all the checks, most didn't even do what I asked for, and a good chunk didn't even run.

LLMs give you plausible text. That does not mean it is logically coherent or says what it should.
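
For scale, the function in question is small enough to write and check by hand; a minimal piecewise-linear interpolation with the sort of checks described (the data values here are invented):

    def interpolate(xs, ys, x):
        """Linearly interpolate y at x over sorted sample points xs/ys."""
        if not (xs[0] <= x <= xs[-1]):
            raise ValueError("x outside sample range")
        for i in range(len(xs) - 1):
            if xs[i] <= x <= xs[i + 1]:
                t = (x - xs[i]) / (xs[i + 1] - xs[i])
                return ys[i] + t * (ys[i + 1] - ys[i])

    # The kind of checks a candidate implementation has to pass:
    assert interpolate([0, 1, 2], [0, 10, 20], 0.5) == 5.0
    assert interpolate([0, 1, 2], [0, 10, 20], 1.5) == 15.0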

keeganpoppen · 2 years ago
it's crazy how amazing it is sometimes for certain things, including comments (and including my somewhat pithy style of comment prose), and how incredible and thorough the wastes of time when it autosuggests some function that is plausible, defined, but yet has some weird semantics such that it subtly ruins my life until i audit every line of code i've written (well, "written", i suppose). i finally disabled it, but doing so _did_ make me kind of sad-- it is nice for the easy things to be made 25x easier, but, for programming, not at the expense of making the hard stuff 5x harder (note i didn't say 10x, nor 100x. 5x). it's not that it's that far away from being truly transformative, it's just that the edge cases are really, really rough because for it to be truly useful you have to trust it pretty much completely & implicitly, and i've just gotten snakebitten in the most devious ways a handful of times. an absolute monster of / victim of the pareto principle, except it makes the "90%" stuff 1.5x easier and the "10%" stuff 5x harder (yes, i know i haven't been using my "k%"s rigorously) (and for those keeping score at home, that adds up to making work 10% harder net, which i'd say is about right in my personal experience).

highlights: "the ai" and i collaboratively came up with a new programming language involving defining a new tag type in YAML that lets one copy/paste from other (named) fragments of the same document (as in: `!ref /path/to/thing to copy`) (the turing completeness comes from self-referential / self-semi-overlapping references (e.g. "!ref /name/array[0:10]`) where one of the elements thus referred-to is, itself, a "!ref" to said array).

lowlights: as already alluded to, using very plausible, semi-deprecated API functions that either don't do what you think they do, or simply don't work the way one would think they do. this problem is magnified by googling for said API functions only to find cached / old versions of API docs from a century ago that further convince you that things are ok. nowadays, every time i get any google result for a doc page i do a little ritual to ensure they are for the most recent version of the library, because it is absolutely insane how many times i've been bitten by this, and how hard.

dehrmann · 2 years ago
> It was much less effort to just write everything myself because I actually know what I want to write and fixing my own mistakes was easier than fixing the bot’s.

Echoing this: it takes longer to read code than to write it, so generally, if you know what you want to write and it's non-trivial, you'll spend more time grokking AI-written code for correctness than writing it from scratch.

orbisvicis · 2 years ago
I've been using Bing's GPT-4 to learn Fortran for about a week. Well, I'm mainly reading a book which is project-oriented, so I'm often researching topics which aren't covered in enough detail or are covered later. I think this mixed-mode approach is great for learning. Since Fortran's documentation is sparse and Google's results are garbage, the output of GPT-4 helps cut through a lot of the cruft. Half the time it teaches me; the rest I'm teaching it and correcting its mistakes. I certainly won't trust it for anything half complicated, but I think it does a good job linking to trustworthy supporting sources, which is how I learned how to cast assumed-rank arrays using c_loc/c_f_pointer and sequence association using (*) or (1). It's great for learning new concepts, and I imagine it would be great for pair-coding in which it suggests oversights to your code. However, I can't imagine depending on it to generate anything from scratch. What's surprising is how little help the compiler is - about as bad a resource as SEO junk. I'm used to "-Wall -Werror" from C, but so many gfortran warnings are incorrect.
pklausler · 2 years ago
Please be sure to report poor warnings to the gfortran developers -- it's generally a great compiler, and you can help keep it great.

A problem with Fortran compiler error and warning messages is that Fortran is largely a legacy language at this point, and most Fortran code hitting the compilers has already had its errors shaken out. New code, and especially new code from new Fortran users, is somewhat more rare -- so those error and warning checks are a part of the compiler that doesn't get as much exercise as one would like.

ec109685 · 2 years ago
If you used Copilot in the beginning (and I think still with some plans), it was only GPT-3.5.

Likely you’d get much better results with GPT-4.

theshrike79 · 2 years ago
The difference in output code quality in 3.5 vs 4 is staggering - just with using regular old ChatGPT.
_teyd · 2 years ago
I think there is a sweet spot: if you're a junior on the cusp of intermediate, it can help you, because you know enough to reject the nonsense but it can still point you in the right direction. Similar to needing to implement a small feature in a language you don't know, when you basically know what needs to get done.

I've definitely seen juniors just keep refining the garbage until it manages to pass a build and then try to merge it, though, and using it that way just sort of makes you a worse programmer because you don't learn anything and it just makes you more dependent on the bot. Companies without good code reviews are just going to pile this garbage on top of garbage.

Sammi · 2 years ago
I find copilot to be most useful as an ORM. It can vomit out bindings between SQL and your code incredibly fast.

It's really bad at doing anything novel and complex, so don't use it for that. But doing trivial stuff with tech you are new to is great. If you're new to SQL, it can write you a decent table schema with decent indexes and give you the correct insert/update/select queries. It can even do simple joins. But don't venture into some complex nested stuff. Don't.

marcyb5st · 2 years ago
As others said, I use copilot or similar for scaffolding/boilerplate code. But you are right: reading almost-correct code that you are unfamiliar with is much more demanding than fixing stuff I got wrong to begin with.
crabmusket · 2 years ago
> It was much less effort to just write everything myself because I actually know what I want to write

This aligns with my observations. I don't use Copilot etc. but the other devs on my small team do. I've observed that I'm generally a faster and more confident typist and coder - not knocking their skills, I'm just more experienced, and also spent my teens reading and writing a lot.

I've seen that it helps them in cases where they're less certain of what they're doing, but also that when they know what they're doing, it makes them quicker about it.

mewpmewp2 · 2 years ago
For me I know what I want to write and it seems that Copilot also knows what I want to write, so as an auto complete it just works out for me. Most of the time code I want to write is in my head, I just need to be able to quickly vomit it out and typing speed is the bottleneck.

I am also able to intuitively predict that it is going to vomit out exactly what I want.

E.g. I know ahead of time what the 10 lines it will give me are.

teaearlgraycold · 2 years ago
Really? I've been doing web dev as a hobby for 20 years and professionally for 6 or 7 years. It's super helpful for me given how much boilerplate there is to write and how much of the training set is for websites. Any time I try to get it to write non-trivial SQL or TypeScript types it fails hard. But it's still a nice time saver for writing tests, request handling, React boilerplate, etc.
louthy · 2 years ago
This is the problem.

As programmers we should be focusing effort on reducing boilerplate, so that it’s never needed again. Instead, we’ve created a boilerplate generator.

culi · 2 years ago
I've been using it since before the beta and I still do not understand why people have ever used it for multi-line suggestions. I only ever use it as a one-line autocomplete and it has done wonders for my productivity
cqqxo4zV46cp · 2 years ago
It sounds to me like you were getting it to do too much at once.
ringofchaos · 2 years ago
Depends on how you use it.

I use a similar vscode assistant, but only for shorter code. I am able to complete code faster than an instructor on video.

arthur_sav · 2 years ago
For me it's useful for new languages that I'm not familiar with, saves a lot of googling time and looking up the docs.
godelski · 2 years ago
I have a similar sentiment, and looking at how mixed the takes are, I think it depends on what you do. I write a lot of research code, so I think it's unsurprising that GPT isn't too good here. But the people I see who write code more in line with the "copy and paste from Stack Overflow" style get huge utility out of this. (This isn't a dis on that style; lots of work is repetitive and redundant.)

So I changed how I use GPT (which I do through the API; much cheaper, btw). I use it a lot like how I would use SO in the first place: get outlines, understand how certain lines might work (a noisy process here), generate generic chunks of code, especially from modules I'm unfamiliar with. A lot of this can just be seen as cutting down time spent searching.

So, the most useful one: using it as a fuzzy search to figure out how to Google. This is the most common pattern. Since everything on Google is so SEO-optimized and Google clearly doesn't give a shit, I can ask GPT a question, get a noisy response that contains useful vernacular or keywords, and then use those to refine a Google search and actually filter out a decent amount of shit.

I think people might read this comment and think that you should just build an LLM into Google, but no, what's going on is more complicated and requires the symbiosis. GPT is dumb, doesn't have context, but is good at being a lossy compression system. The whole reason this works is because I'm intelligent and __context aware__, and, importantly, critical of relying on GPT's accuracy[0]. Much of this can't be easily conveyed to GPT and isn't just a matter of token length.

So that said, the best way to actually improve this system is for Google to just get its shit together, or for some other search engine to replace them. Google, if you're listening, the best way you can make Google search better with LLMs is to: 1) stop enabling SEO bullshit, 2) throw Bard in on the side and have the LLM talk to you to help you refine a search. Hell, you can use an RL agent for 1: just look at how many times I back out of the links you send me, or look at which links I actually use. Going to page 2 is a strong signal that you served shit.

[0] Accuracy is going to depend highly on the frequency of content. While they dedupe data for training, they don't do great semantic deduping (still an unsolved problem, even in vision). So accuracy still depends on frequency, and you can think of well-known, high-frequency knowledge as having many different versions, i.e. the augmentation is built in. You get lower augmentation rates with specific or niche expert knowledge, as there's little baked-in augmentation and your "test set" is much further from the distribution of the training data.

balaji1 · 2 years ago
so it's good that my card that was auto-paying for the Copilot subscription expired
throwaway2990 · 2 years ago
You cancelled the tool because you didn’t know how to use it?


havaloc · 2 years ago
Using GPT-4 has significantly enhanced the efficiency of my work. My focus is primarily on developing straightforward PHP CRUD applications for addressing problems in my day-to-day job. These applications are simple and don't use frameworks or MVC structures, which makes the code generated by GPT-4, based on my precise instructions, easy to comprehend and usually functional right out of the prompt. I find that if I listen to the users' needs I can make something that addresses a pain point easily.

I often request modifications to code segments, typically around 25 lines, to alter the reporting features to meet specific needs, such as group X and total Y on this page. GPT-4 responds accurately to these requests. After conducting a quick QA and test, the task is complete. This approach has been transformative, particularly effective for low-complexity tasks and clear-cut directives.

This process reminds me of how a senior programmer might delegate: breaking down tasks into fundamental components for a junior coder to execute. In my case, GPT-4 acts as the junior programmer, providing valuable assistance at a modest cost of $20 per month. I happily pay that out of pocket to save myself time.

However, much like how a younger version of myself asked why we had to learn math if the calculator does it for us, I now understand why we do that. I think the same thing applies here. If you don't know the fundamentals, you won't be effective. If GPT-4 had been around when I learned to write PHP (don't @ me!), I probably wouldn't understand the fundamentals as well as I do. I have the benefit of having learned how to do it before it was a thing, and then benefitting from the new tool being available.

I also don't find the code quality to be any lower; if anything, what it spits out is a bit more polished (sometimes!).

kromem · 2 years ago
Yeah, a lot of times it has better code quality, but more subtle bugs than what I'd be prone to produce.

I think a lot of the criticisms are premature; it's more a stumbling step forward that needs support from additional infrastructure.

Where's the linter integration so it doesn't spit out a result that won't compile? Where's the automatic bug check and fix for low hanging fruit errors?

What should testing look like or change around in a gen AI development environment?

In general, is there something like TDD or BDD that is going to be a better procedural approach to maximizing the gains to be had while minimizing the costs?

A lot of the past year or two has been about dropping a significant jump in tech into existing workflows.

Like any tool, there's the capabilities of the tool itself and the experience of the one wielding it that come together to make the outcome.

The industry needs a lot more experience and wisdom around incorporation of gen AI in development before we'll realistically have a sense of its net worth. I'd say another 2-3 years at least - not because the tech will take that long to adapt, but because that's how long the humans will take to have sufficiently adapted.

hackernewds · 2 years ago
Precisely. We are very lucky to be in the timeline where ChatGPT was released in our later years, so that we didn't have to compete with auto-created code during our formative learning years.
NoPicklez · 2 years ago
I can see both points.

But I agree with this one more. I did programming as part of my Comp Sci degree, and my job doesn't require any programming. I didn't particularly like programming: I would end up with 20+ tabs of various questions, with most of my time spent trawling through what was often the cesspool of Stack Overflow to find an answer to my question.

But having a tool where I can ask it questions about my code, or code in general, is hugely beneficial. I can write a block of code, or have it write a block of code, then have it explain to me how it's meant to work. If I don't understand a particular component, I can contextually ask it more questions.

I appreciate that the expectation of code quality is higher in production, but from a personal learning standpoint it's great.

WanderPanda · 2 years ago
This sounds like motivated reasoning to me. Having an above-average personal tutor that doesn't get mad or tired, available 24/7, seems like a big multiplier.
bee_rider · 2 years ago
Eh, you could say it about compilers, then optimizing compilers… unless they are on their way to the post-scarcity world, the next generation will figure out a way to take advantage of new tools. Sure, lots of things will change, but people will adapt.

I’d be more worried if I was somebody like Squarespace. When anybody can say “build me a neat looking website,” the business of selling templates looks rough.

elendee · 2 years ago
this is you though, as opposed to the new paradigm of coding that is threatening to be ushered in: "Generate code, test, fail, regenerate, test...", without ever breaking down the constituent parts.

I already worked with a team of 20-somethings who were generating mountains of full-stack spaghetti on top of the basic CRUD framework I built them.

There's lessening incentive to build your TODO app from scratch when you can generate an "MMO framework" in 60 seconds.

The same way I first used Firebase 12 years ago before trying to learn the basics of relational databases, and it was years before I finally arrived at the basics.

therealdrag0 · 2 years ago
How do you interface with it? Are you pasting chunks of code into chat? Or just describing new code to write and then giving it feedback to rewrite it? Or something else?
danielovichdk · 2 years ago
When I look into the future, and I know that I really can't, one thing I really believe in is that there will be a shift in how quality will be perceived.

With all things around me there is a sense that technology is to be a saviour for many very important things - ev's, medicine, it, finance etc.

At the same time it is more and more clear to me that technology is used primarily to grow a market, government, country, etc. But it does that by layering on top of already-leaking abstractions. It's like trying to solve a problem by only treating its symptoms.

Quality has a sense of slowness to it, which I believe will be a necessary trait, both because curing symptoms will fall short and because I believe that the human species simply cannot cope with these challenges by constantly applying more abstractions.

The notion of going faster is wrong to me, mostly because I do not believe that quality comes from not understanding the fundamentals of a challenge, and trying to solve it for superficial gains is simply unintelligent.

LLMs are a disaster for our field because they cater to the common human fallacy of wanting to reach a goal without putting in the real work to do so.

The real work is of course to understand what it is that you are really trying to solve, instead of applying assumptions about correctness.

Luckily, not all of us are trying to move faster; instead we are sharpening our minds and tools while we keep re-learning the fundamentals and applying thoughtful decisions, in the hope of making quality that will stand the test of time.

jstummbillig · 2 years ago
> The real work is of course to understand what it is that you are really trying to solve, instead of applying assumptions about correctness.

To what extent do you think LLMs stand in the way of that?

My experience has been very much the opposite: instead of holding up the hard part of the process by digging through messy APIs or libraries, LLMs (at least in their current form, though I suspect this will simply remain true) make it painfully obvious when my thinking about a task of any significance is not sound.

To get anywhere with a LLM, you need to write. To write, you have to think.

Very often I find the most beneficial part of the LLM-coding process is a blended chat backlog that I can refer back to, consisting of me carefully phrasing what it is that I want to do, being poked at by the LLM, and, through this process, finding gaps and clarifying my thoughts at the same time.

I find this tremendously useful, especially when shaping the app early, to keep track of what I thought needed to be done and then later being able to reconsider whether that is actually still the case.

peterashford · 2 years ago
This aligns with my thinking about the utility of LLMs: they're rubber duck programming as a service
Yodel0914 · 2 years ago
I don't get so much from going over previous conversations, but needing to articulate a problem well enough to ask ChatGPT a question is extremely useful. Far more so than coming up with search phrases, I find.
weikju · 2 years ago
This is how I've been most successful so far at using LLMs. They help me poke, as you say, at the problem until a satisfying solution appears or until I have enough info to know what to look for.
norir · 2 years ago
There is an interview with the great jazz pianist Bill Evans (conducted by his brother) in which he muses that most amateur musicians make the mistake of overplaying. They go out to the club and hear a professional and they come home and try to approximate what the professional does. What they end up with is a confused mess with no foundation. He insists that you have to learn to be satisfied with doing the simple things and gradually building up a stronger foundation.

I think his insight applies nearly as well to using code generated by an ai.

bamboozled · 2 years ago
> When I look into the future, and I know that I really can't, one thing I really believe in is that there will be a shift in how quality will be perceived.

IKEA furniture is a great example of this. I build my own furniture, and being around it is a much, much nicer thing than some piece of cardboard from IKEA. But it seems like cost, speed and convenience are the most important things in people's minds.

76SlashDolphin · 2 years ago
But the tradeoff of cost and convenience vs quality is everywhere in life. Most people (including me) do not have the time, money, nor (in my case most importantly) workspace to build their own furniture. IKEA and other budget furnishing companies are a perfect solution for people in this situation and I can buy a handmade piece of furniture if I ever feel that something is not up to quality.
sanroot99 · 2 years ago
Very well put. Like, what's the point of doing a work of art if the art isn't accompanied by the artist's story of struggle, mental experience and creative expression on the way to its final form? What the AI model does is rob us all of that innate experience and give us only the cream of the end result. It's like watching porn instead of forming a real relationship with a person.
chiefalchemist · 2 years ago
> LLMs is a disaster to our field because it caters to the average human fallacy of wanting to reach a goal but without putting in the real work to do so.

It's a tool. It doesn't make sense to blame the tool. Is it the screwdriver's fault it gets used as a hammer? Or a murder weapon?

Used intelligently, Copilot & Co. can help. They can handle the boilerplate and the mundane, and free up the human element to focus on the heavy lifting.

All that aside, it's early days. It's too early to pass judgement. And it seems unlikely it's going to go away.


dweinus · 2 years ago
The methodology seems to be: compare commit activity from 2023 to prior years, without any idea of how many commits involve Copilot, then interpret those changes with assumptions. That seems a bit shaky.

Also: "The projections for 2024 utilize OpenAI's gpt-4-1106-preview Assistant to run a quadratic regression on existing data." ...am I to understand they asked gpt to do a regression on the data (4 numbers) rather than running a simple regression tool (sklearn, r, even excel can do this)? Even if done correctly, it is not very compelling when based off of 4 data points and accounting for my first concern.

zemo · 2 years ago
check out the paper, not just the summary. They explain their methodology. The output has four data points because it’s a summary. The input is … more data than that.
lolinder · 2 years ago
More data, but OP is right on the weaknesses of the study—the author posted here [0] and acknowledged that they can't actually say anything about causality, just that 2023 looked different than 2020.

[0] https://news.ycombinator.com/item?id=39168841

dweinus · 2 years ago
I did; that's where my quote is from. The appendix confirms they ran the regressions on just two inputs: the 2022 and 2023 totals.
panaetius · 2 years ago
Not even that, the prompt used is "Looking only at the years 2022 and 2023, what would a quadratic regression predict for 2024" as mentioned in the appendix.

So "quadratic regression" makes it sound all fancy, but with two data points it's literally just "extend the line straight". So the 2024 prediction is essentially meaningless.

zeroonetwothree · 2 years ago
I'm sympathetic to the study results since I have seen similar things anecdotally, but I agree their data doesn't really warrant the conclusions they reach. For all we know it could be because of the COVID hiring spree and subsequent layoffs.
wbharding · 2 years ago
Original research author here. It's exciting to find so many thinking about long-term code quality! The 2023 increase in churned & duplicated (aka copy/pasted) code, alongside the reduction in moved code, was certainly beyond what we expected to find.

We hope it leads dev teams, and AI Assistant builders, to adopt measurement & incentives that promote reused code over newly added code. Especially for those poor teams whose managers think LoC should be a component of performance evaluations (around 1 in 3, according to GH research), the current generation of code assistants make it dangerously easy to hit tab, commit, and seed future tech debt. As Adam Tornhill eloquently put it on Twitter, "the main challenge with AI assisted programming is that it becomes so easy to generate a lot of code that shouldn't have been written in the first place."

That said, our research significance is currently limited in that it does not directly measure what code was AI-authored -- it only charts the correlation between code quality over the last 4 years and the proliferation of AI Assistants. We hope GitHub (or other AI Assistant companies) will consider partnering with us on follow-up research to directly measure code quality differences in code that is "completely AI suggested," "AI suggested with human change," and "written from scratch." We would also like the next iteration of our research to directly measure how bug frequency is changing with AI usage. If anyone has other ideas for what they'd like to see measured, we welcome suggestions! We endeavor to publish a new research paper every ~2 months.

oooyay · 2 years ago
> We hope it leads dev teams, and AI Assistant builders, to adopt measurement & incentives that promote reused code over newly added code.

imo, this is just replacing one silly measure with another. Code reuse can be powerful within a code base, but I've witnessed it cause chaos when it spans code bases. That's to say, it can be both useful and inappropriate/chaotic, and the result largely depends on judgement.

I'd rather we start grading developers based on the outcomes of software. For instance, their organizational impact compared to their resource footprint, or errors generated by a service that are not derivative of a dependent service/infra. A programmer is responsible for much more than just the code they write; the modern programmer is a purposefully bastardized amalgamation of:

- Quality Engineer / Tester

- Technical Product Manager

- Project Manager

- Programmer

- Performance Engineer

- Infrastructure Engineer

Edit: Not to say anything of your research; I'm glad there are people who care so deeply about code quality. I just think we should be thinking about how to grade a bit differently.

zemo · 2 years ago
> this is just replacing one silly measure with another

> Not to say anything of your research

The second statement isn't true just because you want it to be true. The first statement renders it untrue.

> I'd rather us start grading developers based on the outcomes of software. For instance, ... errors generated by a service

yeah you should click through and read the whitepaper and not just the summary. The authors talk about similar ideas. For example, from the paper:

> The more Churn becomes commonplace, the greater the risk of mistakes being deployed to production. If the current pattern continues into 2024, more than 7% of all code changes will be reverted within two weeks, double the rate of 2021. Based on this data, we expect to see an increase in Google DORA's "Change Failure Rate" when the “2024 State of Devops” report is released later in the year, contingent on that research using data from AI-assisted developers in 2023.

The authors are describing one measurable signal while openly expressing interest in the topics you're mentioning. The thing is: what's in this paper is a leading indicator, while what you're talking about is a lagging indicator. There's not really a clear hypothesis as to why, for example, increased code churn would reduce the number of production incidents, the mean time to resolution of dealing with incidents, etc.

lolinder · 2 years ago
> That said, our research significance is currently limited in that it does not directly measure what code was AI-authored -- it only charts the correlation between code quality over the last 4 years and the proliferation of AI Assistants

So, would a more accurate title for this be "New research shows code quality has declined over the last four years"? Did you do anything to control for other possible explanations, like the changing tech economy?

nephrenka · 2 years ago
> our research significance is currently limited in that it does not directly measure what code was AI-authored

There is actual AI benchmarking data in the Refactoring vs Refuctoring paper: https://codescene.com/hubfs/whitepapers/Refactoring-vs-Refuc...

That paper benchmarked the performance of the most popular LLMs on refactoring tasks on real-world code. The study found that the AI only delivered functionally correct refactorings in 37% of the cases.

AI-assisted coding is genuinely useful, but we (of course) need to keep skilled humans in the loop and set realistic expectations beyond any marketing hype.

mrweasel · 2 years ago
People have different workflows, but mine is frequently: skim the documentation, make a prototype, refine the code a bit, add tests, move stuff around, break stuff, rework the code, study the documentation, refactor a bit more, and at that point I have enough understanding of the problem to go in and yank out 80% of my code and do it right.

If Copilot gives me working code in the prototype stage, good enough that I can just move on to the next thing, my understanding is never going to be good enough that I can go in and structure everything correctly. It will effectively allow me to skip 90% of my workflow, but I'll pay the price. That's not to say that Copilot can't be extremely helpful during the final steps of development.

If those findings are correct, I can't say that I'm surprised. Bad code comes from poor understanding, and Copilot can't have any understanding beyond what you provide it. It may write better code than the average programmer, but the result is no better than the input given. People are extremely focused on "prompt engineering", so why act surprised when a poor "prompt" in VS Code yields a poor result?

andybak · 2 years ago
I'm not sure why you decided that "use copilot" also implies skipping most of your later steps. Who decides to skip all those steps? Presumably you?

My experience is that Copilot is great at getting me started. Sometimes the code is good, sometimes it's mediocre or completely broken.

But it's invaluable at getting me thinking. I wasted a lot more time before I started using it. That might just be my weird brain wiring...

(Edited to sound less narky. I shouldn't post from a mobile device)

fwsgonzo · 2 years ago
I recently tried Copilot out of curiosity, and this is my experience too: it helps me get started, which for me is 99% of the challenge. I know how to solve problems, even complex ones, but for some reason getting started is just so extremely hard, sometimes.

Copilot lets me get started, even if it's wrong sometimes. There have been times where I have been surprised by how it took something I wrote for a server, and presented the correct client-side implementation.

I've used it a few times to describe a problem and let it handle the solution. It's not very good, but I wonder if one should place more blame on PEBCAK and put more time into the problem description. I gave it a few more paragraphs to describe the problem, and eventually I could take it from there. It was still wrong, but enough to get me started. Immensely helpful that way.

Another aspect that I'm wondering about is whether it will be able to do more with better-documented code. Anyone have experience with that? I've started to write more doxygen comments, and am hoping to see if there's a slow shift to more accurate predictions.

bamboozled · 2 years ago
I've circumvented all of this "getting started" trouble with the pomodoro method. It's simple, I don't have to deal with maybe-broken code, and it works for everything in my life. Worth a try.
spaniard89277 · 2 years ago
I'm a junior, and I have Codeium installed in VSCode. I've found it very distracting most of the time; I don't really understand why so many people use this kind of assistant.

I find stuff like Phind useful, in the sense that sometimes something happens that I don't understand, and 60% of the time Phind actually helps me understand the problem. Like finding trivial bugs that I didn't spot because I'm tired, dumb, etc.

On the other hand, with Codeium, I guess it may be useful when you're just churning out boilerplate code for some framework, but in my little experience (writing scrapers and stupid data pipelines & vanilla JS + HTML/CSS), cycling through suggestions is very irritating, especially because many times the code doesn't work. Most of the time it's for stupid reasons, like a missing argument or something like that, but then it's time you have to spend debugging.

Another problem I have is that there's a common style of JS which consists of daisy-chaining a myriad of methods and anonymous functions, and I really struggle with this. I like to break stuff into lines, name my functions and variables, etc. And so many times the code suggestions follow this style. I guess it's what they've been trained on.

Codeium is supposed to learn from this, and sometimes it does, to be fair.

But what I worry about the most is that, if I'm a junior and I let these assistants do the code for me, how the hell am I supposed to learn? Giving Phind context + questions helps me learn, or gives me a direction so I can go find it by myself on the internet, but if the only thing I do is press tab, I don't know how the hell I'm supposed to learn.

I found out a couple days ago that many people (including devs) are not using LLMs to get better; for them it's just a substitute for their effort. Aren't people afraid of this? Not because companies are going to replace you, but because it's also a self-reflection issue.

Coding is not the passion of my life, admittedly, but I like it. I like it because it helps me make stuff happen and handle complexity. If you can't understand what's happening, you won't be able to make stuff happen, much less spot when complexity is going to eat you.

jacquesm · 2 years ago
> Coding is not the passion of my life, addmitedly, but I like it.

It may not be the passion of your life but I haven't seen anybody articulate better (in recent memory) what they want to get out of coding and how they evaluate their tools. Keep at it, don't change and you'll go places, you are definitely on the right path.

withinboredom · 2 years ago
I think probably the best use of AI, so far, was when I went into a controller and told it to generate an OpenAPI spec ... and it got it nearly right. I only had to modify some of the models to reflect reality.

BUT (and this is key), I've hand-written so many API specs in my career that 1) I was able to spot the issues immediately, and 2) I could correct them without any further assistance (refining my prompt would have taken longer than simply fixing the models by hand).

For stuff where you know the domain quite well, it's amazing to watch something get done in 30s that you know would have taken you the entire morning. I get what you're saying though, I wouldn't consider asking the AI to do something I don't know how to do, though I do have many conversations with the AI about what I'm working on: various things about trade-offs, potential security issues, etc. It's like having a junior engineer who has a PhD in how my language works. It doesn't understand much, but what it does understand, it appears to understand deeply.

staunton · 2 years ago
> I wouldn't consider asking the AI to do something I don't know how to do

My experience has been the opposite so far. I benefit much more from such tools when I can easily check if something works correctly and would have to learn/look up a lot of easy and elementary stuff to do it from scratch.

For example, adding to some existing code in a language I don't know and don't have time or need to learn (I guess not many people are often in that situation). I get a lot of hints for what methods and libraries are available, I don't have to know the language syntax, for easy few-line snippets (that do standard things and which I can test separately) the first solution usually just works. This is deliberately passing on an opportunity for deeper and faster learning, which is a bad idea in general, but sometimes the speed trade-off is worth it.

On the other hand, for problems where I know how to solve them, getting some model to generate the solution I want (or at least one I'm happy with) tends to be more work than just doing it myself.

I probably could improve a lot in how I use the available tools. Haven't had that much opportunity yet to play with them...

mvdtnz · 2 years ago
> Another problem I have is that I find there's a common style of JS which consist in daisy-chaining a myriad of methods and anonymous functions, and I really struggle with this. I like to break stuff into lines, name my functions and variables, etc.

I think your whole comment is excellent but I just wanted to tell you, you're on the right track here. Certain developers, and in particular JS developers, love to chain things together for no benefit other than keeping it on one line. Which is no benefit at all. Keep doing what you're doing and don't let this moronic idiom infect your mind.

xanderlewis · 2 years ago
Sometimes making something a one-liner is itself a benefit for readability. Especially if you’re used to reading it. But admittedly it’s very easy (and can be tempting) to take it too far…
zeroonetwothree · 2 years ago
The downside of extra variables used only once is that, as a reader of the code, I have to think about whether they might be used again.
vjerancrnjak · 2 years ago
This is just another coding style. After 1-2 weeks you get used to whatever you're reading. Try it and you'll see.

It's the high-level code that can become an issue (structuring the state of your program, using dependency injection incorrectly, having a convoluted monad transformer stack, putting very specifically typed effects in your Reader etc.). If you make mistakes there, you will struggle to read, write and reuse code, and even then, not all is lost. If there's bad structure you can most often transform it into a good one. When there's no structure, that's a problem.

Seeing .map.filter becomes a quick pattern match. You know what's happening there. It does not matter if it's a named variable or just part of a long

    a.map
     .filter
     .reduce
     .map
chain.

I agree, if your goal is to hire a lot of people, then you might want a style that does not strain the pattern matching abilities too much. We can compare which style is the best for that.

Nothing stops you from extracting a sequence from a long chain into a function to reuse it elsewhere.

    pipe(
      object,
      map,
      filter,
      ...
    )
Many languages today allow declaring functions inside functions. I'd argue that in that case it's better to declare functions as close as possible to the place where you'll call them, which can be inside another function.

kromem · 2 years ago
While I can't speak to Codeium, you might want to try Copilot in a more mature codebase that reflects your style of composition.

The amazing part for me with the tech is when it matches my style and preferences - naming things the way I want them, correctly using the method I just wrote in place of repeating itself, etc.

I haven't used it much in blank or small projects, but I'd imagine I'd find it much less ideal if it wasn't so strongly biased towards how I already write code given the surrounding context on which it draws.

tpmoney · 2 years ago
The tool and the design of the tool matter a lot. I've used Codeium in VSC and GH Copilot in IntelliJ, and the experience (and quality) of the GH + IntelliJ pairing is much better than Codeium + VSC.

My biggest use for AI assistants has been speeding up test writing and any "this but slightly different" repetitive changes to a code base (which admittedly is also a lot of test writing). At least in IntelliJ + GH, things like a new parameter that now needs to be accounted for across multiple methods and files are usually a matter of "enter + tab" after I've manually typed out the first two or three variants of what I'm trying to do. Context gives it the rest.

In VSC with Codeium, the AI doesn't seem quite as up to snuff, and the plugin is written in such a way that its suggestions and the keys for accepting them seem to get in the way a lot. It's still helpful for repetitive stuff, but less so for providing a way of accomplishing a given goal.

godzillabrennus · 2 years ago
I decided to use ChatGPT to build a clone of Yourls using Django/Python. I gave it specific instructions to not only allow for a custom shortened URL but to track the traffic. It didn’t properly contemplate how to do that in the logic or data model. I had to feed it specific instructions afterwards to get it fixed.

AI tools are akin to having a junior developer working for you. Except they are much much faster.

If you don't know what you're doing, they just accelerate the pace at which you make mistakes.

konschubert · 2 years ago
> If you don't know what you're doing, they just accelerate the pace at which you make mistakes.

100%

And if you know what you are doing, they will accelerate the way you're building stuff.

johnfn · 2 years ago
I pressed down the pedal on my car and it drove off a cliff!
geraneum · 2 years ago
It’s not always clear to everyone that there’s something they don’t know!
KronisLV · 2 years ago
> AI tools are akin to having a junior developer working for you. Except they are much much faster.

Honestly, this is brilliant. The other day I had to add table-name prefixes to a SELECT statement's column aliases, since such a feature just doesn't exist for some reason, a bit like:

  -- fails because of duplicate column names (e.g. when creating a view)
  SELECT
    *
  FROM table_a
  JOIN table_b ON ...
  JOIN table_c ON ...
  ...

  -- this would solve my issue, if WITH_PREFIX did exist (or anything like it)
  SELECT
    table_a.* WITH_PREFIX 'table_a',
    table_b.* WITH_PREFIX 'table_b',
    table_c.* WITH_PREFIX 'table_c'
  FROM table_a
  JOIN table_b ON ...
  JOIN table_c ON ...
  ...
So I just gave ChatGPT the schema definitions/query and it wrote out the long list of like 40 columns to be selected for me, like:

  SELECT
    table_a.id AS 'table_a_id',
    table_a.email AS 'table_a_email',
    ...
    table_b.id AS 'table_b_id',
    table_b.start_date AS 'table_b_start_date',
    ...
and so on. I haven't found another good way to automate things like that across different RDBMSes (different queries for the system tables that have schema information), and while it's possible with regex or a bit of other text manipulation, just describing the problem and getting the output I needed was delightfully simple.
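
For what it's worth, a rough sketch of automating that alias list via information_schema (Postgres/MySQL-style systems only, not SQLite; the function name, %s paramstyle, and table names are illustrative, not a universal solution):

    # Build "table.col AS table_col" aliases from the system catalog.
    def prefixed_select(cur, tables):
        parts = []
        for table in tables:
            cur.execute(
                "SELECT column_name FROM information_schema.columns"
                " WHERE table_name = %s ORDER BY ordinal_position",
                (table,),
            )
            parts += [f"{table}.{col} AS {table}_{col}"
                      for (col,) in cur.fetchall()]
        return "SELECT\n  " + ",\n  ".join(parts)

    # e.g. print(prefixed_select(cur, ["table_a", "table_b", "table_c"]))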

Aside from that, I just use the LLMs as autocomplete, which also encourages me to have good function naming, since often enough that's sufficient information for the LLM to get started with giving me a reasonable starting point. In particular, when it comes to APIs or languages I haven't used a lot, but the problems that I face have been solved by others thousands of times before. I don't even have to use StackOverflow much anymore.

That's why I bought Copilot (though JS/HTML autocomplete in JetBrains IDEs is visually buggy for some reason) and use ChatGPT quite a lot.

LLMs are definitely one of my favorite things, after IntelliSense (and other decent autocomplete), codegen (creating OpenAPI specs from your controllers, or bootstrapping your EF/JPA code from a live dev database schema), as well as model driven development (generating your DB schema migrations/tables from an ER model) and containers (easily packaged, self-contained environments/apps) and smart IDEs (JetBrains ones).

addaon · 2 years ago
> it wrote out the long list of like 40 columns to be selected for me

It seems like the process of reviewing its generated code to make sure all 40 columns are there, and then either re-doing this or manually going through that list whenever the schema changes, would take longer than just writing the script? And now you're asking your code reviewers to do the same boring-and-slow manual check on the commit rather than just reviewing the three lines of the script?


cleandreams · 2 years ago
My question is, how do you become a senior developer when the junior developer just keeps throwing "working" "good enough" code at you?

I think companies will want more code faster to the extent that fewer people will emerge from the churn really knowing what they are doing.