Throw away your first draft of your code

As an ML-focused python dev I have never been able to break the habit of REPL-driven development, but I find it works really well for "building code that works" rather than coming up with a tower of abstractions immediately. A typical python development workflow for me is:

* Start with a blank `main` file and proceed linearly down the page, executing as I go.

* Gradually pull out visually awkward chunks of code and put them into functions with no arguments at the top of the file.

* If I need to parameterize them, add those parameters as needed - don't guess at what I might want to change later.

* Embrace duplication - don't unnecessarily add loops or abstractions.

* Once the file is ~500 LOC or becomes too dense, start to refactor a bit. Perhaps introduce some loops or some global variables.

* At all times, ensure the script is idempotent - just highlighting the entire page and spamming run should "do what I want" without causing trouble.

* Once the script has started to take shape, it can be time to bring some OO into it - perhaps there is an object or set of objects I want to pass around, I can make a class for that. Perhaps I can start to think about how to make the functionality more "generalized" and accessible to others via a package.

This is literally the only way I've ever found to be productive with green field development. If my first LOC has the word "class" or "def" in it - I am absolutely going to be ripping my hair out 12 hours later, guaranteed.
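A minimal sketch of what such a script might look like partway through the process. Everything here (`RAW_ROWS`, `OUT_PATH`, `load_rows`, `clean_rows`, the in-memory data) is made up for illustration and is not taken from the workflow above; the point is only that the file reads top to bottom and can be re-run as a whole.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical inputs; in a real session these would come from disk or a database.
RAW_ROWS = [{"id": 1, "score": "0.5"}, {"id": 2, "score": "n/a"}, {"id": 3, "score": "0.9"}]
OUT_PATH = Path(tempfile.gettempdir()) / "first_draft_scores.json"


def load_rows():
    # Pulled out of the main flow only because it looked awkward inline.
    return [dict(row) for row in RAW_ROWS]  # copy, so re-running never mutates the source


def clean_rows(rows, bad_value="n/a"):
    # `bad_value` became a parameter only once it actually needed to vary.
    return [{**r, "score": float(r["score"])} for r in rows if r["score"] != bad_value]


# Main flow: linear, and running the whole file twice gives the same result.
rows = load_rows()
cleaned = clean_rows(rows)
OUT_PATH.write_text(json.dumps(cleaned))  # overwrite rather than append, so reruns are safe
```

Overwriting the output and copying the input are what keep "select all, run" harmless here; an append or an in-place mutation would make the second run behave differently from the first.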

mbwgh · 2 years ago

> * Gradually pull out visually awkward chunks of code and put them into functions with no arguments at the top of the file.

I have seen scientific code written in this manner in both Python and Fortran. This may be an intuitive way to start off, and even to complete the task at hand.

But for people trying to read, understand, and realistically, debug your code, this complicates things.

Because each no-argument function can only work via its side effects. Your script becomes a succession of state transitions, and to understand it, you have to keep the intended state after each step in your head.

And in case of a mistake, you can't even call these functions individually via passing their intended arguments using the REPL. You have to set up all the state beforehand manually, call your no-arg function, observe the state afterwards. Which becomes more awkward the further down you are in your script, since all your dependencies are implicit and possibly even completely undocumented.
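To make the contrast concrete, here is a small sketch of the two styles side by side. The names (`data`, `result`, `normalize`, `normalize_values`) are hypothetical and only serve the illustration.

```python
# Style 1: a no-argument function that works only through module-level state.
data = [3, 1, 2]
result = None


def normalize():
    global result
    peak = max(data)
    result = [x / peak for x in data]


# Style 2: the same step with its dependencies spelled out.
def normalize_values(values):
    peak = max(values)
    return [x / peak for x in values]


normalize()                          # only meaningful after `data` has been set up
explicit = normalize_values([4, 2])  # callable on its own from a REPL
```

With the second version you can poke at a suspicious input directly; with the first you have to rebuild `data` to the state it had at that point in the script before the call tells you anything.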

raincole · 2 years ago

The correct way is:

1. Use comments to split visually awkward parts into chunks.

# a lot of code...

## ===

# more a lot of code...

2. Use inner functions if that chunk needs to be reused

3. Only move the chunk to a top-level function if you think it's worth taking the time to turn its required state into parameters/return values
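A rough sketch of steps 2 and 3, assuming a made-up `main` with some readings and a scale factor (none of these names come from the thread):

```python
def main():
    readings = {"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0]}
    scale = 2.0

    # Step 2: an inner function reuses a chunk while still reading `scale`
    # from the enclosing scope, so no state has to be threaded through yet.
    def scaled_mean(values):
        return scale * sum(values) / len(values)

    # === summary for each series ===
    mean_a = scaled_mean(readings["a"])
    mean_b = scaled_mean(readings["b"])
    return mean_a, mean_b


# Step 3: once it is worth the effort, the implicit state becomes a parameter.
def scaled_mean(values, scale):
    return scale * sum(values) / len(values)
```

The inner version is cheap to write but still depends on whatever `scale` happens to be in `main`; the top-level version costs a parameter and in exchange can be called and tested in isolation.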

extr · 2 years ago

Sure that's like the worst case scenario, but in practice the entire point is that as soon as it becomes difficult to reason about, that's when you start cleaning up the functions and add some obvious OO. You don't leave it as a mess.

bogeholm · 2 years ago

Agree 99% except this statement:

> Embrace duplication - don't unnecessarily add loops or abstractions

I’ll usually make a function or perhaps tiny class as soon as I start reusing bits of code.

Apart from that, agree as stated. At my previous job (Python shop), a lot of the data engineers came from a Java background, and had a tendency to think top-down. Many things were over engineered ‘just because we might need it’:

- Factory classes used only once or twice in entire code base

- Let's make an AbstractReaderInterface because we might want to abstract the file type or location later (while 100% of files are Parquet on S3)

I’ve really enjoyed using dataclasses and Pydantic's BaseModels prolifically, and adding type hints (coupled with type checks in CI).

Model the data, write a well structured imperative workflow, set up CI, write unit tests, enforce typing. Add OOP if needed, then close ticket.
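For illustration, "model the data, then write an imperative workflow" could look roughly like this with a standard-library dataclass and type hints. `Measurement`, `parse_measurement` and `mean_value` are hypothetical names, and a Pydantic `BaseModel` could play the same role as the dataclass.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    sensor_id: str
    value: float


def parse_measurement(raw: dict) -> Measurement:
    # Convert loosely typed input into the modeled type at the boundary.
    return Measurement(sensor_id=str(raw["sensor_id"]), value=float(raw["value"]))


def mean_value(measurements: list[Measurement]) -> float:
    return sum(m.value for m in measurements) / len(measurements)


# Imperative workflow: parse, then compute, with no class hierarchy in sight.
raw_rows = [{"sensor_id": "s1", "value": "1.5"}, {"sensor_id": "s2", "value": 2.5}]
measurements = [parse_measurement(r) for r in raw_rows]
average = mean_value(measurements)
```

A type checker run in CI can then verify that everything downstream of `parse_measurement` passes `Measurement` objects around rather than raw dicts.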

robertlagrant · 2 years ago

Agreed. The general principle is (and it's a hard balance, of course, to avoid just slapdash work): don't do work up front you might not need. You will make your current work take longer than it should, which you will (correctly) be blamed for, and any time you save in the future you won't get the credit for.

This isn't a cynical statement; it's just my experience. Use it to your advantage!

IceSentry · 2 years ago

I don't understand why you consider loops an abstraction. They are some of the most basic building blocks.

extr · 2 years ago

People are getting caught up on the loops thing. All I meant was in my line of work I often end up with many special cases of general processes. Writing a loop prematurely always bites me - I end up writing control flow for handling the one-offs, it somehow always becomes more obtuse than just listing things out literally.

freehorse · 2 years ago

Loop as an abstraction of copy-paste.

quickthrower2 · 2 years ago

They considered it a dedupe not an abstraction

It would be unnecessary to have a loop over keys of a dictionary to call function xyz when you can just repeat the xyz calls (it would look nicer too)

Unless the dictionary is huge and dynamically loaded.

dragonwriter · 2 years ago

Loops are an abstraction over conditional branching and, depending on the kind of loop, some other things.

freehorse · 2 years ago

> Embrace duplication - don't unnecessarily add loops or abstractions.

> Once the file is ~500 LOC or becomes too dense, start to refactor a bit. Perhaps introduce some loops or some global variables.

I agree with most, but if I "embrace duplication" I can reach 500+ LOC in half an afternoon :P. It seems to have really paid off for me to start some degree of abstraction (not OO yet in general) early enough. Tbf, it is easier for me to tidy up the code with some abstractions, rather than ensure that the "main"-titled script runs from beginning to end at each point in time with no errors, which imo can hinder experimentation more than abstraction. But it also depends on what I do, I guess; the greener the field the more I feel this way.

patrick451 · 2 years ago

Wow, not an ML person, but controls and robotics. Yet, this describes my workflow for a lot of things almost to a tee. Even down to the avoiding loops. I tend to do that when I want to run the same simulation or analysis for a couple variables or datasets. It's interesting, because in the past I was a lot more prone to turn that into a loop early on. But this makes your code brittle. You'll want to do something slightly different for the two datasets, which means a bunch of conditionals in your loop. It's actually really similar to the problems you get with boolean flags when you try to abstract into a function too soon. It actually takes discipline for me to commit to copy-paste, but I think it pays off.
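A toy sketch of the brittleness being described, with two made-up datasets (`lab`, `field`) and a hypothetical `analyze` function: the loop version accumulates per-dataset conditionals, while the copy-pasted version keeps each special case next to the call it affects.

```python
def analyze(values, scale=1.0):
    return scale * sum(values) / len(values)


lab = [1.0, 2.0, 3.0]
field = [10.0, 20.0, 1000.0]  # hypothetical: the last value is a sensor glitch

# Premature loop: every one-off becomes a branch inside the loop body.
looped = {}
for name, values in {"lab": lab, "field": field}.items():
    if name == "field":
        values = values[:-1]
    looped[name] = analyze(values, scale=0.5 if name == "field" else 1.0)

# Listed out: the special handling sits next to the case it belongs to.
lab_result = analyze(lab)
field_result = analyze(field[:-1], scale=0.5)
```

Both versions compute the same numbers; the difference is how much of the loop body you have to read to find out what happens to `field`.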

teamspirit · 2 years ago

I too agree with pretty much everything you say. Just want to add that I pretty much solely use ptpython[0]. It can handle line breaks in pasted code, vim (or emacs) bindings and syntax highlighting, and much more.

[0] https://github.com/prompt-toolkit/ptpython

strangattractor · 2 years ago

Pretty much follow an identical process. When I do finally rewrite the code, after getting a working version, the duplication pretty much screams clean me up and simplify/generalize. I have never been able to just see the whole thing before I start. The process itself teaches you things.

rawoke083600 · 2 years ago

Interesting thought (and coding) process. I love "design, thinkering and basically active-thinking" with pseudo-code in a txt file, i.e design.txt.

Just noting functions, d-structure and some flow usually helps to arrive at something worthwhile...

slim · 2 years ago

by REPL here you mean jupyter notebook?

extr · 2 years ago

I usually use VS Code and the "interactive" python functionality (not jupyter). I highlight code and execute just that code with a hotkey. Works just as well with any kind of vim-slime like functionality.

bogeholm · 2 years ago

REPL = Read-Eval-Print Loop. So could be IPython or just plain `python` in general, can’t say what OP is using of course

ploika · 2 years ago

I also primarily write ML-focused Python. For me, having originally learned R and C at the same time, nothing has ever surpassed RStudio as a dev environment. For the past several years my preferred setup has been tmux and Vim with vim-slime in one pane and IPython in the other.

(Personally, and speaking only for myself, I hate Jupyter notebooks with a burning passion. I think they are one of the worst things ever to have happened to software development, and definitely the worst thing ever to have happened to ML/data science.)

hughesjj · 2 years ago

Not OP but I often code in the REPL for python as well. Sometimes I'll stub out my code and just drop into an interactive debugger where I'm writing the next section.

In the python debugger, if you type `interact`, it'll give you the normal python repl. This combined with `help` and `dir` is super useful for learning new frameworks/libraries and coding with your actual data.
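A small sketch of that stub-and-drop-in pattern, assuming a hypothetical `process` function and a `DEBUG` switch that is off by default so the file still runs unattended. `pdb.set_trace()` and the `interact` command are standard `pdb`; the helper `public_names` is made up and just wraps `dir`.

```python
import pdb

DEBUG = False  # flip to True in a live session to stop inside `process`


def public_names(obj):
    # The same information `dir(obj)` shows at the prompt, minus underscore names.
    return [name for name in dir(obj) if not name.startswith("_")]


def process(records):
    if DEBUG:
        # At the (Pdb) prompt, typing `interact` opens a regular Python REPL
        # with the current scope's names, including `records`, available.
        pdb.set_trace()
    return [r.upper() for r in records]  # stub: the next section gets written here


names = public_names("some string")
processed = process(["a", "b"])
```

From inside the `interact` session, `help(obj)` and `dir(obj)` work on the real objects flowing through the function, which is usually quicker than guessing from documentation.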

Dead Comment

The temptation to throw away all of your code - be it a prototype or a "grown" code base - arises often.

Often it is a bad idea. I get it, though, there is an inherent attractiveness in the idea of starting fresh from a clean slate. Except, more likely than not, you'll soon find yourself in a similar situation to the one you started from. The fundamental problem is that it is easy to underestimate the edge cases. Sure, the main functionality is easily understood and straight-forward to conceptualize. But did you remember to think through all the smaller aspects of your software, too?

Perhaps it's easier with a prototype implementation that in fact doesn't have a lot of features yet, but to completely replicate the functionality of a complex piece of software isn't an easy undertaking. Sure, getting 80% there is probably easy, but the last 20% is the part that's easy to overlook when considering a complete rewrite.

Admittedly, there's nothing sexy about refactoring. And often it may seem like it's less work to just simply scrap everything and start over. However, that's a fallacy a lot of the time.

nemetroid · 2 years ago

Throwing away an established code base is a completely different thing from throwing away a few-days-old prototype, and not what the article is suggesting. The points you mention don't apply to the prototype case.

ninepoints · 2 years ago

This is such a HN phenomenon it infuriates me. Read article. Then read comment that misconstrues article to be something completely different, accompanied with loud critique. Read replies all in violent agreement to obviously self-evident strawman.

tuwtuwtuwtuw · 2 years ago

Yeah, this thread is a somewhat strange read. People don't seem to understand what "draft" or "prototype" means. Odd.

josephg · 2 years ago

> Admittedly, there's nothing sexy about refactoring. And often it may seem like it's less work to just simply scrap everything and start over. However, that's a fallacy a lot of the time.

Agreed. Though sometimes you need to refactor the core of an application, in a way which will touch the entire app. To do that I often make a new, empty project. Then I rewrite the core of my project into the new folder (in whatever new structure I’m trying out). When that works, I slowly copy the content from the old project into the new project - refactoring and testing along the way.

But it’s not perfect. About half the time I do this, I discover halfway through that I didn’t understand some aspect of the system. My new design is wrong or useless and I throw away all the new code. Or I figure out that I can just make the changes in place in the old project folder after all, and I bail on the new code and go back to a traditional refactor.

But no matter what happens, I don’t think I’ve ever regretted a refactoring attempt like this. I always come away feeling like I’ve learned something. How much you’ve learned from a project is measured by the number of lines of code you threw away in the process.

majikandy · 2 years ago

I find refactoring sexy. Just saying.

Buttons840 · 2 years ago

> The temptation to throw away all of your code - be it a prototype or a "grown" code base - arises often.

I want an editor plugin which allows me to mark sections of code as reviewed or "perfect" (depending on how honest I'm being). Then, when I'm tempted to rewrite everything, I can go through and mark what I think is good, and then focus on refactoring the rest until I think it is good as well.

I'm tempted to rewrite code because I lose track of what it's doing, or I've learned a lot since I wrote that old code and so I'm not sure if the old code is good anymore. It's not so much about rewriting the code as an exercise in getting familiar with the code I've already written. I want a tool to help me with this.

gregmac · 2 years ago

I would think a combination of git, unit tests, and comments would solve this problem?

Unit tests prove the code works as intended, and are basically examples of what the code is doing. Whether the code is actually "good" is a bit more subjective -- but tests give you the freedom to modify it without breaking it.

Checking into git frequently is also a way to give yourself some freedom. Commit at every milestone, like every time the next thing is "working". If you feel like refactoring, go for it -- you can reset back to working state in a few seconds.

And lastly, leave comments in. You can always clean it up before you push. You can even squash or interactively rebase your history so no one else sees the gory details how the hot dog was actually made.
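As a sketch of tests doubling as examples of what the code does, here is a tiny `unittest` case around a hypothetical `slugify` function (the function and its behavior are invented for illustration):

```python
import unittest


def slugify(title):
    # Hypothetical function that is about to be refactored.
    return "-".join(title.lower().split())


class SlugifyTest(unittest.TestCase):
    def test_lowercases_and_joins_words(self):
        self.assertEqual(slugify("Throw Away Your Draft"), "throw-away-your-draft")

    def test_collapses_repeated_whitespace(self):
        self.assertEqual(slugify("first   draft"), "first-draft")


# Run the suite programmatically so this also works when pasted into a REPL.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(SlugifyTest)
outcome = unittest.TextTestRunner(verbosity=0).run(suite)
```

With these in place, the body of `slugify` can be rewritten freely; a commit at each point where `outcome.wasSuccessful()` is true gives you a state to go back to.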

quickthrower2 · 2 years ago

If you could modularize code in such a way that sizable chunks could “just sit still!” then entire modules could be marked as perfect. But you wouldn’t have to since you wouldn’t naturally need to touch them (e.g. how many codebases are using a fork of core-utils, … to exaggerate a Conway's law effect).

But unfortunately the modules don’t make themselves apparent at the start of the project. So it needs refactoring discipline.

eternityforest · 2 years ago

I don't really feel that attraction to complete rewriting. I wonder if the people who do are very smart, and thus able to hold more state in their head, so ugly code bothers them more even if it's not actively a problem, because they are able to have background tasks in their mind to worry about it?

And at the same time, perhaps their code is less encapsulated, because they didn't optimize for abstraction, they optimized for beauty. A leaky abstraction doesn't bother them, because ALL abstractions are leaky to them, they probably have a sense of internal workings even when using household appliances, but ugly code tucked away somewhere bothers them a lot, and they might dislike using popular large libraries even if they work great, just because they're not comfortable using what they don't understand deeply.

My evidence of this is the fact that suckless exists and people actually use it, I assume their experience of thought is very different from anything I have experienced.

jwells89 · 2 years ago

Speaking personally my urge to rewrite at least partially comes from not truly understanding the problem and the solutions to it until I've written something reasonably functional. It doesn't matter how much time I put into sitting and theorizing, there's always things I didn't anticipate and assumptions that turned out to be incorrect.

This usually means that rewrites are significant improvements across the board, especially if they're done a relatively short time after the original is finished since it's all still fresh in my head.

This may be a weakness of sorts on my part though, I lack formal engineering training which might be why purely mental modeling (no code) doesn't work all that well for me.

coolliquidcode · 2 years ago

ALL abstractions are leaky - this is an objectively true statement.

As others said it's not for beauty, it's to make sure that if there is an abstraction it fits the problem. If there is encapsulation, it doesn't get in the way. Some coders can get it on the first try and there is no reason for them to rewrite code. The rest of us mid coders need to explore first, as well as make sure all the cases we want the lib/API to handle actually work.

ipaddr · 2 years ago

Smart people can ignore ugly code. It's the people who get easily confused that need to see clean and easy to understand code.

freehorse · 2 years ago

There is some point, depending on the changes the code needs, where rewriting becomes less costly than changing the current code. For me, a big part depends on whether I depend long term or short term on it and how deep I will have to go anyway using it.

lamontcg · 2 years ago

I'd rewrite code to make it simpler and easier to hold in one's head, not just to make it pleasing from some aesthetic viewpoint.

m463 · 2 years ago

It says to throw it away after a couple of DAYS, which seems to differ from other advice (like from joel-on-software)

Maybe this is ok, given that time period?

or will the rewrite have all these extra bells and whistles?

or will the rewrite throw away the unneeded bells and whistles?

HenryBemis · 2 years ago

Ok, perhaps don't throw it away. When I come up with an idea (for an app usually), I use "a few" A4-sheets (anything between 5 and 20), scribble and draw on them with a pencil, draw screens, buttons, data flows, activities, write notes (in various font sizes). Then I use my CamScanner and call this a v1.

Then I email myself the PDF and store the papers in a box, and a couple of days later I start the v2 (same process), then v3.

By v4 it's 'good'.

I also use the same method on my 9-5.

I take VERY seriously the Abraham Lincoln quote “Give me six hours to chop down a tree and I will spend the first four sharpening the axe” on almost everything.

I consider the v1, v2, v3 as the "sharpening the axe" and the v4 on the actual cutting.

In that spirit, the article has a similar approach, as the v1, v2, and v3 may take you down a (more than) imperfect path.

withinboredom · 2 years ago

> Admittedly, there's nothing sexy about refactoring.

There's the "strangler fig" (my favorite) method of refactoring. It's where you rewrite just a small portion of the software, little bits at a time. Instead of doing a full rewrite.

IMHO, it's the best way to refactor. You can switch out both "old" and "new" versions at-will until you're 100% sure you've covered all the edge cases.
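A minimal sketch of the switching idea, assuming a made-up pricing function. `total_price_old`, `total_price_new`, the `USE_NEW` switch and `agree_on` are all hypothetical names; a real system would more likely put the switch in configuration or a feature-flag service.

```python
def total_price_old(items):
    # Legacy path: left untouched while the replacement is built.
    total = 0
    for price, qty in items:
        total = total + price * qty
    return total


def total_price_new(items):
    # Rewritten portion.
    return sum(price * qty for price, qty in items)


USE_NEW = {"total_price": False}  # hypothetical per-feature switch


def total_price(items):
    # Callers go through this facade, so the switch can be flipped at will.
    impl = total_price_new if USE_NEW["total_price"] else total_price_old
    return impl(items)


def agree_on(cases):
    # Run both versions on the same inputs before trusting the new one.
    return all(total_price_old(c) == total_price_new(c) for c in cases)


cases = [[], [(2, 3)], [(1, 1), (5, 2)]]
safe_to_switch = agree_on(cases)
if safe_to_switch:
    USE_NEW["total_price"] = True
```

Because callers only ever see `total_price`, flipping the switch back is a one-line change if an edge case turns up later.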

barbariangrunge · 2 years ago

Game companies implicitly do this, throwing out old versions of their code (sort of). You make a game. Ship it. Start on a new game. Your previous game is now sort of a practice run for making a new game from scratch. Continue, ad infinitum.

It's weird: you rarely have to maintain something for 20+ years, and you get to always improve and iterate on how you did things last time. But, are you training yourself to write hard to maintain code, since you don't really have to maintain it past a certain period? Or does the learning-from-iteration actually make writing-maintainable-code easier?

I know some people do keep developing their games for decades, look at Dwarf Fortress, I'm just talking in general.

glouwbug · 2 years ago

Minecraft is reaching that 15+ year mark

throwaway14356 · 2 years ago

what you do you get good at. if you don't have to maintain there is no point in getting good at it or any reason to think you will gradually get better at it.

duxup · 2 years ago

Required link to Joel on Software:

https://www.joelonsoftware.com/2000/04/06/things-you-should-...

Granted, Joel is talking about a wholesale rewrite of proven / established code. Code that has had the advantage of time.

Personally, I end up rewriting parts of my code often. It usually takes a while for me to find exactly the right way it should be, based on how people use it.

You absolutely should rewrite prototypes, and re-factor important chunks of code. I rewrote something 3 times today, it was better each time.

At the same time you should be wary…

ideamotor · 2 years ago

The reasons he gives for engineers wanting to rewrite code are: badly organized code, slow processing time, and ugly code.

Maybe I’m not an engineer, because I bet perfectly pretty fast code is an indicator you should reorganize it to make it easier to adapt, add more capabilities even if it runs slow, and put some ugly fixes in there so it’s useful for the end user.

Agreed that wanting to over-engineer and perfect something that should be constantly evolving is not a good reason to rewrite it. That sounds like back office bubble pretend-work.

trentnix · 2 years ago

Ah yes, starting from a clean slate.

https://devrant.com/rants/816880/i-ve-done-it-again

robaato · 2 years ago

Rather than "throw away" I would "start from scratch" meaning I can refer to prototype but reimplement with respect to appropriate norms. As a version control fanatic, very little gets really thrown away in my book. Just hidden.

tnecniv · 2 years ago

Yeah normally when I do this, it’s because I picked bad abstractions. The main business logic I can bring over almost line by line, but I want to reshape the abstractions / data structures / interface to that logic

wpietri · 2 years ago

Nah. I think a prototype is something you definitionally should throw away. There's a huge freedom to knowing that instead of taking on tech debt, you're committed to declaring tech bankruptcy and throwing away the code. It lets you try things, cut corners, and generally experiment. It's great for thinking through what everybody really wants.

I do agree that it's generally a bad idea to just throw out a long-lived code base. But for me that's not an estimation issue. It's because the urge to throw it out is usually a response to upstream problems that haven't gotten fixed. For example, a lot of code bases are a mess due to time pressure. But if you don't fix the process problems that turn time pressure into mess, then you're just going to end up with another mess. Possibly a bigger one, in that stopping all productive output in favor of a rewrite usually makes business stakeholders crazy, causing increased time pressure.

hot_gril · 2 years ago

I've thrown away and redone a lot of code, both mine and others', and I've never regretted the time spent vs continuing to use the old thing.

Tozen · 2 years ago

A possible alternative is to use a different programming language for doing the prototyping than the one which will be used in production. The urge to hang on to prototype code can be removed, because you are going to do a rewrite in a different language, regardless.

By using 2 different languages, there is more freedom to just make a demo, discuss, and then decide what to keep or what direction to go.

Deleted Comment

tuwtuwtuwtuw · 2 years ago

RTFA