eschneider · 2 years ago
Some good advice here, and some more... controversial advice as well.

After inheriting quite a few giant C++ projects over the years, there are a few obvious big wins to start with:

* Reproducible builds. The sanity you save will be your own. Pro-tip: wrap your build environment with docker (or your favorite packager) so that your tooling and dependencies become both explicit and reproducible. The sanity you save will be your own.

* Get the code to build clean with -Wall. This is for a couple of reasons. a) You'll turn up some amount of bad code/undefined behavior/bugs this way. Fix them and make the warning go away. It's ok to #pragma away some warnings once you've determined you understand what's happening and it's "ok" in your situation. But that should be rare (there's a sketch of targeted suppression at the end of this comment). b) Once the build is clean, you'll get obvious warnings when YOU do something sketchy and you can fix that shit immediately. Again, the sanity you save will be your own.

* Do some early testing with something like valgrind and investigate any read/write errors it turns up. This is an easy win from a bugfix/stability point of view.

* At least initially, keep refactorings localized. If you work on a section and learn what it's doing, it's fine to clean it up and make it better, but rearchitecting the world before you have a good grasp on what's going on globally is just asking for pain and agony.
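
(For the -Wall bullet: a minimal sketch of what targeted, documented suppression might look like, assuming GCC/Clang-style diagnostic pragmas; the deprecated function and surrounding code are made up for illustration. MSVC has #pragma warning(push/disable/pop) for the same job.)

  // Hypothetical: legacy_frobnicate() is marked deprecated, but migrating off
  // it is a bigger change than this commit wants to take on. Suppress exactly
  // this warning, exactly here, and say why.
  [[deprecated("use frobnicate_v2")]]
  inline void legacy_frobnicate(int) { /* old behavior lives here */ }

  void tick(int n) {
  #pragma GCC diagnostic push
  #pragma GCC diagnostic ignored "-Wdeprecated-declarations"
    legacy_frobnicate(n);  // TODO: migrate to frobnicate_v2
  #pragma GCC diagnostic pop
  }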

hobofan · 2 years ago
> wrap your build environment with docker (or your favorite packager) so that your tooling and dependencies become both explicit and reproducible

If you want explicitness and reproducibility, please don't reach for Docker. Unless you take a lot of care, you will only get the most watered-down version of reproducibility, with Docker probably luring you into a false sense of security. E.g. pointing to mutable image tags without integrity hashes and invoking apt-get are things you'll find in most Dockerfiles out there, and both leave open a huge surface area for things to go wrong and end up in slightly different states.

And while they are not that easy to pick up, solutions like Bazel and Nix will give you a lot better foundation to stand on.

basicallybones · 2 years ago
A nice middle ground is using a tool like Google's Skaffold, which provides "Bazel-like" capabilities for composing Docker images and tagging them based on a number of strategies, including file manifests. In my case, I also use build args to explicitly set versions of external dependencies. I also pull external images and build base images with upgraded versions once, then re-tag them in my private repository, which is an easy-to-implement mechanism for reproducibility.

While I am in a Typescript environment with this setup at the moment, my personal experience that Skaffold with Docker has a lighter implementation and maintenance overhead than Bazel. (You also get the added benefit of easy deployment and automatic rebuilds.)

I quite liked using Bazel in a small Golang monorepo, but I ran into pain when trying to do things like include third-party pre-compiled binaries in the Docker builds, because of the unusual build rules convention. The advantage of Skaffold is it provides a thin build/tag/deploy/verify layer over Docker and other container types. Might be worth a look!

Kudos to the Google team building it! https://skaffold.dev

SJC_Hacker · 2 years ago
> If you want explicitness and reproducibility, please don't reach for Docker. Unless you take a lot of care, you will only get the most watered-down version of reproducibility, with Docker probably luring you into a false sense of security. E.g. pointing to mutable image tags without integrity hashes and invoking apt-get are things you'll find in most Dockerfiles out there, and both leave open a huge surface area for things to go wrong and end up in slightly different states.

If this is frequently a problem, you're doing something wrong, or using such a crappy external library/toolchain that it breaks frequently on the same version.

Docker is a way to ensure that the software builds with "the most recent minor version" of some OS/toolchain/libraries.

The reason why you want the most recent version is because of security fixes and bugs.

I agree that you should check integrity hashes where appropriate, if you really want to pin versions.

Tainnor · 2 years ago
Not everything is all or nothing. 80-90% reproducible builds are often good enough and learning Nix only to get that last 10-20% is not always worth it. And there are ways of pinning all dependencies with docker if you really want to.

I've had issues with Dockerfiles not building anymore due to changes in the package registry, but it was like 1-2 times out of the 1000 times I used docker.

bluGill · 2 years ago
I have docker as part of my reproducibility builds. We recently ran into a problem trying to rebuild some old code - turns out the ssl certificates in our older images had expired, so now the code isn't reproducible anyway. One more reason to agree with the idea that docker shouldn't be used.

Though we use docker for a different reason: we are building on linux, targeting linux. It is really easy to use the host system headers or libraries instead of the target versions of the same - and 99% of the time you will get away with it as they are the same. Then several years later you upgrade your host linux machine and can't figure out why a build that used to work no longer does (and since 99% of the stuff is the same, this can be months before you test that one exception and then it crashes). Docker ensures we don't install any host headers or libraries except what is needed for the compiler.

hawk_ · 2 years ago
> If you want explicitness and reproducibility, please don't reach for Docker

Is it common in C++ builds to rely on the current O/S libraries instead of say making most dependencies explicit, close to full cross-compile? Do dependencies need to be pulled in using apt-get and not something like maven?

dataflow · 2 years ago
Step 0: reproducible builds (like you said)

Step 1: run all tests, mark all the flaky ones.

Step 2: run all tests under sanitizers, mark all the ones that fail (see the sketch at the end of this comment).

Step 3: fix all the sanitizer failures.

Step 4: (the other stuff you wrote)
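
A minimal sketch of the kind of bug steps 2-3 shake out, assuming Clang or GCC with AddressSanitizer; the bug itself is invented for illustration:

  // overflow.cpp -- build with: g++ -g -fsanitize=address overflow.cpp
  int main() {
    int* a = new int[3]{1, 2, 3};
    int x = a[3];   // reads one past the end; often "works" in a plain build,
                    // but ASan aborts here with a heap-buffer-overflow report
    delete[] a;
    return x;
  }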

hyperman1 · 2 years ago
If we're going to visit the circles of hell, let's do it properly:

Step -1: Get it under source control and backed up.

Step -2: Find out if the source code corresponds to the executable. Which of the 7 variants of the source code (if any).

Step -3: Do dark rituals over a weekend with cdparanoia to scrape the source code from the bunch of scratched CDs found in someone's bottom drawer. Bonus points if said person died last week, and other eldritch horrors lurk in that bottom drawer. Build a VM clone of the one machine still capable of compiling it.

Yes, I have scars, why do you ask?

groby_b · 2 years ago
Step 0 sounds so easy. Until you realize __TIME__ exists. Then you take that away, and you find out that some compiler heuristics might not be deterministic.

Then you discover -frandom-seed - and go ballistic when you read "but it should be a different seed for each sourcefile"

Then you figure out your linker likes emitting a timestamp in object files. Then you discover /Brepro (if you're lucky enough to use lld-link).

Then you'd discover that Win7's app compat db expected a "real" timestamp, and a hash just won't do. (Thank God, that's dead now). This is usually the part where you start questioning your life choices.

Then somebody comes to your desk and asks if you can also make partial rebuilds deterministic.

On the upside, step 1 is usually quick, there will be no tests.
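
(For the __TIME__ part alone, a tiny illustration of where the non-determinism sneaks in; the banner function is made up, and the mitigations assume a reasonably recent GCC or Clang.)

  #include <cstdio>

  // Every rebuild of this translation unit embeds a fresh timestamp, so two
  // otherwise identical builds produce different binaries.
  void print_banner() {
    std::printf("built on %s at %s\n", __DATE__, __TIME__);
  }

  // Mitigations: compile with -Wdate-time to flag uses of __DATE__/__TIME__,
  // or set SOURCE_DATE_EPOCH to a fixed value so the compiler substitutes a
  // deterministic timestamp instead of the wall clock.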

eschneider · 2 years ago
Just a note on legacy tests: Step 0.5: understand the tests. They need to be examined to see if they've rotted or not. Tests passing/failing doesn't really tell you whether the code under test works. The tests might have been abandoned under previous management and may not accurately reflect how the code is _supposed_ to be working.
daemin · 2 years ago
Probably insert another Step 1: implement tests, be they simple acceptance tests, integration tests, or even unit tests for some things.
dheera · 2 years ago
> wrap your build environment with docker

For microservices it is fine, but you can't always deploy everything else with docker, especially for people who want to use your app inside a container of their own. Docker-in-docker is a situation that should never happen.

Containers are nice but they're a horrible way to pretend the problem doesn't exist.

Bundle all the dependencies and make sure it doesn't depend on 5 billion things being in /usr/lib and having the correct versions.

cshokie · 2 years ago
Not the OP, but I don’t think they meant that the build output is in a container. They meant that the thing you use to compile the code is in docker (and you just copy out the result). That would help ensure consistency of builds without having any effect on downstream users.
microtherion · 2 years ago
> Get the code to build clean with -Wall.

This is fine, but I would strongly recommend against putting something like -Wall -Werror into production builds. Some of the warnings produced by compilers are opinion-based, new compiler versions may add new warnings, and suddenly the previously "clean" code is no longer accepted. If you must use -Werror, use it in debug builds.

ladberg · 2 years ago
This is fixed by the suggestion right before it:

> * Reproducible builds. The sanity you save will be your own. Pro-tip: wrap your build environment with docker (or your favorite packager) so that your tooling and dependencies become both explicit and reproducible. The sanity you save will be your own.

Upgrading compiler versions shouldn't be done out-of-band with your normal PR process.

__d · 2 years ago
That's a feature!

New warnings added to new compiler versions can identify problems that weren't previously detected. You _want_ those to trip -Werror when they happen, so you can fix them if they need it.

Changing a compiler version is a task that has to be resourced appropriately. Part of that is dealing with any fallout like this. Randomly updating your compiler is just asking for trouble.

planede · 2 years ago
IMO there is absolutely no reason to enable warnings in CI without -Werror. Nobody reads the logs of a successful build.

If some warnings are flaky then disable them specifically. In my experience most warnings in -Wall are OK and you can suppress the rare false positives in code. Don't suppress without a comment.

edit:

Having said that there are entirely valid reasons to not have -Werror outside of CI. It should be absolutely disabled by default if you distribute source.

bluGill · 2 years ago
-Wall and -Werror should be running on all developer machines and your CI machines.

If you are delivering source code to someone else (or getting source code from someone else that you build but do not otherwise work on), then you should ensure warnings are disabled for those builds. However, all developers and your CI system still need to run with all warnings possible.

The days when compilers put in warnings that were of questionable value are mostly gone. Today if you see a warning from your compiler it is almost always right.

nanolith · 2 years ago
The compiler and toolchain are a dependency like any other. Upgrading to a new compiler version is an engineering task just like upgrading a library version. It must be managed as such. If this leads to new errors, then this becomes part of the upgrade management. Likewise, since the code generator and optimizers have changed, this upgrade must be tested like any other new feature or fix. Create an appropriate topic/feature branch, and dig in.
golergka · 2 years ago
Great advice. Almost all of it applies to any programming language.
vlovich123 · 2 years ago
Step 0: CI on every commit to build & run all tests / enforce that you never regress once you've finished a step.
SleepyMyroslav · 2 years ago
I would not call that 'controversial'. In the internet days people call this behavior trolling for a reason. The punchline about rewriting code in a different language gives an easy hint at where this is all going.

PS. I have been in the shoes of inheriting old projects before. And I hope i left them in better state than they were before.

Jtsummers · 2 years ago
I'd swap 2 and 3. Getting CI, linting, auto-formatting, etc. going is a higher priority than tearing things out. Why? Because you don't know what to tear out yet or even the consequence of tearing them out. Linting (and other static analysis tools) also give you a lot of insight into where the program needs work.

Things that get flagged by a static analysis tool (today) will often be areas where you can tear out entire functions and maybe even classes and files because they'll be a re-creation of STL concepts. Like homegrown iterator libraries (with subtle problems) that can be replaced with the STL algorithms library, or homegrown smart pointers that can just be replaced with actual smart pointers, or replacing the C string functions with C++'s own string class (and related classes) and functions/methods.
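
A hedged sketch of the kind of drop-in replacement meant here (the homegrown type is invented for illustration):

  #include <memory>

  // Before: a homegrown owning pointer, the sort of thing analysis tools flag
  // (no move semantics, and the implicit copy constructor invites double frees).
  template <typename T>
  class ScopedPtr {
   public:
    explicit ScopedPtr(T* p) : p_(p) {}
    ~ScopedPtr() { delete p_; }
    T* operator->() const { return p_; }
   private:
    T* p_;
  };

  // After: the standard type says the same thing with the bugs already fixed.
  //   ScopedPtr<Widget> w(new Widget);  becomes  auto w = std::make_unique<Widget>();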

But you won't see that as easily until you start scanning the code. And you won't be able to evaluate the consequences until you push towards more rapid test builds (at least) if not deployment.

jasonwatkinspdx · 2 years ago
Yeah, I've done a fair bit of agency work dropping in to rescue code bases, and the first thing I do is run unit tests and check coverage. I add basic smoke tests anywhere they're missing. This actually speeds me up, rather than slowing me down, because once I have reasonably good coverage I can move dramatically faster when refactoring. It's a small investment that pays off.
dralley · 2 years ago
On the flip side, auto-formatting will trash your version history and impede analysis of "when and why was this line added".
Jtsummers · 2 years ago
I'm not hardcore on auto-formatters, but I think their impact on code history is negligible in the case of every legacy system I've worked on. The code history just isn't there. These aren't projects that used git until recently (if at all). Before that they used something else, but when they transitioned they didn't preserve the history. And that's if they used any version control system. I've tried to help teams whose idea of version control was emailing someone (they termed them "QA/CM") to make a read-only backup of the source directory every few months (usually at a critical review period in the project, so a lot of code was changed between these snapshots).

That said, sure, skip them if you're worried about the history getting messed up or use them more selectively.

skrebbel · 2 years ago
You can make git blame ignore specific commits by listing them in a file and pointing the blame.ignoreRevsFile config option at it (commonly named .git-blame-ignore-revs).

This is assuming Git of course, which is not a given at all for the average legacy c++ codebase.

lpapez · 2 years ago
You can instruct git to ignore specific commits for blame and diff commands.

See "git blame ignore revs file".

Intended use is exactly to ignore bulk changes like auto formatting.

IshKebab · 2 years ago
I believe you can configure `git blame` to skip a specific commit. But in my experience it doesn't matter anyway for two reasons:

1. You're going to reformat it eventually anyway. You're just delaying things. The best time to plant a tree, etc.

2. If it's an old codebase and you're trying to understand some bit of code you're almost always going to have to walk through about 5 commits to get to the original one anyway. One extra formatting commit doesn't really make any difference.

duped · 2 years ago
This is another reason why you should track important information in comments alongside the code instead of trusting the VCS to preserve it in logs/commit messages, and why you should reject weird, uncommented code from being merged.

Not saying that fixes decades of cruft, because you shouldn't change files without good reason and non-whitespace formatting is not a good reason, but I'm mentioning it because I've seen people naively believe bullshit like "code is self explanatory" and "the reason is in the commit message".

Just comment your code folks, this becomes less of a problem

mb7733 · 2 years ago
How does reformatting trash the history? It's one extra commit..

I guess if it splits or combines lines that could cause some noise if you really want the history of a single line... But that happens all the time, and I don't see how it would really prevent understanding the history. You can always do a blame on a range of lines.

Maybe I'm missing something though, genuinely curious for a concrete example where reformatting makes it hard to understand history!

PreachSoup · 2 years ago
On a per-file level it's just 1 commit. It's not really a big deal.
exDM69 · 2 years ago
clang-format can be applied to new changes only, for this very reason.

Adding it will remove white space nitpicking from code review, even if it isn't perfect.

thrwyexecbrain · 2 years ago
I would absolutely not recommend auto-formatting a legacy codebase. In my experience large C++ projects tend to have not only code generation scripts (python/perl/whatever) but also scripts that parse the code (usually to gather data for code generation). Auto formatting might break that. I have even seen some really cursed projects where the _users_ parsed public header files with rather fragile scripts.
Jtsummers · 2 years ago
I was listing the items in the original article's #3 and saying I'd move them up to #2 before I'd go about excising portions of the project, the original #2. I still stand by that. But you can read my other comment where I don't really defend auto-formatting to see that I don't care either way. I made it about four hours ago so maybe you missed it if you didn't refresh the page in the last few hours.
btown · 2 years ago
CI is different from the others, here! At minimum, building a "happy path(s)" test harness that can run with replicable results, and will run on every one of your commits, is a first step, and also helps to understand the codebase.

And when you're jumping around - and you'll have to! - odds are you'll have a bunch of things changed locally, and might accidentally create a commit that doesn't separate out one concern from another. CI will be a godsend at that point.

politician · 2 years ago
Nit: The post scopes "tearing things out" to dead code as guided by compiler warnings and unsupported architectures.

If going that route, I'd recommend commenting out the lines rather than removing them outright to simplify the diffs, at least until you're ready to squash and merge the branch.

SAI_Peregrinus · 2 years ago
Better to use `#if` or `#ifdef` to prevent compilation. C & C++ don't support nested comments, so you can end up with existing comments in the code ending the comment block.
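
A quick illustration (hypothetical snippet): wrapping the block in /* ... */ would fail, because the existing /* tuning knob */ comment ends the outer comment early and leaves a stray */ behind, whereas #if 0 nests happily around comments and other #if blocks.

  #if 0
  int threshold = 5;  /* tuning knob from 2009 */
  int legacy_path();
  #endif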
broken_broken_ · 2 years ago
Fair point!
vijucat · 2 years ago
Not mentioned were code comprehension tools / techniques:

I used to use a tool called Source Navigator (written in Tcl/Tk!) that was great at indexing code bases. You could then check the Call Hierarchy of the current method, for example, then use that to make UML Sequence Diagrams. A similar tool called Source Insight is shown below [1].

And oh, notes. Writing as if you're teaching someone is key.

Over the years, I got quite good at comprehending code, even code written by an entire team over years. For a brief period, I was the only person actively supporting and developing an algorithmic trading code base in Java that traded ~$200m per day on 4 or 5 exchanges. I had 35 MB of documentation on that, lol. Loved the responsibility (ignoring the key man risk :|). Honestly, there's a lot of overengineering and redundancy in most large code bases.

[1] References in "Source Insight" https://d4.alternativeto.net/6S4rr6_0rutCUWnpHNhVq7HMs8GTBs6...

awesomeMilou · 2 years ago
> I used to use a tool called Source Navigator

I can't believe I'm finding someone in the wild that also has used Source Navigator.

My university forced this artifact on me in the computer architecture course because it has some arcane feature set + support for an ARM emulator that isn't found elsewhere. We used it for bare metal ARM assembly programming

Night_Thastus · 2 years ago
>worry not, by adding std::cmake to the standard library and you’ll see how it’s absolutely a game changer

I'm pretty sure my stomach did somersaults on that.

But as for the advice:

>Get out the chainsaw and rip out everything that’s not absolutely required to provide the features your company/open source project is advertising and selling

I hear you, but this is incredibly dangerous. Might as well take that chainsaw to yourself if you want to try this.

It's dangerous for multiple reasons. Mainly it's a case of Chesterton's fence. Unless you fully understand why X was in the software and fully understand how the current version of the software is used, you cannot remove it. A worst case scenario would be that maybe a month or so later you make a release and the users find out an important feature is subtly broken. You'll spend days trying to track down exactly how it broke.

>Make the project enter the 21st century by adding CI, linters, fuzzing, auto-formatting, etc

It's a nice idea, but it's hard to do. One person is using VIM, another is using emacs, another is using QTCreator, another primarily edits in VSCode.. Trying to get everyone on the same page about all this is very, very hard.

If it's an optional step that requires that they install something new (like a commit hook) it's just not going to happen.

Linters also won't do you any good when you open the project and 2000+ warnings appear.

electroly · 2 years ago
> It's a nice idea, but it's hard to do. One person is using VIM...

The things the author listed there are commonly not IDE integrated. I've never seen a C++ development environment where cpplint/clang-tidy and fuzzers are IDE integrated, they're too slow to run automatically on keystrokes. Auto-formatting is the only one that is sometimes integrated. All of this stuff you can do from the command line without caring about each user's chosen development environment. You should definitely at least try rather than giving up before you start just because you have two different text editors in use. This is C++; if your team won't install any tools, you're gonna have a bad time. Consider containerizing the tools so it's easier.

throwaway2037 · 2 years ago
> I've never seen a C++ development environment where cpplint/clang-tidy and fuzzers are IDE integrated

CLion from JetBrains has clang-tidy integrated (real-time).

gpderetta · 2 years ago
Clangd will happily run clang-tidy as part of completion/compile-as-you-type/refactoring.

In the editor/IDE of your choice.

I wouldn't call it fast, but it is quite usable.

chlorion · 2 years ago
In Emacs I have clangd and clang-tidy running on each key stroke!

The project size is probably a lot smaller than what most people are working on though, and I have a fast CPU and NVME disk, but it's definitely possible to do!

I'm not sure about the fuzzer part though.

zer00eyz · 2 years ago
>> It's a nice idea, but it's hard to do. One person is using VIM, another is using emacs, another is using QTCreator, another primarily edits in VSCode.. Trying to get everyone on the same page about all this is very, very hard.

This is what's wrong with our industry, and it's no longer an acceptable answer. We're supposed to be fucking professional, and if a job needs to build a tool chain from the IDE up we need to learn to use it and live with it.

Built on my machine, with my IDE, the way I like it and it works is not software. It's arts and fucking crafts.

cratermoon · 2 years ago
If you're saying everyone should agree on the same IDE and personal development toolset, I disagree, sort of.

The GP was suggesting the effort to add CI, linters, fuzzing, auto-formatting, etc was too hard. If that can be abandoned entirely, perhaps the legacy codebase isn't providing enough value, and the effort to maintain it would be better spent replacing it. But the implication is that the value outweighs the costs.

Put all the linters, fuzzing, and format checking in an automated build toolchain. Allow individuals to work how they want, except they can't break the build. Usually this will rein in the edge cases using inadequate tools. The "built on my machine, with my IDE, the way I like it and it works" is no longer the arbiter of correct, but neither does the organization have to deal with the endless yak shaving over brace style and tool choice.

otabdeveloper4 · 2 years ago
> every single carpenter in the world should use the exact same make and model of saw, for, uh, professionalism reasons
saagarjha · 2 years ago
Software is arts and crafts :)
IshKebab · 2 years ago
> One person is using VIM, ...

I don't get your point. You know you can autoformat outside editors right? Just configure pre-commit and run it in CI. It's trivial.

> If it's an optional step that requires that they install something new (like commit hook) it's just not going to happen.

It will because if they don't then their PRs will fail CI.

This is really basic stuff, but knowledge of how to do CI and infra right does seem to vary massively.

j-krieger · 2 years ago
> It's a nice idea, but it's hard to do. One person is using VIM, another is using emacs, another is using QTCreator, another primarily edits in VSCode.. Trying to get everyone on the same page about all this is very, very hard.

I must have missed the memo where I could just say no to basic things my boss requires of me. You know, the guy that pays my salary.

rcxdude · 2 years ago
You can, in fact, just do that. You are allowed. You may get fired if you are too much of a headache, but employers are actually flexible.
saagarjha · 2 years ago
As others have mentioned, none of these things actually change your development workflow. But if they did, you do have the ability to say no. If your boss fails to understand that you have an environment that you're productive in, that sounds like a bad place to work.
aaronbrethorst · 2 years ago
An optional step locally like pre-commit hooks should be backed up by a required step in the CI. In other words: the ability to run tests locally, lint, fuzz, format, verify Yaml format, check for missing EOF new lines, etc, should exist to help a developer prevent a CI failure before they push.

As far as linters causing thousands of warnings to appear on opening the project, the developer adding the linter should make sure that the linter returns no warnings before they merge that change. This can be accomplished by disabling the linter for some warnings, some files, making some fixes, or some combination of the above.

Kamq · 2 years ago
> It's a nice idea, but it's hard to do. One person is using VIM, another is using emacs, another is using QTCreator, another primarily edits in VSCode.. Trying to get everyone on the same page about all this is very, very hard.

Bullshit, all of these (and additionally CLion) are fairly easy to configure for this, with the possible exception of QTCreator (not a ton of experience on my end).

Just make it a CI requirement, and let everyone figure it out for their own tools. If you can't figure that out, you get to run it as a shell script before you do your PRs. If you can't figure that out, you probably shouldn't be on a legacy C++ project.

theamk · 2 years ago
> It's dangerous for multiple reasons. Mainly it's a case of Chesterton's fence. Unless you fully understand why X was in the software...

If this is a function that no one links to, and your project does not mess with manual dynamic linking (or the function is not exposed), then it's pretty safe to remove it. If it's an internal utility which does not get packaged into the final release package, it is likely safe to remove too. If it's a program which does not compile because it requires Solaris STREAMS and your targets are Linux + MacOS - kill it with fire.

(Of course removing function calls, or removing functionality that in-use code depends on, is dangerous. But there is plenty of stuff which has no connection to main code)

keepamovin · 2 years ago
It's funny. My first step would be

  0. You reach out to the previous maintainers, visit them, buy them tea/beer and chat (eventually) about the codebase. Learned Wizards will teach you much.
But I didn't see that anywhere. I think the rest of the suggestions (like get it running across platform, get tests passing) are useful stress tests likely to lead you to robustness and understanding however.

But I'd def be going for that sweeet, sweet low-hangin' fruit of just talking to the ol' folks who came that way before. Haha :)

GuB-42 · 2 years ago
I wouldn't make it the first step. If you do, you will probably waste their time more than anything.

Try to work on it a little bit first, and once you get stuck in various places, then talk to the previous maintainers; it will be much more productive. They will also appreciate the effort.

guhidalg · 2 years ago
There’s a fine balance with no right or wrong answer. Previous maintainers will appreciate it if you spent literally more than a second trying to understand before you reach out to them, but for your own sanity you should know when it’s time to stop and call for help.
keepamovin · 2 years ago
Yeah, that's fair enough. I guess since we're already zero-indexed maybe my -1 step is Prep. Hahaha! :)
keepamovin · 2 years ago
While this sounds good, it sort of assumes you're trying to solve some puzzle, and you're going to them for hints and answers. That's not the problem at hand tho: we're not asking them to solve some puzzle for us, we're just seeking understanding.

Because what we're actually doing is building capability to be able to work on a wide range of problems on our own. And getting guided by people who already know their way around the codebase and know what everything does is going to get you there quicker than any other method (unless docs are truly excellent, which in most cases they're not).

So, after reflecting on it more, I disagree. I think assuming you can poke around for yourself, and start to build up a picture of what things do, may end up giving you confident delusions that could then be hard to unlearn, or at least be blockers to further understanding. Better to go to the old guard with a blank-slate mindset, and let them fill you in on the overview.

Then go away and work on it, having that guidance in store, and then (if you were friendly enough in the first meet) come back for additional. That is what I think people would truly appreciate.

Think about it: assume you were someone who generally had time to help people coming to you for assistance, would you not love to sit someone down and give them an in-depth download of the big picture, and a detailed overview, of this thing into which you poured great love and devotion (or at least, many hours, sweat and dollars)? I think you would.

Anyway, it's a reminder to me that what often sounds right in the moment, and is easy to agree with, actually turns out to not be the best idea. I'm always eager to see another side, and empathize with someone else's view, but the lesson for me is: I often take that too far, and in doing so, forget what I originally came in with, forget my own frame, and forget what's true, so eager am I to see it from someone else's point of view. So that's a big lesson for me. Thanks for being part of it! Haha! :)

theamk · 2 years ago
Maybe do a quick look at codebase first so you can identify biggest WTF's and ask about them.

After all, if you have inherited a codebase with no tests, with a build process which fails every other time, with unknown dependency info, and which can only be built on a single server with a severely outdated OS... are you sure the previous maintainer is a real wizard and all the problems are the result of not enough time? Or are they a "wizard" who keeps things broken for job security and/or because they don't want to learn new things?

vaylian · 2 years ago
Even if that person is not a great archwizard, they still have more experience in that project than you and you will probably learn some things that will make your life less miserable, because you will better understand what to expect and what kind of failed solutions have been tried before.
keepamovin · 2 years ago
Yes! Good idea. Locking in at -1. Hahah! :)
lelanthran · 2 years ago
> It's funny. My first step would be

> 0. You reach out to the previous maintainers, visit them, buy them tea/beer and chat (eventually) about the codebase. Learned Wizards will teach you much.

Have you ever tried that? This is legacy code. Even if the handover was yesterday, they cannot tell you about anything useful they did more than 6 months in the past.

And that's the best-case scenario. The common-case is "That's the way I got it when it was handed over to me" answer to every question you ask.

mst · 2 years ago
I have, and assuming your predecessor doesn't mind, you still get enough useful answers to make it worth it a lot of the time.

Especially if you manage to get to just chatting about the project - at some point they'll almost certainly go "oh! while I remember," followed by the answer to a question you didn't realise you should've asked.

The value is often in the commiseration rather than the interrogation, basically.

TrackerFF · 2 years ago
I was once tasked with deploying a piece of software on a closed network (military), to run on an old custom OS - it wasn't a huge program, around 50k lines of code.

I did encounter a bunch of bugs and problems along the way, and wanted to reach out to the devs that wrote it - as it was custom made for my employer.

Welp, turns out it was written by one guy/contractor, and that he had passed away a couple of years earlier.

At least in the defense industry you'll find these sort of things all the time. Lots of custom/one-off stuff, made for very specific systems. Especially on the hardware side it is not uncommon that the engineers that made the equipment are either long gone or retired.

keepamovin · 2 years ago
Welp, you might have needed to consult a seance for that one.

Interesting point about the "author expiry" window. I was thinking about that today regarding something else:

Let's say in 20 years time, no database code has been updated for the last 20 years. And all the people who worked on it, can't remember anything about it. Yet, it still works.

That means that everyone who uses that database every day doesn't know how it works. They believe that it works, and this belief is widespread. And the database providing results to queries is a real thing. And it does work -- but nobody knows how.

This is common. I don't know in any detail how the MacBook I use works. But it does. I don't know how many things I use actually work. But they do work.

It seems that in the world of things that "work", and which most people who use them do not understand, there are two classes: those for which there is a widespread belief that they do work, and those for which that belief is not widespread. But in either case, they work.

raverbashing · 2 years ago
Easier said than done

My step 0 would be: run it through a UML tool to get a class diagram and other diagrams.

This will help you a lot.

> Get the tests passing on your machine

Tests? On a C++ codebase? I like your optimism, rs

Joel_Mckay · 2 years ago
No one has time to maintain the UML in production.

You may luck out with auto-generated documentation, but few use these tools properly (javadoc or doxygen). =)

fransje26 · 2 years ago

    > You reach out to the previous maintainers, visit them
I could have brought them flowers, and shared a moment of silence contemplating eternity. I don't know if it would have significantly helped with understanding the code base though...

Night_Thastus · 2 years ago
IME, this only works if you can get regular help from them. A one-off won't help much at all.
mellutussa · 2 years ago
> A one-off won't help much at all.

Monumentally disagree. One-off session with a guy who knows the codebase inside out can save you days of research work. Plus telling you all about the problematic/historical areas.

Joel_Mckay · 2 years ago
I've always found discussing why former employees left a project incredibly enlightening. They will usually explain the reality behind the PR given they are no longer involved in the politics. Most importantly they will often tell you your best case future with a firm.

Normally, employment agreements specifically restrict contact with former staff, or discussions of sensitive matters like compensation packages.

C++ is like any other language, in that it will often take 3 times longer to understand something than to re-implement the same. If you are lucky, then everything is a minimal API lib, and you get repetitive examples of the use cases... but the cooperative OSS breadcrumb model almost never happens in commercial shops...

Legacy code bases can be hell to work with, as you end up with the responsibility for 14 years of IT kludges. Also, the opinions from entrenched lamers on what productivity means will be painful at first.

Usually, with C++ it can become its own project specific language variant (STL or Boost may help wrangle the chaos).

You have my sympathy, but no checklist can help with naive design inertia. Have a wonderful day. =)

keepamovin · 2 years ago
Yeah you need to cultivate those relationships. But with a willing partner that first session will take you from 0 to 1 :)
planede · 2 years ago
You mean the guys the company laid off last week?
keepamovin · 2 years ago
Ouch, yeah. I appreciate your topicality with this comment given the recent saga of tech layoffs. Much luck to you if you're in that experience right now!! Say you got fired from a S.Eng job at 350K yearly and some young grunt on 165K slid into your DMs and wanted to meet to chat. What would make you agree to that? I think it's unlikely, but perhaps if they reflected to you all the things you already thought were stupid about that company, and that you suffered with. You would want to help them as, in their suffering, you could see your own, despite getting laid off. I think in that situation, there's a chance you might actually go over the codebase with them, in a conversation that mixed 1 part that, with 1 part self-commiserating and complaining/airing/venting about the org.

What at first seems an unlikely pairing could, I think, given the right "relational config" in the moment, actually end up working pretty well. Seems there's plenty of opportunities for such a coupling to have many fruitful outputs for both parties.

mk_chan · 2 years ago
I’m not sure why there’s so much focus on refactoring or improving it. When a feature needs to be added that can just be tacked onto the code, do it without touching anything else.

If it’s a big enough change, export whatever you need out of the legacy code (by calling an external function/introducing a network layer/pulling the exact same code out into a library/other assorted ways of separating code) and do the rest in a fresh environment.

I wouldn’t try to do any major refactor unless several people are going to work on the code in the future and the code needs to have certain assumptions and standards so it is easy for the group to work together on it.

dj_mc_merlin · 2 years ago
The post argues against major refactors. The incremental suggestions it gives progressively make the code easier to work with. What you suggest works until it doesn't -- something suddenly breaks when you make a change and there's so much disorganized stuff that you can't pinpoint the cause for much longer than necessary. The OP is basically arguing for decluttering in order to be able to do changes easier, while still maintaining cohesion and avoiding a major rewrite.
bluGill · 2 years ago
The right answer depends on the future. I've worked on C++ code where the replacement was already in the market but we had to do a couple more releases of the old code. Sometimes it is adding the same feature to both versions. There is a big difference in how you treat code that you know the last release is coming soon and code where you expect to maintain and add features for a few more decades.
mk_chan · 2 years ago
Yes, you have to anticipate the future (or even better if your manager/boss already has expectations you can adopt to begin with) and then choose the right way to tackle the changes required. That's why I laid out 3 possible cases, the last of which points out that I prefer to refactor primarily when there's a lot of work incoming on the codebase. Personally, I don't see much value in refactoring code significantly if you alone are going to be editing it, because the cost of refactoring for ease of editing plus the cost of editing in the refactored codebase is often more than just eating the higher cost of editing in the pre-refactored codebase, and you don't reap the scaling benefits of refactoring as much. However, like I mentioned in the above paragraph, it depends. In the end it's all about managing the debt to get the most out of it in a _relatively_ fixed time period.
FpUser · 2 years ago
>"When a feature needs to be added that can just be tacked onto the code, do it without touching anything else."

In a few lucky cases. In real life a new feature is most likely a change in the behavior of an already existing one, and suddenly you have to do some heavy refactoring in numerous places.

convolvatron · 2 years ago
if you're going to own it for the foreseeable future, then own it. learn it, refactor it, test the hell out of it. otherwise you're never going to be able to debug or extend it.

one thing I always do is throwaway major refactors. it's the fastest way for me to learn what the structure is, what depends on what, and what's really kinky. and I might just learn enough to do it for real should it become necessary.

summerlight · 2 years ago
This thread has lots of good advice. I'll add some of mine, not limited to C/C++. If you have the luxury of using a VCS, make full use of its value. Many teams only use it as a tool merely for collaboration. VCS can be more than that. Pull the history, then build a simple database. It doesn't have to be an RDB (it's helpful though); a simple JSON file or even a spreadsheet file is a good starter. There is so much valuable information to be fetched with just a simple data-driven approach, almost immediately.

  * You can find out the most relevant files/functions for your upcoming works. If some functions/files have been frequently changed, then it's going to be the hot spot for your works. Focus on them to improve your quality of life. If you want to introduce unit tests? Then focus on the hot spot. Suffer from lots of merge conflicts? The same.
  * You can also figure out correlation among the project and its source files. Some seemingly distant files are frequently changed together? Those might suggest an implicit structure that might not be clear from the code itself. This kind of information from external contexts can be useful to understand the bird's eye view.
  * Real ownership models of each module can be inferred from the history. Having a clear ownership model helps, especially if you want to introduce some form of code review. If some code/data/module seems to have unclear ownership? That might be a signal for refactoring needs.
  * Specific to C/C++ contexts, build time improvements could be focused on important modules, in a data driven way. Incremental build time matters a lot. Break down frequently changed modules rather than blindly removing dependencies on random files. You can even combine this with header dependency to score the module with the real build time impact. 
There could be so many other things if you can integrate other development tools with VCS. In the era of LLMs, I guess we can even try to feed the project history and metadata to the model and ask for some interesting insights, though I haven't tried this. It might need some dedicated model engineering if we want to do this without a huge context window, but my gut tells me that this should be something worth trying.

bostonvaulter2 · 2 years ago
Nice ideas! Do you have any tips for software to help automate some of those analyses?
bArray · 2 years ago
> 3. Make the project enter the 21st century by adding CI, linters, fuzzing, auto-formatting, etc

I would break this down:

a) CI - Ensure not just you can build this, but it can be built elsewhere too. This should prevent compile-based regressions.

b) Compiler warnings and static analysers - They are likely both smarter than you. When it says "warning, you're doing weird things with a pointer and it scares me", it's a good indication you should go check it out.

c) Unit testing - Set up a series of tests for important parts of the code to ensure it performs precisely the task you expect it to, all the way down to the low level. There's a really good chance it doesn't, and you need to understand why. Fixing something could cause something else to blow up as it was written around this bugged code. You also end up with a series of regression tests for the most important code (a minimal sketch follows this list).

n) Auto-formatting - Not a priority. You should adopt the same style as the original maintainer.
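
For (c), a minimal sketch of the characterization-test style that works even without a framework; parse_rate() and its expected values are invented for illustration:

  #include <cassert>
  #include <cstdio>
  #include <cstdlib>

  // Stand-in for a legacy function being pinned down (in real life this lives
  // in the inherited code, warts and all, and is not redefined here).
  int parse_rate(const char* s) { return (s && *s) ? std::atoi(s) : -1; }

  int main() {
    // Record what the code does today, not what we wish it did; if a later
    // "cleanup" changes any of these, the test tells us.
    assert(parse_rate("100") == 100);
    assert(parse_rate("") == -1);   // current (odd) behavior for empty input
    assert(parse_rate("1e2") == 1); // yes, really -- document it so nobody "fixes" it silently
    std::puts("parse_rate characterization tests passed");
    return 0;
  }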

> 5. If you can, contemplate rewrite some parts in a memory safe language

The last step of an inherited C++ codebase is to rewrite it in a memory safe language? A few reasons why this probably won't work:

1. Getting resources to do additional work on something that isn't broken can be difficult.

2. Rather than just needing knowledge in C++, you now also need knowledge in an additional language too.

3. Your testing potentially becomes more complex.

4. Your project likely won't lend itself to being written in multiple languages, due to memory/performance constraints. It must be a significantly hard problem, given that you didn't just write it yourself.

5. You have chosen to inherit a legacy codebase rather than write something from scratch. It's an admittance that you don't have some resource (time/money/knowledge/etc) to do so.

jpc0 · 2 years ago
> The last step of an inherited C++ codebase is to rewrite it in a memory safe language

Simply getting rid of any actually memory unsafe C++ and enforcing guidelines will do this for you in the C++ codebase.

"Rewrite it in X" only adds complexity because it's the flavour of the month as you said in your comment.

The author is already doing the work of rewriting large chunks of the codebase in C++; they may as well follow and implement a more restrictive subset of the language. I find High Integrity C++ to be good. If I can get my hands on the latest MISRA standard that is likely good as well. These may not be "required" but they specify what is enforced in <enter "safe" language here>. So instead of having to reskill your entire dev team on a new language which has many, many sharp edges, how about just having your dev team use the language they already know and enforcing guidelines to avoid known footguns.

dieortin · 2 years ago
How would you get rid of any memory unsafe C++? Isn’t that just another way of saying “do not make mistakes”?