Posted by u/drekipus a year ago
Ask HN: Advice for leading a software migration?
Hey HN,

I'm about to take the lead on a decent-sized software migration at work (from v1 of some subsystem to v2, both in-house; we want to deprecate and eventually remove v1 entirely), for 8 of our clients, totalling about 16 million customers.

I don't have too many details to share, as I don't know what's relevant. But I'm asking if anyone has any advice or recommended reading regarding such?

One book that is really inspiring me about it is "How Big Things Get Done" by Bent Flyvbjerg and Dan Gardner. In it, there are some key bits of advice, such as:

* Think slow, act fast, and mitigate long-tailed risks.

* Compartmentalize and stick to repeated processes. "Build with LEGOs"

* Look around at other projects of similar nature.

The last point is why I'm here, as I know some of you have been in the game for longer than I have, so feel free to share any experiences that you think are relevant, if you'd like.

philip1209 · a year ago
I’ve done this. My quick thoughts:

- migrations always run longer than expected. In my case, leadership estimates were off by a factor of 10. What the eng manager originally said would take 3 months ended up taking a couple years.

- try to deliver quick wins and incremental value. This is often hard though. But it’s worth a try.

- Try to avoid this becoming the project everybody attaches their pet projects to. It’s too easy for people to make this the project where they use that new framework, test well, set up a design system, and make lots of little changes.

- that being said: migrations are easiest if you keep the design (visually and engineering) exactly the same. There will be lots of pressure to “just redo it while you’re already having to rewrite it”, but the uncertainty of a redesign really slows things down. Having a reference implementation means you don’t have to invent tons of acceptance criteria from first principles.

- as soon as things start getting delayed, which they will, try offering to cut corners or cancel the project. You want somebody else in corporate to stick their neck out to extend the project.

- Try seeding the team with more veteran ICs internally. You’ll need their help as you uncover dragons or need to get other teams to help run or integrate your new code.

- Among projects I’ve seen like this, the person running them gets fired or quits partway through at least half of the time. This is often because some middle manager made a promise they couldn’t keep to executives, and needs a scapegoat to save their own job. (It’s often that kind of middle manager who switches jobs every two years and keeps failing up silently and the project delay happens halfway through their stay at the company and they’re just trying to get to the two year mark and quit before anybody realizes what is going on internally.)

sjf · a year ago
I support everything in this comment.

After more than a decade at large sw companies, I can count on one hand the number of migrations where the legacy system was ever able to be turned down. I’ve seen migrations drag on for years, to the point where most of the team has turned over. I’ve seen them become a three-way migration because the second version was deemed insufficient so a third solution was introduced.

Absolutely put your most senior devs on this; maintain as much support from management as possible; budget for much, much more time than you think; you need full commitment or you are going to be maintaining both systems indefinitely.

stevage · a year ago
Do senior Devs actually want to work on such a thankless project?
toast0 · a year ago
> After more than a decade at large sw companies, I can count on one hand the number of migrations where the legacy system was ever able to be turned down.

If part of the plan wasn't to run a v1 shim on top of v2 to handle legacy users that won't migrate, v2 almost certainly doesn't meet the needs of v1 customers, and it's not a question of 'migration'; it's a question of ending a product and releasing a similar product.

Sometimes that's what's wanted and needed, but often it's not, and then it's a surprise that the v1 users want their needs met and it's hard to say no to paying customers, but nobody signed up to run two products forever.

gofreddygo · a year ago
I've done this too. Although not at the "millions of clients" scale, but large enough to drive learnings. Everything above is true.

Migrations are painful, thankless and always run over budget and time. Unless I've been at the company long enough, have enough confidence and rapport with my reporting head and skip level, I'd rather not do it.

I'm never taking any big (more than 2-3 month) migrations. Only small predictable subsystems that I can roll back or where I can run both v1 and v2 in parallel. The first third of the time is for discovery: making changes, seeing where things break, and possibly coming up with fast tests (manual or automated). The last third is for actual testing, trying out small pieces in production and fixing unexpected issues. So take your dev estimate and multiply by 3.

Even then, you have to shoot down any demands to use new frameworks, new processes and new dependencies. And resist your own temptation. Remember no one gives a shit about migrations.

You will be asked a thousand times about the progress by people incapable of fathoming the complexity. They expect a percentage. Have one ready with a small roadmap of cornerstones, and publish it as a report or something. Every time someone asks, point to the report. No one ever opens that report.

philip1209 · a year ago
One delayed follow-up thought here:

Redesigns almost always result in a decrease in metrics/KPIs. The redesign just lacks the learned improvements that were baked into the old product. So, the initial launch almost always seems like a failure - and requires leadership to expect this dip before problems can be patched.

drekipus · a year ago
Thanks for the follow up. I'm just making notes from this thread now and found that you only posted this 13 hours ago. :)
frenchie4111 · a year ago
> the person running them gets fired or quits partway through at least half of the time

This is a good point. Or the migration appears to have been very successful to management (before it's actually complete from an engineering perspective) and they get promoted / moved onto higher priority work.

Either way: make sure you are keeping the rest of the relevant engineering organization informed about how the new system works and how the migration is going to work.

philip1209 · a year ago
I don’t think there’s much room for promotion because migrations are fabrication and promotions favor innovation. It’s ability to save money versus ability to make money. See: Smiling curve in economics.
al_borland · a year ago
If at all possible, try to find a way to do it incrementally, with options to roll back if things go sideways when something is released.

Management rarely wants to wait years before seeing any payoff from a big dramatic cutover, and big sweeping changes are disruptive to clients.

This will likely create more work. Maybe some layer has to be built to allow v1 and v2 subsystems to both operate with the other parts of the app. But it should ultimately make it less stressful.

If you can allow some friendly departments from friendly clients to test and provide feedback before rolling it out to the whole company or the full set of companies, that would probably go a long way to help identify blind spots.

Most importantly, listen to your team and the people who know the systems well. The projects I’ve seen that have really gone sideways are ones where the people who know the true issues are never consulted, or completely ignored when they try to raise an alarm.

bearjaws · a year ago
> try to find a way to do it incrementally

I would make it a hard requirement.

If you can't do it incrementally, it's going to fail. Corporations rarely have the attention span and staff tenure to make that kind of migration work.

Even if it takes a year of pre-work to get to a point where it can be done incrementally, it will be the only way it gets done.

snarkypixel · a year ago
One thing I've learned from these large migration projects is that v1 always seems like total crap, while v2 appears to be the perfect dream. However, as you begin building v2, you start to realize that v1 was not actually that bad and had many great but unappreciated features. Additionally, you come to understand that many v1 features took a long time to develop, were battle-tested, and would require significant effort to rebuild in v2 with minimal benefits.

So, what I've learned is not to completely discard v1. Instead, it's better to refactor or rebuild only the parts that pose issues, even though it may not be as sexy or exciting as starting v2 from scratch.

In practice, I would begin by cloning v1 and deploying it to a development environment to start tweaking it. I would also make sure to implement numerous automated tests to safeguard against any potential issues caused by refactoring. Of course, if you can keep using the same database, that's even better, as you can test refactored features with real customer data and even run both builds in parallel to spot any differences.
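
To make the "run both builds in parallel" idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: `compare_builds`, `v1_total`, and `v2_total` are hypothetical names, and a real setup would replay recorded traffic rather than a hand-written list.

```python
from typing import Any, Callable, Dict, List


def compare_builds(
    requests: List[Dict[str, Any]],
    v1_handler: Callable[[Dict[str, Any]], Any],
    v2_handler: Callable[[Dict[str, Any]], Any],
) -> List[Dict[str, Any]]:
    """Run every request through both builds and collect the mismatches."""
    mismatches = []
    for request in requests:
        expected = v1_handler(request)  # v1 acts as the reference implementation
        try:
            actual = v2_handler(request)
        except Exception as exc:  # a crash in the new build is also a difference
            actual = f"error: {exc!r}"
        if actual != expected:
            mismatches.append({"request": request, "v1": expected, "v2": actual})
    return mismatches


# Hypothetical handlers: the refactored build forgot that v1 rounds totals.
def v1_total(req: Dict[str, Any]) -> float:
    return round(req["price"] * req["qty"], 2)


def v2_total(req: Dict[str, Any]) -> float:
    return req["price"] * req["qty"]


sample = [{"price": 1.5, "qty": 2}, {"price": 0.333, "qty": 3}]
diffs = compare_builds(sample, v1_total, v2_total)
for diff in diffs:
    print(diff)
```

The point of the design is that v1 is treated as the specification, so nobody has to write acceptance criteria by hand: any difference is either a bug in the new build or an undocumented v1 behaviour worth knowing about.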

hoofhearted · a year ago
Follow the strangler fig pattern, and map out every single task that is required in the migration on a whiteboard.

Write tests if you can, and set up a staging environment for V2 that you can setup and tear down easily for battle testing way before going live.

From there, break the tasks up from above into their business domains, and abstract those into new api services that the v1 system can use without any downtime.
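
As a rough illustration of that routing step, here is a small Python sketch of a strangler-style facade. It assumes a single entry point through which all calls pass; `StranglerFacade`, `legacy_system`, and `billing_service` are hypothetical names, not part of any real system described above.

```python
from typing import Callable, Dict

Handler = Callable[[dict], str]


class StranglerFacade:
    """Routes each business domain to v2 once migrated, otherwise to v1."""

    def __init__(self, legacy: Handler) -> None:
        self._legacy = legacy
        self._migrated: Dict[str, Handler] = {}

    def migrate(self, domain: str, handler: Handler) -> None:
        # Point one domain at its new service; everything else is untouched.
        self._migrated[domain] = handler

    def rollback(self, domain: str) -> None:
        # Send the domain back to the legacy system if the new service misbehaves.
        self._migrated.pop(domain, None)

    def handle(self, domain: str, payload: dict) -> str:
        handler = self._migrated.get(domain, self._legacy)
        return handler(payload)


# Hypothetical handlers standing in for the real systems.
def legacy_system(payload: dict) -> str:
    return f"v1 handled {payload['action']}"


def billing_service(payload: dict) -> str:
    return f"v2 billing handled {payload['action']}"


facade = StranglerFacade(legacy_system)
facade.migrate("billing", billing_service)
billing_result = facade.handle("billing", {"action": "invoice"})
orders_result = facade.handle("orders", {"action": "create"})
facade.rollback("billing")
rolled_back_result = facade.handle("billing", {"action": "invoice"})
```

Each whiteboard task then maps to one `migrate` call, and each one can be reverted on its own.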

For a frontend migration, that’s a whole different story and you would have to provide more details such as “moving from legacy Angular 1 to React 18 while it’s running”.

smarri · a year ago
I second this
madduci · a year ago
> Follow the strangler fig pattern, and map out every single task that is required in the migration on a whiteboard.

> Write tests if you can, and set up a staging environment for V2 that you can setup and tear down easily for battle testing way before going live.

I've successfully helped migrate a critical project and followed exactly this strategy. Older versions were developed and run 1:1 in parallel with the newer ones, until the customers got only a small downtime due to the change of IP addresses where the system was running.

drpossum · a year ago
This is a good answer and one I've put into practice successfully more than once. Automated tests are very key here.
daviddever23box · a year ago
Listen to the data that you're migrating from one system to another, so to speak. Test v1-to-v2 and v2-to-v1 migrations until you're blue in the face. Feature-flag migrations for individual clients. Ensure that any SLAs are met with v1-only, v1-in-flight-to-v2, v2 only, and/or some mix of static partial migration. Make sure that you have an absolutely homeomorphic mapping of data from one representation to another.
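
A sketch of two of those ideas in Python, under loud assumptions: the `V1Customer` and `V2Customer` shapes, the name-splitting rule, and the `MIGRATED_CLIENTS` flag set are all invented for illustration. The round-trip check flags any record that does not survive v1-to-v2-to-v1 unchanged, and the flag decides per client which backend serves them.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class V1Customer:  # hypothetical v1 shape: one free-text name field
    full_name: str
    phone: str


@dataclass(frozen=True)
class V2Customer:  # hypothetical v2 shape: name split into two fields
    first_name: str
    last_name: str
    phone: str


def v1_to_v2(c: V1Customer) -> V2Customer:
    first, _, last = c.full_name.partition(" ")
    return V2Customer(first, last, c.phone)


def v2_to_v1(c: V2Customer) -> V1Customer:
    name = f"{c.first_name} {c.last_name}".strip()
    return V1Customer(name, c.phone)


def round_trip_failures(records: List[V1Customer]) -> List[V1Customer]:
    """Records that change when converted v1 -> v2 -> v1."""
    return [r for r in records if v2_to_v1(v1_to_v2(r)) != r]


# Per-client feature flag: only listed clients are served from v2.
MIGRATED_CLIENTS = {"client-a"}


def backend_for(client_id: str) -> str:
    return "v2" if client_id in MIGRATED_CLIENTS else "v1"


records = [
    V1Customer("Ada Lovelace", "+15550100"),
    V1Customer("Cher", "+15550101"),
    V1Customer("Ada ", "+15550102"),  # dirty data: trailing space is lost
]
failures = round_trip_failures(records)
```

Here the third record fails the round trip because the trailing space disappears, which is exactly the kind of real-data surprise the check is meant to surface before a client is flagged over.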
robviren · a year ago
I'd immediately set the expectation that the process will be messy, take longer than expected, and require continued maintenance, iterations, and process improvements. Management usually tries to sell a transition as being great for everyone and as something that will solve all problems, when it usually ends up being awful and painful and takes incredible effort. Disappointment is always better the sooner it is communicated. Align in principle on why an effort must happen and on the realistic benefits to their daily life. Don't sell them a fairytale. I've found every transition is most painful because expectations and communication are poorly managed.

I don't blame people. Usually the offenders are in a culture where telling the truth is unpopular. It just depends on if you want to have a successful transition, or make people feel good about a project that takes 6 years to not finish.

Nathanba · a year ago
I would strongly disagree with that, do not go into a migration with the expectation that you'll impact people. If you do that, you'll take shortcuts, you'll start thinking in the wrong ways. Suddenly you'll start saying to yourself that the migrated customers should be able to live with X or Y or that your colleagues have to accept that they have to do these various steps because hey, we are doing a migration after all. Instead it has to retain the exact same behavior at all times. It should cause zero pain whatsoever, if it causes pain you failed at your migration task. Secondly I agree with the other poster that it has to be incremental, otherwise you might as well accept a monumental amount of bugs from the start. My third point is that you should automate as much as possible and write code to do the migration in a repeatable way, first on testdata, and then keep expanding the type of testdata until it encompasses all the possible data that customers can have. Then you run that migration on the press of a button and it should work perfectly every single time you do it.
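
One way to read "repeatable at the press of a button" is an idempotent migration script that can be rerun against growing fixtures. A minimal Python sketch follows; the row shape, the email normalisation rule, and the in-memory `store` standing in for the v2 database are assumptions made for the example.

```python
from typing import Dict, List


def migrate(v1_rows: List[dict], v2_store: Dict[int, dict]) -> int:
    """Upsert every v1 row into the v2 store; return how many rows changed."""
    written = 0
    for row in v1_rows:
        # Hypothetical transformation: v2 stores emails trimmed and lowercased.
        v2_row = {"id": row["id"], "email": row["email"].strip().lower()}
        if v2_store.get(row["id"]) != v2_row:
            v2_store[row["id"]] = v2_row
            written += 1
    return written


# Test data; keep extending this with every odd case found in customer data.
fixtures = [
    {"id": 1, "email": "Ann@Example.com"},
    {"id": 2, "email": "  bob@example.com "},
]

store: Dict[int, dict] = {}
first_run = migrate(fixtures, store)   # writes both rows
second_run = migrate(fixtures, store)  # rerun changes nothing
```

Because a rerun is a no-op, the same script can be executed after every fix without cleaning up first, which is what makes expanding the test data cheap.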
basseq · a year ago
I'm at the tail end of two of these, of ~10 in my career. They are always tough, always a bit of chaos, and all different.

Planning is important, and avoid committing to targets or deadlines until you have your arms wrapped around what needs to be done. This can be wide-ranging, and include: product parity, contract management, internal asset development (project plans, test suites, customer training, etc.), customer change management, and team throughput.

You have few clients but large impacts. You likely want to pick the friendliest one and give them generous terms to be the "test case". Expect it will take 2x longer than your estimate.

Do as much work on parity as you can: what are the differences between v1 and v2, and how will you bridge them? If data migration is involved, you will need tooling and team training.

Inevitably you will find that customers move slower than you like and are using v1 in ways you did not expect.

jiggawatts · a year ago
Day #1 of any N-month long migration/rewrite project I've participated in:

PM: "Fill out this spreadsheet with key dates leading up to the project completion."

Me: "First, that's your job, not mine. Second, I literally just got here, I haven't even drunk my coffee yet. Hi, my name is Jiggawatts. I've only just heard of this software we're migrating ten minutes ago."

PM: "Yes, yes, but the customer asked me for cost estimates and timelines."

Me: "I asked for a Lamborghini packed with supermodels, but I didn't get that either. Tough break, huh?"

PM: "It's not an unreasonable request!"

Me: "Without time machines and/or a magic crystal ball, it is. Do you have a time machine?"

Etc...

We all recognise this, and it's a symptom of an underlying problem.

Really, what ought to occur is incremental progress and demonstrable deliverables. If you go off into a cave for two years and come back with something the customer doesn't like, then you've caused a business catastrophe.

I've found that businesses and customers in general prefer incremental improvement. One trick in .NET land is to use something like YARP[1], which lets you totally rewrite the app... one web page at a time.

Another management trick on top of that is to not demo the last few steps. Complete the last few milestones of the project quietly, without reporting this up until the very end. I guarantee you that everyone in charge of the budget thinks they can "save money" by skipping the "last 10%", even though that results in 2x the ongoing complexity because it means the legacy components must still remain live and deployed to production.

I guarantee that the only way to prevent this is to lie to management. It is biologically impossible to insert these concepts into the brain of a non-technical manager, so don't even try.

[1] https://microsoft.github.io/reverse-proxy/

mmaarrccoo · a year ago
I led a painful migration a couple of years ago and can share some tips.

It's not clear whether v2 is already in production somewhere else. If it is not, you better wait until 1) the v2 data model has really been finalized and in prod and 2) key resources can be made available to the migration team. We were forced to begin the migration before the new product was complete and it was just plain impossible. We had to start all over every quarter.

- Migrations are very difficult to estimate. Any optimistic estimate will bite back. Hold off as much as you can, and ensure appropriate buffers if you really have to.

- ensure that the 8 clients have an identical v1 data model (tables, constraints, etc). If that is not the case, remember you will run n migrations, not 1.

- You need a team with knowledge of both v1 and v2 data models, as well as business domain know-how. There are many decisions that need to be made and you need the right people to be around.

- Not everything has to be migrated. Trying to migrate 100% is a common mistake: engage with the customers to understand what's the minimum that legally and operationally has to be migrated, especially if the v1 system has been in production for many years.

- Data migration is an iterative process, and the last thing you want is to manually QA every iteration. You need to develop tests that will provide reasonable assurance of data integrity.

- Dashboards showing data migrated, failing/ok tests, remaining tables, etc. help communicate status and track progress.

- Customers will need to be involved during the whole project. You need them to commit to making people available who can quickly answer questions to unblock your dev teams. Ideally, you want to create a single team. Make sure that decisions are traced and versioned.

- Performance matters. Discuss the performance requirements upfront. Our process was very, very slow and we found out a bit too late that the customer would not tolerate such downtime. Also, discuss when it is OK to migrate, how to roll back in case of failure, etc.
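
The automated integrity tests and the status dashboard mentioned in the list above could start from something as small as the Python sketch below. It assumes rows from both systems have already been normalised to the same tuple shape; `table_fingerprint` and `integrity_report` are hypothetical helper names, and the sample tables are made up.

```python
import hashlib
from typing import Dict, Iterable, List, Tuple


def table_fingerprint(rows: Iterable[Tuple]) -> Tuple[int, str]:
    """Return (row count, checksum) where the checksum ignores row order."""
    row_hashes = sorted(
        hashlib.sha256(repr(row).encode("utf-8")).hexdigest() for row in rows
    )
    combined = hashlib.sha256("".join(row_hashes).encode("utf-8")).hexdigest()
    return len(row_hashes), combined


def integrity_report(
    v1_tables: Dict[str, List[Tuple]], v2_tables: Dict[str, List[Tuple]]
) -> Dict[str, str]:
    """One status line per v1 table, suitable for feeding a dashboard."""
    report = {}
    for name, v1_rows in v1_tables.items():
        if name not in v2_tables:
            report[name] = "missing in v2"
            continue
        v1_count, v1_sum = table_fingerprint(v1_rows)
        v2_count, v2_sum = table_fingerprint(v2_tables[name])
        if v1_count != v2_count:
            report[name] = f"row count {v1_count} != {v2_count}"
        elif v1_sum != v2_sum:
            report[name] = "checksum mismatch"
        else:
            report[name] = "ok"
    return report


v1 = {
    "customers": [(1, "ann"), (2, "bob")],
    "orders": [(10, 1), (11, 2)],
    "audit": [(1,)],
}
v2 = {
    "customers": [(2, "bob"), (1, "ann")],  # same rows, different order
    "orders": [(10, 1)],                    # one row lost in migration
}
report = integrity_report(v1, v2)
```

Sorting the hashes in memory is fine for a sketch but would need a streaming or database-side approach for tables with millions of rows. Tables that are deliberately not migrated would need an explicit allow-list so they do not show up as failures.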