Posted by u/factorialboy 5 years ago
Ask HN: What are the pros / cons of using monorepos?
Or in other words, when would you recommend using them, and when would you avoid them?
gen220 · 5 years ago
I've worked in environments across the version-control gamut. The best-run places have had monorepos. But, for the life of me, I would not trust the companies that didn't have monorepos, to operate a monorepo.

To go mono is to make an org-level engineering/cultural commitment that you're going to invest in build tools, dependency graph management, third-party vendoring, trunk-driven development, and CI/CD infrastructure.

Can you make a monorepo work without all of those things? Yes, but you are sacrificing most of its benefits.

If your eng org cannot afford to make those investments (i.e. headcount is middling but there is zero business tolerance for investing in developer experience, or the company is old and the eng org is best described as a disconnected graph), forcing a monorepo is probably not the right idea for you.

Monorepo vs. microrepos is analogous in some ways to the static vs. dynamic typing debate. A well-managed monorepo prevents entire classes of problems, as does static typing. Dynamic typing has much lower table stakes for a "running program", as do microrepos.

edit:

It's worth noting that open-source solutions for build tooling, dependency graph management, etc. have gotten extremely good in the last 10 years. At my present company (eng headcount ~200), we spend about two engineer-years annually on upgrades and maintenance of this infrastructure. These tools are still quite complex, but the table stakes for a monorepo are lower today than they were 10 years ago.

voxl · 5 years ago
It's strange that you make this incorrect comparison to static and dynamic typing. With static typing, an automated tool prevents you from doing something. It comes for free at the cost of you needing to satisfy the tool. Where is this in a monorepo? You make it sound like the company itself must invest in assuring that the rules are not broken. That sounds like _dynamic_ typing to me.
gen220 · 5 years ago
> Where is this in a monorepo? You make it sound like the company itself must invest in assuring that the rules are not broken.

My argument, as made in the original comment, is that a monorepo is not merely a git repository. It is a git repository, plus a build tool / dependency graph manager, plus a continuous integration system.

In the same way, a statically-typed project is not merely a collection of source files. It is a collection of source files, plus a linker and a compiler.

The linker and compiler do not come "for free" any more than a build tool / CI system does, although they have been around for a very long time, so you might consider them part of the background. In a well-structured monorepo, the build tool and CI system are as much in the background as `gcc` or `go build` might be.

triceratops · 5 years ago
> With static typing, an automated tool prevents you from doing something. It comes for free at the cost of you needing to satisfy the tool. Where is this in a monorepo?

In a monorepo if your change breaks someone else's code, you'll know immediately because you won't be able to check in the code with a breaking build. If you have the correct monorepo tooling set up it will help you figure out what you're breaking.
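
For example, if the monorepo is built with something like Bazel (just one possible tool; the target name here is made up), a single reverse-dependency query lists everything your change could break before you push:

    # every target anywhere in the repo that depends on //libs/auth:client
    bazel query 'rdeps(//..., //libs/auth:client)'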

With multi-repos others will have to upgrade to your new version and then see that their code is breaking.

GP's analogy was imperfect because, as you pointed out, the company has to invest in the monorepo tooling. But it makes sense to me.

qznc · 5 years ago
What would be a scalable open-source monorepo stack? Git, Gitlab, Bazel?
KerrickStaley · 5 years ago
In my experience (at Lyft's L5 autonomous division), Git + Bazel will be fine for scaling up to 100s of engineers. My other data point is Google, with 10,000s of engineers and a bespoke VCS. I'm not sure how things work in the middle (1,000s of engineers), but I think you can solve that problem when you get to it, and Bazel has some features (look at the git_repository rule) to help you split a big repo if you need to.
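
For reference, a minimal sketch of that kind of split (WORKSPACE syntax; the name, URL, and tag are all made up):

    load("@bazel_tools//tools/build_defs/repo:git.bzl", "git_repository")

    # Consume a component that was split out into its own git repo,
    # pinned to a release tag.
    git_repository(
        name = "billing_service",
        remote = "https://github.com/example/billing-service.git",
        tag = "v1.2.0",
    )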

You may also want a service like Artifactory for hosting binary blobs that feed into the build process.

gen220 · 5 years ago
This is more or less what we use at my current place (100s of engineers). We haven't run into any dramatic issues at this scale.

Our architecture is Arcanist/Phabricator over git --> Stash (git) --> a custom build server that's basically identical to GitLab --> Artifactory.

We use Pants, but we made that choice before Bazel was a thing.

Bazel is probably the most "scalable" solution, but it isn't always the most intuitive/developer-friendly. It's worth exploring the alternatives (the biggest ones IMO are Pants and Buck) to see what you like the most. They're all pretty similar, but Bazel has some hard-to-replicate bells and whistles that mean it'll probably be the eventual winner.

joshuapark · 5 years ago
Things that change together, go together. Monorepos save people from having to dig through every imaginable corner of your version-control system to find all the pieces of your application. At the same time, if you have 10 microservices which are accessed by 1 frontend, it may be a little bit messy to keep all that code in the same place.

Common sense (which is not that common) is what should be used to decide. Ask yourself some questions:

- Do these repositories change together?

- If the code is not kept together, how should one repo be linked to the other (a git tag? a version number?)?

- Are all the pieces of the application contained within one context, or can they be split (for example, 10 microservices and 1 frontend, but 5 of those microservices are also used by another frontend in another project)?

The main goal is to make things easier; some PoCs and experiments may make it clearer, because it really depends on the situation.

cies · 5 years ago
> Things that change together, go together.

Yes!

> if you have 10 microservices which are accessed by 1 frontend, it may be a little bit messy

I see another differentiator here. If I have a typed API interface (both endpoints and payloads), and I publish a (generated) client library for the frontend, then that frontend will only use the new API version if it is also upgraded to use the new client lib. Here I think multiple repos are more suitable because the pieces of code can improve independently.
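
To make that concrete (the package name and version are made up), the frontend pins the generated client in its package.json, so a new API shape only reaches it through a deliberate upgrade:

    {
      "dependencies": {
        "@acme/api-client": "2.3.0"
      }
    }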

If the API is not typed/versioned, the case for a monorepo is much stronger: the change to the API on the BE should come along with the change in the FE.

Thus typing/versioning (at specific boundaries) helps with breaking up a code base (along those boundaries).

taeric · 5 years ago
Even with untyped APIs, I'd argue that you want it to be in your face that you cannot change the front and back ends together. Getting them in one commit will not get them in one deploy.
nendroid · 5 years ago
> At the same time, if you have 10 microservices which are accessed by 1 frontend, it may be a little bit messy to keep all that code in the same place.

Why is this messy? Using folders to separate your code is no different, organizationally, from using an entire repo. There's no issue in throwing every app in your company into one repo and just using a folder to organize your stuff.

Separating code into multiple repos makes one repo less aware of changes in another repo. It actually makes things harder and worse.

This is the same issue as with monoliths and microservices. You don't need to separate your code onto several computers just for organizational reasons. You can use folders to organize things.

lallysingh · 5 years ago
Pros:

* Single version / branching for everything

* Commits that go across components/apps are atomic.

Cons:

* When it gets big, those features matter less

* Churn from other devs' stuff gets into your merge/rebase work.

* 'git log' and other commands can be painfully slow

* Mistakes in the repo (e.g., committing a password) now affect many more people.

Use them for highly coupled source bases, where releasing together and atomic commits are very useful. Every other time, you're probably better off splitting early and taking a little time to build the coordination facilities needed for handling dependency versioning across your different repos.

Note: I used to really prefer monorepos, but I've had some time away now to take a better look at it. Now they feel like megaclasses where small things are easier (because everything is in one place) but large things are way harder (because there's no good split point).

ht85 · 5 years ago
> Churn from other devs' stuff gets into your merge/rebase work

Wouldn't other people's work only cause issues if they are changing the same files, in which case conflicts would happen even if the work were spread across multiple repos?

GuB-42 · 5 years ago
Here is the problem:

1- You finish your work, so you pull from the central repository and merge your branch into master

2- You do your merge work, test it a bit, and commit

3- Now you push and, oops, another team did the same; you are now left with two heads

4- You merge or rebase the two heads. Probably a simple task, but you may still need to run some tests again. If you are lucky, you are done; otherwise, back to step 3
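
In plain git commands, one round of that loop looks roughly like this (a sketch; the branch name is illustrative):

    git checkout master
    git pull              # step 1: sync with the central repository
    git merge my-feature  # step 2: merge your branch, run some tests
    git push              # step 3: rejected, another team pushed first
    git pull --rebase     # step 4: replay onto the new head, re-test
    git push              # done, unless someone beat you again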

Also, if you are rebasing before push, you should be able to keep a clean history. However, if you are merging (and there are good reasons for a "merge only" policy), you are going to have a mess of merge commits every time the previous situation happens.

That's something you can work around with good management. But the more freedom you give individual teams to push to the common branches, the more often that situation will arise; and the more you try to control access, the more likely teams are to go in different directions, making the merges infrequent but tricky.

jayd16 · 5 years ago
I think it can still happen though.

Imagine you're in a branch for a large project P. You want to merge to trunk, but there are conflicts. The upstream library project L was changed in your branch but not merged down by the submitter.

Even if P simply relies on a binary of L that is already in the artifact store, you still have to deal with merging this code down when normally you wouldn't.

In fact, the much bigger issue is that you now must deal with the double-edged sword of updating the entire org when a shared dependency must be updated.

Perhaps you even completed that task in your project branch. Now you get to deal with merging into every project in the org that was touched.

lallysingh · 5 years ago
Good point. I've gotten hit by this in monorepos but I probably just tried to merge instead of rebase. Both are crazy painful on large source bases.

It still takes forever!

I should replace that with 'most commits in history are irrelevant to you, so you have to dig smarter to understand what's been going on with your components'
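
The usual trick (plain git; the path is made up) is to scope history to your component's directory:

    git log --oneline -- services/my-service/   # only commits touching this path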

pletnes · 5 years ago
GitHub (it might depend on settings) refuses to merge if your branch is out of date. Rebasing on master re-triggers CI/CD. This might cause lots of waiting in that case...
grey-area · 5 years ago
Yes, this isn't really an objection.
ericbarrett · 5 years ago
Good list. I’ll add: If your company does feature freezes or general code freezes around holidays or corporate earnings, it can be harder to get “run the business” changes approved during these periods in a monorepo. So even in monorepo companies it’s common to have a separate “infra” repo for network and cloud configuration, sometimes for auth too. Makes the politics easier.
dcolkitt · 5 years ago
For me, the biggest pro and con are the same thing. With a monorepo, your respective projects' codebases become tightly coupled.

Why can that be a bad thing? Good software engineering practices tend to be associated with loose coupling, modularity, and small, well-defined interface boundaries. Putting each project into its own repo encourages developers to think of it as a standalone product that will be consumed by third-party users at arm's length. That's going to engender better design, more careful documentation, and less exposed surface area than the informal cross-chatter that happens when separate projects live in the same codebase.

Why can that be a good thing? The cost of that separation is that each project has to be treated as a standalone product with its own release schedule, lifecycle support, and backwards-compatibility considerations. Say you want to deprecate an internal API in a monorepo: just find all the places it's called and replace them accordingly. With multi-repos it's nowhere near as easy. You'll find yourself having to support old-style calling conventions well past the point you'd prefer, in order to avoid breaking the build for downstream consumers.

hinkley · 5 years ago
If a monorepo is generating multiple binaries, there’s no reason it can’t use separate compilation units. In many languages, separate compilation units give you that arm’s length separation you need, without introducing the problem of making breaking changes that you can’t reasonably detect prior to commit, because the code is used in some obscure module you don’t even have checked out.

For that reason, monorepos favor the new programmer, which makes it easier to ramp your team size.
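
As one illustration of that arm's-length separation inside a single repo (Bazel syntax; the package names are made up), visibility rules make the boundary explicit and machine-enforced:

    # libs/payments/BUILD
    cc_library(
        name = "payments",
        srcs = ["payments.cc"],
        hdrs = ["payments.h"],
        # Only the billing service may depend on this library;
        # everyone else gets a build error, not a runtime surprise.
        visibility = ["//services/billing:__subpackages__"],
    )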

fetbaffe · 5 years ago
It usually begins with:

let's split our project into multiple libraries so they can be reused within the company, or even better, if we put them on GitHub everyone can use them.

After a few months you notice nobody within the company cares about your nicely versioned libraries and on GitHub you get more and more complaints that the libraries are very limited and need more features.

After that you merge everything back into your monorepo and try to forget all the time wasted on git bureaucracy, versioning, dependency handling and syncing changes between repos.

dehrmann · 5 years ago
Monorepos:

- Work best when you have an open culture

- PCI compliance will be annoying

- The obvious--everything in one place

- Need good tooling around keeping master building

- As they grow, become an uphill battle to use with an IDE

- Tests are likely to slow down as the repo grows, so you need tooling around tests

- Usually lead to a rat's nest of dependencies

- Third-party library upgrades can be painful

- Coupled with CD (and it really needs to be coupled with CD), it's easy to get surprise breaks

Multirepos:

- Every team will need to dabble in build and release engineering

- Changes across repos are slow and painful (I claim this is a feature because it makes you think about versioning and deployment)

- Library developers have to think more about versioning

- You'll probably need a binary repository like Artifactory

- More time and tooling needed to do library upgrades (especially interesting for security issues)

- Harder for people to switch teams

user5994461 · 5 years ago
>> - PCI compliance will be annoying

I have to laugh at that. The two biggest banks, Goldman Sachs and JP Morgan, are heavily monorepo.

It actually makes all certification and auditing easier. The shared tooling/platform can be checked, and everything else can ride on it. Half the questions in certification are about tracking changes... easy when it's all tracked by the repo.
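
For example, producing a change history for just the in-scope code is a one-liner (the path is illustrative):

    git log --format='%h %an %ad %s' -- payments/   # who changed what, and when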

gen220 · 5 years ago
Can you expand your comment on PCI compliance?

We've had to go through similar-to-PCI compliance hoops for our monorepo, and settled on a solution that didn't degrade the median developer's velocity too much.

I'm curious to know what other monorepo companies had to go through to satisfy the compliance people.

gregcoombe · 5 years ago
Google posted a paper detailing their reasons for choosing a monorepo: https://research.google/pubs/pub45424/

Caveat: your company probably isn't Google, so your challenges may be different.

nimblegorilla · 5 years ago
Thanks for sharing that. It's mind-blowing that Google runs everything from a single repo. Most places that I work with create several repos for just one project.
gregcoombe · 5 years ago
Yeah, it was a significant infrastructure cost. I heard at one time that the single largest computer at Google was the Perforce server. They ended up completely rewriting it (as "Piper") for scaling. This is sort of what I was alluding to with my comment about Google having different concerns than many other companies: they can afford to dedicate a number of engineers to maintaining a monorepo system and then rewriting it when it doesn't scale. That said, I personally believe there are a lot of benefits to monorepos, and I think those tradeoffs are worth it for other companies too.
scarmig · 5 years ago
It depends. At a high level, "at scale" you'll have to solve all the same problems for both, to the point where you have a dedicated team or teams solving those problems. Monorepos don't automatically solve issues of version skew or code search or code consistency, and multirepos don't automatically solve problems of access control or performance or partial checkouts. All a monorepo strategy does is say that all your source files will share the same global namespace, and all a multirepo strategy does is say that they can have different namespaces (often corresponding to a binary or grouping of closely coupled binaries). Everything after that is an orthogonal concern.

As far as it goes, conceptually monorepos appeal to me, and they offer more discoverability and a simpler, more consistent interface than multirepos. It's also worth considering that there must be some kind of trade-off if you need to pull in the abstraction of "separate repos" to handle code: typically you have fewer guarantees about the way source files will interact when they're in separate namespaces, which makes some things harder.

But if you're just starting out, you're going to be going with off-the-shelf components. Usually this is git hosted on GitHub, GitLab, or something similar; there's a good chance you're going to be using git. Vanilla git works sub-optimally once you reach a certain number of contributors, a certain number of different binaries being generated, and a certain number of lines of code, as a lot of its assumptions (and the assumptions of folks who host git) focus on "small" organizations or single developers. You aren't going to have a good time using a vanilla git monorepo with tens of millions of lines of code, and hundreds of developers, and dozens of different projects, even though in principle you could have a different source control system that would function perfectly well as a monorepo at that scale.
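
That said, newer git (2.25+) has partial mitigations worth knowing about before you hit that wall; a sketch, with a made-up repo URL:

    git clone --filter=blob:none https://git.example.com/monorepo.git  # partial clone: fetch file contents lazily
    cd monorepo
    git sparse-checkout init --cone
    git sparse-checkout set services/my-service libs/shared   # check out only these directories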

My general approach would be to start with a git monorepo, do all development within it, and once that becomes a pain point migrate to multirepo.