Cool. Could someone, maybe an ex-googler, comment on which parts of these work well and which don't?
A lot of other companies get into trouble trying to cargo-cult what Google does when they are operating in very different environments wherein those practices aren't optimal. E.g. different levels of scale.
Additionally, critics of Google may point out that their engineering culture may not be great on its own terms -- every time Google launches a new feature, people post links to the Google product graveyard.
In my ex-Google experience, here are the stages of denial about something good that Google does but the industry doesn't yet embrace.
Stage 1: "We're not Google, we don't need [[whatever]]";
Stage 2: Foreseeable disaster for which [[whatever]] was intended to address happens;
Stage 3: Giant circus of P0/SEV0 action items while everyone assiduously ignores the [[whatever]];
Stage 4: Quiet accretion, over several years, of the [[whatever]] by people who understand it.
And the [[whatever]] ranges from things that are obviously beneficial like pre-commit code review to other clear winners like multi-tenant machines, unit testing, user data encryption, etc etc. It is an extremely strange industry that fails to study and adopt the ways of their extremely successful competitors.
Strong disagree. In my experience, this is not commonly why competitors don't adopt Google's practices. The main reasons I've seen are:
1. Money. Google essentially has a giant, gargantuan, enormous, bottomless pit of money to build a lot of this tooling (and also to take the risk if something ends up not working out). I think you might be able to say that other companies are just being short sighted if they don't implement some of these things up front, and that may be true, but (a) that's pretty much human nature, and (b) given that very few other companies have a bottomless pit of money like Google, that may just end up being the right decision (i.e. survive now and deal with the pain later).
2. Talent. This is closely related to #1, but few other companies have the engineering talent that Google does. If there is one thing I've seen in my experience with ex-Googlers, it's that most of them are fast coders. So when you go to your boss and say "I'd like to implement engineering/tech-debt improvement XYZ", at other companies it's a harder decision if (on average) it would take 9 months to implement vs. 2 or 3.
3. Related to both of the above, but your 4th bullet point, "Quiet accretion, over several years, of the [[whatever]] by people who understand it.", is actually other companies just waiting for more evidence to see what "shakes out" as the industry-standard, optimal way to do things.
4. Finally, your stage 1, "We're not Google, we don't need [[whatever]]" is actually true in tons of cases. Many of Google's processes are there to handle enormous scale, both in terms of their application/data capacity, as well as the sheer number of engineers they need to coordinate. Very, very, very few companies will ever hit Google's scale.
Off-topic for this thread, but one of the most poignant quips I remember about Google culture was that the performance-review process was really good at rewarding hard, challenging work that didn't produce much value and not very good at recognizing work that produced lots of value but was not astoundingly difficult. I think you were the one who first noted this.
The tools, design and manpower needed to build a skyscraper are different from those needed to build a 1-story wood house. It's not that the ones that build the wood house are failing to study and adopt the ways of their extremely successful competitors.
Now, some of the things you mention, like unit testing and user data encryption, are ones that I've never seen associated with the "We're not Google" mindset, so maybe people have started using that phrase for anything now.
There is a risk of selection bias here. The companies that run into [[whatever]] are the ones that made it far enough to have run into it. What you're not seeing are all the companies that tried to do what Google does at scale, built a complex code base that doesn't serve its customers' needs, can't innovate fast enough, and are now dead.
As a company grows and matures, their software development processes evolve to meet the business needs.
Unless you're referring to automated precommit hooks, this sounds baffling. What's wrong with reviewing pull requests? What if I want to push a WIP while I switch to another branch, I still need a review? Is the final PR reviewed again at the end?
I'll give you "pushed the WEB industry to have transport-layer encryption for the entire industry by default".
I'll even give you "code reviews".
But not the first 3.
"Readability" works terribly when your company is acquired and your team enters all at the same time.
Google has (or had ~10 years ago) a thing called "readability" for each language, where in order to be allowed to commit code to the central "google3" repo, you needed to have written some large amount of code in that language, and needed to have a readability reviewer sign off on your code. The process is designed for slowly onboarding junior people into a team, and introducing them to Google coding style and practices. E.g., the senior, mentoring folks on the team do the reviews and bring the new person up to speed. I imagine it must work well in that context.
However, this breaks down when your entire team is new. How do you find somebody to review the code? All several million lines of the product that was acquired? Especially when it is written in multiple languages.
So we were basically locked out of the main corporate repo, unable to do anything productive. We finally figured out that there was a paved path with a git repo used by the kernel team (and android?) that had none of these hurdles, where we could put our code and get productive immediately.
"Readability" is very much still a thing. It's a mess and would be one of the worst things to take from Google. If you can't enforce the code style you like through autoformatters and linting, it's not worth enforcing.
The way it's supposed to work is that acquired teams get lots of support on integration, including readability. This helps your team get integrated into writing Google-style code. Not sure why that didn't work out in this case?
"Readability" requirement is still a thing, but it isn't for every single piece of code in G3, and I haven't worked close enough to it to think about the exact mechanism of how it applies.
(Left Google a year ago)
My previous team - pretty much any Python submission was hitting me with a Python "readability" requirement, and it was a bit painful, because only a single person in my entire group of teams (roughly 15 people total) had the "Python readability expert" status. My current team - already submitted quite a few significant C++/TS/Java pieces of code to G3, and not a single "readability" requirement triggered.
In my opinion, the monorepo, global presubmit, testing culture and the Beyoncé rule (if you liked it then you should have put a test on it) are basically a superpower for infrastructure teams. Without these things it'd be utterly impossible to do certain kinds of infra refactors, and many more would be very, very painful.
In the open source world I see a fair amount of "tests are always red, don't worry" and "we can never edit this interface because who knows who it breaks." These problems aren't intractable at Google.
This approach does have its own set of challenges and I do suspect that the monorepo has contributed in some ways to Google's inability or refusal to maintain some older products. But holy cow the ability to do something like move everybody in the company to different vocabulary types is powerful.
On the other hand, most weeks someone else breaks my system and I have to track down the culprit.
Google's emphasis has always been to make things easy for library developers, at the expense of library clients. For people who value backwards compatibility over long timespans, Google's practices could be better.
I don't think it's fair to classify code review and test coverage as "the google way." We should evaluate more by the unique things Google does or the things they specifically invented (not code review and testing).
And of course volunteers working on open source projects have lower standards. Let's instead compare Google to companies which say "we aren't google."
Engineering culture has somewhat collapsed at Google. The things that made engineering great didn't really survive the last couple rounds of internal coups.
Interesting -- having not had any experience inside Google I'm having difficulty painting a picture, could you give an example or two of some of these internal coups?
The “policies don’t scale well” section is inaccurate.
There are plenty of policies floating around that don’t scale well, and plenty of migrations that are still forced on internal users rather than handled magically by the migrating team. The reality is that Google is such a big company most of these fly under the radar of whichever person actually enforces these policies, and it becomes a whole thing to escalate up to whoever enforced them, and then there’s potentially a political battle between whatever director or VP is in charge of the bad actors and the enforcer (ideally they get away with not allocating HC to the internal migration and amortize it across all their users, so that HC can work on flashier stuff).
I think one reason Google has a proliferation of bureaucracy and red tape is that they do not "review" postmortem action items very formally. They are only reviewed as part of the larger incident postmortem review process, and the tooling is so overengineered that performing that review beyond a perfunctory once-over isn't easy to do. So you end up in a situation where "we need to do something" and whichever person handled the incident has to suggest a way to make sure it doesn't happen again - the easiest of which is to introduce some CYA process. The other reason is that non-coding EMs introduce processes to show some kind of impact on their team.
Also, the existence of the monorepo, global test runs, forced migrations, etc makes it so maintaining a mothballed project incurs some inherent engineering costs - IMO it’s a non-negligible reason Google kills products that could instead simply exist without changes. It also makes it so Google doesn’t really “version” software generally speaking.
DISCLAIMER 1: Current Googler here, but opinions are my own.
DISCLAIMER 2: I think from a hands-on-keyboard SWE perspective there is a lot of useful stuff here. What you mentioned about the Google culture of killing products and such I am not gonna talk about.
I recommend the chapters about testing first and foremost. Among all the codebases I have seen (both open source and proprietary), Google's tests are the most comprehensive and reliable. However, if you are in a startup-like environment you should pick and choose, and not try to follow every single principle listed, as they could sink your velocity drastically in the short run.
Other interesting points (IMHO) are Monorepo, Build System, and Code Reviews.
As for the monorepo, I discovered I'm a huge fan, although I was skeptical at first. The sad thing is that it's a rather niche practice and tools like Git don't play ball very well (i.e. each time you pull you have to retrieve changes for the whole codebase, even files you never saw/heard of, managed by other teams). I think there's no nice off-the-shelf offering for running monorepos out there. However, not having to fight with git submodules, library versions, ... is great. If the change I am submitting breaks something else in the company, I am immediately aware and can act accordingly (e.g. keep the old implementation alongside the new one and mark it as deprecated so the other team will get a warning next time they do anything).
The build system is a bit more controversial. I learned to love blaze/bazel, but admittedly the open-source version is a bit messy to set up. Additionally, being so rigorous about the build rules felt like a massive chore at the beginning, but now I appreciate it a lot. I can instantly find the contacts of all the teams that use a build rule I declared, and hence warn them about bugs. I can create something experimental with private visibility so only my team can use it, and only later expose it to the wider world with just a one-liner.
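In case you haven't used Bazel: a minimal sketch of what that visibility mechanism looks like in a BUILD file (the target and package names here are hypothetical):

    # BUILD file for a hypothetical experimental library.
    cc_library(
        name = "experimental_cache",
        srcs = ["experimental_cache.cc"],
        hdrs = ["experimental_cache.h"],
        # Locked down while experimental: only this package may depend on it.
        visibility = ["//visibility:private"],
    )

Exposing it later really is a one-liner: change that entry to "//visibility:public", or to a narrower package group like "//my/team:__subpackages__". And because all dependencies are declared, a reverse-dependency query such as bazel query "rdeps(//..., //path/to:experimental_cache)" lists every consumer you might need to warn about a bug.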
Finally, code review, AKA Critique. Google has the best review tool I have had the joy to use, hands down. It's clear about what is happening and at which stage the review of a particular section/file is, and it is focused on discussion. The evolution of each change is easy to follow along. These are things I really miss when using the GitHub/GitLab PR view; that tooling is incredibly confusing to me. Luckily (I am not affiliated in any way) an ex-Googler (I believe) is working on an alternative that works with GitHub (https://codeapprove.com/).
I am a big fan of Anki, and for reasons I wanted to build it on a machine I have with an uncommon architecture (it has a graphical desktop). I have all of the components (rust, typescript, qtwebengine, etc.) installed and working. I invested some time in trying to convince bazel that the required dependencies existed, to no avail. Rules broke left, right, and centre, and every time I found the solution, other things broke. I think it insisted on pulling stuff from the internet, including definitions of other stuff I needed to change. I can't remember much more than that, as I gave up and haven't thought about it much since.
Thing is, pkg-config would've picked up the dependencies just fine - they were literally all there. I even built Rust from source on my weird machine with a musl variant, before realising musl has some issues on my architecture.
I suspect Bazel may work well inside of Google for infra/server side stuff (never worked there). I'm a lot more skeptical of more complex builds, like desktop applications across various platforms. Chrome still uses "gn" to generate ninja files, and then ninja to build. For my own stuff, I won't touch it.
I probably wouldn't have commented, except that, to my surprise, it seems the Anki developers have also decided life is too short: https://github.com/ankitects/anki/commit/5e0a761b875fff4c9e4...
> I think there's no nice off-the-shelf offering for running monorepos out there.
I think Git works perfectly well for 99% of monorepos though. It just doesn't work for the massive ones. I think it's a perfect example of something most codebases shouldn't follow Google on.
Personal story from ex-Googler: after getting exasperated at yet one more internal tool launched with great fanfare and almost no testing, let alone documentation, I suggested to the Internal Tools group that we have a contest for BEST internal tool.
Not "worst" since that would be too hurtful. The hope was to recognize excellence, motivate people to be better, and maybe shame the people whose tools received no votes. This suggestion was summarily dismissed.
There were, indeed, some truly excellent tools: Dremel comes to mind. And lots of tools that were nearly unusable.
I left Google around six months ago. I worked in medium and small companies, currently at a startup with ~30 devs.
I would say the vast majority of it works well, some you just don't need until you hit scale (here, scale in the number of developers).
For example, policies work if you have <20 engineers, but probably don't really work otherwise.
Blaze/Bazel I miss a lot. Just wrangling the dependencies between shared packages is a mess (though we might just suck at configuring Poetry - at any rate it's not intuitive). Building and deploying is much more involved.
Another thing I miss is code review the Google way. Google asks that you review within 24 hours, reviews are done per (the equivalent of a) commit and not per PR, and it strongly advises keeping commits small. The GitHub PR workflow is terrible in comparison:
1) it nudges you into batching commits into large PRs
2) Is the PR message informative? Is each commit's? What about squash and merge - how many people edit that message? At Google part of the code review is reviewing the commit message. When you squash and merge, that's post approval, so you can't even do that.
3) Hidden conversations? What the actual fudge
4) How many comments have I not addressed yet? For that matter, how many PRs are waiting for my attention and when were they sent?
Of all the Google dev tools, I miss Critique the most. GitHub is terrible at giving enough context to efficiently review a PR on a second or third pass.
I think coupling commits with review progress was a mistake.
Small commit reviews sound miserable. You have no context of the rest of the branch unless you look for it, you have no idea what's in the author's head for future commits (I can imagine some devout YAGNI follower rejecting a commit because a function argument is unused, which the author planned to use tomorrow..), and it sounds like it would encourage minor nitpicking when there's not that much to review. As opposed to a whole branch PR where I can see the entire feature at once and how it comes together.
> comment on which parts of these work well and which don't?
I don't think there is a visible distinction between the parts that work and the parts that don't. In fact, in most cases each practice has pretty strong rationales. The problem is, when you take everything as a whole, its cumulative complexity and cognitive overhead tend to go wild, and almost no one can understand the whole stack once its original writers/maintainers leave the team.
In fact, this might play a certain role in the Google graveyard narrative; it's not that its engineering culture is bad, but sometimes its standards are too high for many cases, so it's nearly impossible for newcomers to keep them up, especially when you have external pressures that you cannot ignore. Even if you make an eng team of 3~4 people for a small product, they'll likely suffer through tens of migrations/deprecations/mandates over the years.
Readability is hit and miss. Very nice to have everything written to the same standard, it makes it much easier to navigate through any project. Downside is it's pretty rough for more peripheral teams or teams working in a language that's a small component of their product. I remember for one of DeepMind's big launches the interface was all in files ending in .notjs, presumably since they didn't have anyone on the team with Javascript readability. This was 5+ years ago, though, so some of the downsides may have been mitigated.
> Cool. Could someone, maybe an ex-googler, comment on which parts of these work well and which don't?
TBH most of this stuff is transferable and even "common sense" in most of the companies you've worked for. Similar to how Google's SRE book is actually a very good collection of battle-won experience on how ops can keep systems reliable and running.
The book is written in a way that you can easily throw away advice that you don't think useful.
> Additionally, critics of Google may point out that their engineering culture may not be great on its own terms -- every time Google launches a new feature, people post links to the Google product graveyard.
It is personally scary when they develop new products. What if it is a brilliant idea, one I cannot live without? If Google develops it, then I am looking at this stillborn thing, mewling for life when I know its horrible fate.
The trouble here is that Google employees (and perhaps even its upper management) want to believe that they are a company which is an inventor of things. But they are not this at all. They are an advertisement company. Advertisement companies should not and do not want to invent things... inventions are worse than burdens, inventions are these weird alien objects that appear valuable but are quite expensive and do not help to sell ads at all.
So they hawk the inventions like they were freaks in some carnival sideshow to move traffic past their billboards. Until the traffic dwindles (or until they get tired of it). And then they take it out back behind the woodshed and put an end to it.
The inventions are to keep the talent stream coming ... to work on ads.
The inventions are the small tax they pay to pretend to candidates that they could work on inventions when the vast majority of them will be "allocated" to ads.
Any prominent examples?
> A lot of other companies get into trouble trying to cargo-cult what Google does when they are operating in very different environments wherein those practices aren't optimal. E.g. different levels of scale.
For all of this, Google doesn’t create very good products anymore. This is a guide that came into being _after_ Google was successful. It’s not _why_ Google became successful.
Yes, if you have a mountain of money and a horde of underutilized employees, it’s easy to gold plate your engineering and navel-gaze at your biases.
Whenever I watch/read interviews with people who were successful in some way, they usually downplay the ugly hacks and shortcuts they took to get there, and are quick to say that they "should have done it <the right way>". It's really hard to get any insights because of this inherently unreliable narration.
I'm also reminded of something written by Scott Adams (decades ago, when his reputation was different.)
> Most people won't admit how they got their current jobs unless you push them up against a built-in wall unit and punch them in the stomach until they spill their drink and start yelling, "I'LL NEVER INVITE YOU TO ONE OF MY PARTIES AGAIN, YOU DRUNKEN FOOL!"
> I think the reason these annoying people won't tell me how they got their jobs is because they are embarrassed to admit luck was involved. I can't blame them. Typically, the pre-luck part of their careers involved doing something enormously pathetic.
-- "The Dilbert Future" (1997), by Scott Adams
It depends how you look at it. A lot of products created by Google were good/high quality but were killed anyway. It's sad to see things being killed because they were not "big enough".
* reproducible fast dev environments - anyone should be able to build anything pretty easily fairly quickly
* culture of design reviews, testing, and code reviews.
* CI/CD, static analysis, PaaS, dependency management.
You can do all of these without Google-level or really any bespoke tooling. A lot of what Google builds is for operating at Google scale - distributed builds and running of huge applications - and even then this tooling simply enables working at that scale; it comes with a lot of cost (speed/complexity) and jank you don't need at smaller software shops.
Unless you have a monopoly on the ad market that's going to mask all of your bad managerial, strategy and culture problems, I'd advise steering well clear of copying Google. They make money in spite of their day-to-day practices, not because of them.
https://news.ycombinator.com/item?id=31224545 (304 points, 179 comments)
https://news.ycombinator.com/item?id=22609807 (222 points, 70 comments)
Software Engineering at Google (2020) [pdf] - https://news.ycombinator.com/item?id=31224545 - May 2022 (178 comments)
Software Engineering at Google - https://news.ycombinator.com/item?id=22609807 - March 2020 (69 comments)
Similar sounding but different:
Software Engineering at Google (2017) - https://news.ycombinator.com/item?id=18818412 - Jan 2019 (309 comments)
Software Engineering at Google - https://news.ycombinator.com/item?id=13619378 - Feb 2017 (156 comments)
From someone who’s seen how the sausage gets made, I both agree and disagree.
A lot of Google’s practices really are good software engineering practices - provided you have the money to invest in replicating it to a high degree of quality, which could be substantial and better spent elsewhere. When you have one of the most lucrative business models of all time you definitely have the money to invest in trying to make it as stable to maintain and easy to add value onto as possible, so it was definitely worth it for Google in many cases, but each other company will have to determine the costs vs benefits themselves.
Replicating Blaze and Forge seems really expensive and hard to get right (though it can be tremendously valuable for development on a large codebase). Postmortems, containerization, servers-as-cattle, gradual non-global releases… those aren’t as expensive to set up and have great cost/benefit ratios. It’d be stupid to not do these just because Google does them (and in some cases invented or popularized the practice).
As a counterpoint to the engineering/product/culture comments providing context on the book, I would point out that Urs Hölzle has recently stepped back from uber-manager to individual contributor in the infrastructure space.
This is the guy who built the hotspot JIT that added decades to the life of Java, and who engineered Google's data centers and GCP. He obviously doesn't need more money or glory or experience, so ... why?
There are a million examples of things gone wrong, but it may be worth studying one example of how someone could have such an impact and still just love what he's doing.
Are we thinking about the same Urs? His reputation was not particularly good when I was in the TI group at Google recently. He also had nothing to do with this book.
He's not an author and not in the acknowledgments: https://abseil.io/resources/swe-book/html/pr01.html#_acknowl...
Probably a bit off-topic, but since I'm a bit triggered by the 'abseil' in the domain name:
I wish Google would relax their 'guidelines' when it comes to software that's also published outside of Google. Case in point: the Dawn C++ library (Google's native WebGPU implementation) has a dependency on abseil, and from what I've seen when glancing over the code, the only reason seems to be some minor string-related stuff.
I can only assume that there must be some internal NIH rule inside Google to use abseil in place of the C++ stdlib (of course I would prefer it if Dawn used neither abseil nor the stdlib, especially since it looks like the only components that are used are related to strings, which definitely isn't the focus of a 3D API).
...and then there's of course the use of 'Google Depot Tools', and they are using their own build system [1] (however, at least there the Dawn team rebelled and also provides cmake build files).
All those Google specifics make it incredibly difficult to integrate Google C++ projects into any non-Google project, and because of this "Google C++ bubble" I would seriously hesitate to take any advice from them about software engineering as gospel, at least when it comes to C++.
[1] https://chromium.googlesource.com/chromium/src/tools/gn/+/48...
> I can only assume that there must be some internal NIH rule inside Google to use abseil in place of the C++ stdlib
FWIW my understanding is that this is exactly backwards: Abseil exists because the internal code and toolchain evolved to use features that hadn't landed in the standard yet, and releasing the support this way allows that dependent code to be used in open source releases. It's not about features Googlers "can't" use, it's that they[1] could always use better stuff, and this is a way to get the better stuff released so non-Googlers could use it at all (and then, apparently, complain about it).
Obviously looking at this in hindsight from the outside of a project using gcc13/clang15, it seems like it's needlessly different. But when written it was forward-looking.
[1] "We", I guess, though I work in ChromeOS and not in this world.
It would be tremendously beneficial if their software dropped the abseil dependency, especially where it is almost entirely unused. Hell, it'd be better if they simply vendored the bits they need.
Having to use Bazel, and having to manage an additional dependency like abseil can be hellish for small projects with uncomplicated build systems.
The worst part is that abseil leaks through interfaces and you end up being coupled to it as a consumer of a library. It's bananas. I don't need yet another stdlib.
Just glancing at Dawn, which I'd never heard of until now, it appears their use of absl is similar in purpose to the way other Google open source projects use it: faced with the choice between requiring C++20 (or 17, or 14) or requiring only C++11 and using absl as a kind of polyfill, they chose the latter.
Many of the authors of abseil are on the C++ committee and contribute to its progress--this is especially true of the string library, where abseil convinced the standards committee to adopt string_view.
The dependency here almost certainly predates C++ adopting the features it has had for a decade.
For string related stuff — there’s not much in the stdlib that can replace the stuff in abseil. (Haven’t looked at Dawn in particular, though.)
E.g. absl::StrFormat — no equivalent until std::format was standardized in C++20.
absl::StrCat: You could use streams, but, ew. Also, StrCat is optimized to reserve sufficient size in advance, so it is more efficient than appending to a string or using a std::stringstream.
absl::Cord. No equivalent in the stdlib.
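To make that concrete, a small self-contained sketch of the utilities named above (absl::StrCat and absl::StrFormat are the real Abseil APIs; the values are invented for illustration):

    // Illustrative sketch; compiles against Abseil's string libraries.
    #include <iostream>
    #include <string>

    #include "absl/strings/str_cat.h"
    #include "absl/strings/str_format.h"

    int main() {
      // absl::StrCat: computes the final size up front, one allocation,
      // none of the std::stringstream ceremony.
      std::string user = absl::StrCat("user-", 42, "@example.com");

      // absl::StrFormat: type-checked printf-style formatting, available
      // long before std::format landed in C++20.
      std::string line = absl::StrFormat("%s logged in %d times", user, 7);

      std::cout << line << "\n";
      // absl::Cord (mentioned above) is a rope-like type for very large
      // strings; it has no stdlib equivalent and is omitted here.
    }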
C++ stdlib didn’t have string_view for ages. Also, until recently, C++ sucked for things like convert to/from strings and string buffers. std::stringstream is awful.
The string stuff in abseil is mostly a historical byproduct of what Google was doing in 1998: manipulating lots of strings. At the time, the C++ standard library string implementation (mainly the GNU one) was immature and slow. The string library was written in the early days for performance reasons, as well as reliability (at the time, libstdc++ was so bad that most string operations just made garbage, not strings). And then it got too expensive to change the entire codebase.
I remember Sanjay Ghemawat or Jeff Dean mentioning that one of their big "optimizations" was to inline short strings into the string object: instead of a string that was "size_t len, size_t capacity, char *data", anything less than 24 bytes was just stored directly instead of with a pointer. When you're running mapreduces with trillions of small keys, this makes a big difference!
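That trick is what's now commonly called the small string optimization. Here's a minimal, hedged sketch of the idea as described (not Google's actual implementation; production versions pack the inline/heap discriminator into the length byte and implement copy/move/growth):

    // Small-string-optimization sketch: payloads short enough to fit in the
    // object's own 24-byte footprint are stored inline, so no heap
    // allocation happens for small keys.
    #include <cstddef>   // std::size_t
    #include <cstring>   // std::strlen, std::memcpy
    #include <iostream>

    class SsoString {
      struct Heap   { char* data; std::size_t len; std::size_t cap; };
      struct Inline { char data[23]; unsigned char len; };  // 24 bytes total
      union { Heap heap_; Inline small_; };
      bool is_inline_;  // real implementations fold this into the len byte

     public:
      explicit SsoString(const char* s) {
        std::size_t n = std::strlen(s);
        if (n < sizeof(small_.data)) {      // fits inline, incl. the NUL
          std::memcpy(small_.data, s, n + 1);
          small_.len = static_cast<unsigned char>(n);
          is_inline_ = true;
        } else {                            // too long: fall back to the heap
          heap_.data = new char[n + 1];
          std::memcpy(heap_.data, s, n + 1);
          heap_.len = heap_.cap = n;
          is_inline_ = false;
        }
      }
      ~SsoString() { if (!is_inline_) delete[] heap_.data; }
      SsoString(const SsoString&) = delete;            // keep the sketch short
      SsoString& operator=(const SsoString&) = delete;

      const char* c_str() const { return is_inline_ ? small_.data : heap_.data; }
      std::size_t size() const { return is_inline_ ? small_.len : heap_.len; }
    };

    int main() {
      SsoString k("key42");                                    // stays inline
      SsoString v("a value long enough to spill to the heap"); // allocates
      std::cout << k.c_str() << " (" << k.size() << ")\n"
                << v.c_str() << " (" << v.size() << ")\n";
    }

The win is exactly what the comment describes: with trillions of small keys, skipping a heap allocation and a pointer dereference per key adds up.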
Part of the pressure behind abseil is that perf and promotions are correlated with open source (Tensorflow, Chromium, TFX, etc) and it would be essentially impossible to translate internal projects for public release without a public library like abseil.
In contrast Facebook Folly has much less overall clout because engineers there have more incentive to build-from-scratch, which can include simply not using C++.