wokwokwok · a year ago
There’s so much truth in this.

It really cuts to the heart of it when you look at the “devops cycle” diagram with “build, test, deploy” …and yeah, those other ones…

I remember being in a meeting where our engineering lead was explaining our “devops transformation strategy”.

From memory, that diagram turned up in the slides with a circle on “deploy”; the operational goal was “deploy multiple times a day”.

It was about speed at any cost, not about engineering excellence.

Fired the ops team. Restructured QA. “You build it, you run it.” Every team has an on-call roster now. Sec, dev, ML, ops; you’re an expert at everything, right?

The funny thing is you can take a mostly working, stable system and make fast, thoughtless, chaotic changes to it for short-term gains, so it superficially looks like it’s effective for a while.

…but, surrrrpriiiisssseeee, a few months later suddenly you can’t make any changes without breaking things and no one knows what’s going on.

I’m left with such mixed feelings; at the end of the day the tooling we got out of devops was really valuable.

…but it was certainly a frustrating and expensive way to get there.

We have new developers now who don’t know what devops is, but they know what containers are and expect to be able to deploy to production any time.

I guess that’s a good way for devops to quietly wind up and go away.

Sn0wCoder · a year ago
DevOps and releasing multiple times per day does not always mean to PROD, and in most industries that is impossible. Continuous Integration and Continuous Deployment should, and do, mean to the development branch and container. Locally I can write code and skip all the tests, skip the linter, skip prettier, etc.

When I commit, prettier runs. When I PR into develop, other devs review the code. When it’s approved and completed, a build is run (including all tests), and the code is sent to Veracode and SonarQube for analysis. The build is deployed to the development container, where the developers can smoke test, and then when SQA is ready, promote to Integration for real testing. Fail fast! If the tests fail, we can fix them. If code quality goes down, we can fail the gate === no deployment. Without DevOps and CI/CD these steps get skipped.
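
A rough sketch of what that quality gate amounts to; the checks and thresholds here are hypothetical, not our actual pipeline configuration:

    # Hypothetical gate: promote only if every automated check passes.
    def gate_passes(tests_passed: bool, coverage: float, new_critical_findings: int) -> bool:
        # Tests green, coverage holds, and no new critical findings from analysis.
        return tests_passed and coverage >= 0.80 and new_critical_findings == 0

    if gate_passes(tests_passed=True, coverage=0.75, new_critical_findings=0):
        print("promote build to the development container")
    else:
        print("gate failed: no deployment")  # this branch runs: coverage too low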

Not sure where the idea that software is released to production multiple times per day ever came from. Yes, the develop branch should in theory be ready to be promoted at any time, but we know that is not reality.

lolinder · a year ago
The big advantage to deploying to prod constantly is trunk-based development.

Any system where you maintain a separate development branch is one where you're invariably going to be asked to cherry pick feature B to production but not feature A. This is a problem because you tested everything on a version of the code where B follows A, but now B needs to stand on its own. Can it? Maybe. But you didn't test that.

With trunk-based development you have one main branch that everything goes to all the time. If you need something to not go out to real users then you put it behind a feature flag. If something needs to go out to users today, then you merge a change on top of your main branch that unconditionally shows the feature to users.
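
A minimal sketch of that flag check, assuming a plain in-process flag map rather than any particular flag service (all names here are made up):

    # Code ships to main (and prod) dark; a flag flip, not a branch or a
    # cherry-pick, decides when real users actually see it.
    FLAGS = {
        "new_checkout_flow": False,  # merged and deployed, but hidden from users
    }

    def is_enabled(name: str) -> bool:
        # Default to "off" so unreleased work stays dark until someone flips it.
        return FLAGS.get(name, False)

    def checkout_page() -> str:
        if is_enabled("new_checkout_flow"):
            return "new checkout"
        return "old checkout"

    if __name__ == "__main__":
        print(checkout_page())  # "old checkout" until the flag is flipped to True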

This way all code that ever makes it into production exists at a point along a single linear commit history, each merge commit of which was checked by CI and is the same history that devs developed against locally.

In my experience eliminating cherry picking makes a huge difference in reliability, and the only way to do that reliably is to constantly deploy to prod.

And yes, my current $day_job does this.

signal11 · a year ago
It’s possible even in many heavily regulated domains, because there’s now enough data to show regulators that it reduces risk.

Releasing multiple times a day in itself isn’t the point though.

The point is adopting good practices to ensure trunk-based development is pain-free. It’s about ensuring the release process is as automated as possible (if you’re releasing once a month, you are exercising your deployment process 12 times a year; when you’re exercising it 12 times a day, it really encourages you to work the kinks out).

It’s also ensuring that your monitoring and alerting can keep up. Essentially it’s a signal to the entire org that “We will move fast. Deal with it.”

There’s data out there from even large non-FAANG orgs now about how this approach reduces sevs instead of increasing them.

LudwigNagasena · a year ago
> Not sure where the idea that software is released to production multiple times per day ever came from.

UX-focused development. Your concern is mainly about user journeys, acquisition funnels, etc. So you rely on user feedback a lot. So you ship, A/B test, iterate.
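
The “ship, A/B test, iterate” loop rests on something like deterministic bucketing; a toy sketch, with made-up experiment and user names:

    import hashlib

    # Same user always lands in the same variant, so you can keep shipping new
    # builds without reshuffling the experiment population.
    def variant(user_id: str, experiment: str, treatment_pct: int = 50) -> str:
        digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
        bucket = int(digest, 16) % 100
        return "treatment" if bucket < treatment_pct else "control"

    print(variant("user-42", "new-signup-flow"))  # stable across deploys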

Also, if your service consists of lots of microservices, you don’t want to wait to ship them all at the same time.

The most fitting example of both is probably Netflix. I guess they ship to prod a lot, every day.

smusamashah · a year ago
I bumped into a DevOps job at Teradata, did it for two years, then left for where I belong, in gaming as a backend developer. It was a corporate job with good pay and perks, including international travel etc. Some friends are still there doing DevOps very happily. Talking to them, it never feels like a dying field.

I hated my DevOps role, so pardon me for only reading the headings of the article, but talking about it in the past tense seemed very delusional.

wokwokwok · a year ago
The deep deep irony of hiring someone specifically to do "devops".

That's not devops.

Devops is when you have developers who are empowered.

When you hire someone specifically to do devops you are hiring an ops team and calling it devops.

So, yeah. That still happens, sure. However, I do think it's changing these days:

linkedin > connections -> control-F -> 'devops' => 0 hits

"Site Reliability Engineer" -> 10 hits

I think hiring SREs makes sense. I think hiring 'platform engineers' makes sense. People still do that, and will, as far as I know, far into the future.

...but I think hiring devops means you never understood, even vaguely, what devops was; you're just using a buzzword in your job ad.

austinshea · a year ago
This is entirely predicated on the issues this person experienced. Irrespective of whether or not devops teams end up with solutions that look like this, none of them are meant to.

My first experiences had to do with the ability to add new services, monolith or not, and have their infrastructure created/modified/removed in an environment- and region-agnostic way, and to be able to safely allow developers to self-service deploy as often as they want, with the expectation that there would be metrics available to observe the roll-out and safely revert without manual intervention.

If you can't do this stuff, then you can't have a serious posture on financial cost while also providing redundancy, security, or operating independently of one cloud provider or one specific region/datacenter. Not without a lot of old-school, manual systems administrator work. DevOps hasn't gone away; it has become the standard.

A bunch of pet servers is not going to pass the appropriate audits.

8organicbits · a year ago
Agreed, standardization has been hugely helpful.

From the article:

> adopt much more simple and easy-to-troubleshoot workflows like "a bash script that pulls a new container".

This style of thinking lets developers learn every nitty-gritty pain of why we have frameworks. I see the same challenge with static site generators: they are all kinda annoying so people write their own, but then theirs is even worse.
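
For concreteness, roughly what that "script that pulls a new container" workflow looks like, sketched here in Python around the docker CLI (image and container names are placeholders); everything it leaves out (health checks, rollback, secrets, multiple hosts) is exactly that nitty-gritty pain:

    import subprocess

    IMAGE = "registry.example.com/myapp:latest"  # hypothetical image
    NAME = "myapp"

    def deploy() -> None:
        # Pull the new image, drop the old container, start the new one.
        subprocess.run(["docker", "pull", IMAGE], check=True)
        subprocess.run(["docker", "rm", "-f", NAME], check=False)  # fine if absent
        subprocess.run(
            ["docker", "run", "-d", "--name", NAME, "-p", "8080:8080", IMAGE],
            check=True,
        )

    if __name__ == "__main__":
        deploy()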

osigurdson · a year ago
>> abandon technology like Kubernetes

I think a lot of Kubernetes hate is misplaced. It is a great piece of software engineering, well supported, and runs everywhere. You certainly don't always need it, but don't create a bunch of random bash scripts running all over the place instead of learning how to use it.

weitendorf · a year ago
It's basically a prototype that industry ran away with. It leaks implementation details everywhere and pushes way too many config options up to the developer. Because industry ran away with it before it could be good, it takes projects like Cilium to push it in the right direction, but those take forever to get adopted and are really hard because they don't live in the product itself.

You'd need like 10 more Ciliums to make Kubernetes actually good, and they'll never all get adopted. The tech is a dead end that demonstrates some nice tech/patterns but got overhyped. It's like the Hadoop of containers.

llama052 · a year ago
Can you clarify what you mean by "leaks implementation details everywhere"?

I like to think of Kubernetes as a big orchestration platform where you can choose to use what you need. If an ingress and pods work, then use that; otherwise extend it and throw an operator up for what you need (it likely already exists).

Cilium, for instance, is great for that, and so are Istio and the like. They aren't hard, you just have to understand networking... which is nearly the same energy as running it on another orchestration tool or raw on a network device.

sadops · a year ago
Kubernetes is just a poor Linux clone with extra steps. Seriously, it has all the basic parts of an OS, just half-assed: the scheduler, the networking, the state management. We already had way better operating systems that can, you know, schedule workloads and talk to the network, and it didn't require gigabytes of YAML and string templating to make it happen.

physicles · a year ago
Agreed. If we didn’t use kubernetes, we’d have to reimplement a bunch of its features. Back when we were getting started I tried docker swarm because it was supposed to be simpler, but I had weird issues with its networking.

The best part is multiple environments. I can run our full stack on my laptop with k3s and one call to make. We use the same yaml (with kustomize) for all three cloud environments.

It feels pretty boring, which is fantastic.

weitendorf · a year ago
The argument against Kubernetes is not that you should go backwards from it, but that there is a huge need for something better than it. Many big tech companies have platforms that are better than it which aren’t properly (if at all) externally productized.

And the fact that many medium-sized companies have “platform teams” configuring Kubernetes and gluing basically the same set of tools (source, build, test, release, ops, obs) together in basically the same way is a huge smell that something better is needed. Basically, doing things the right way is actually a big operational/engineering/monetary burden for most companies that just want to write applications. And K8s is a big part of that.

udev4096 · a year ago
Anyone who has worked with bash scripts knows that relying on a bunch of them for your infrastructure is extremely naive. At the end of the day, Kubernetes is just a tool, with complexities in place for companies with larger workloads.

zer00eyz · a year ago
It's not.

The whole ecosystem around it is an example of Conway's law and a Google product.

None of the people using it are Google.

Google, also, runs its own hardware.

Shockingly, it is a great product if you rent hardware; autoscaling is autospending. No one knows what a feature costs any more because it's all just a big bucket you're pouring money into for Amazon to have a 30 percent margin on.

We need operations people again, we need to stop using containers as bags for shitty "software" ... Do actual engineering.

llama052 · a year ago
Kubernetes is actually extremely popular all around the world. Chick-fil-A, if I recall correctly, deploys it in every single store!

A lot of big dinosaur corporations are implementing it actively. Unfortunately VMs or Kubernetes or whatever tooling is still going to suck if you have shitty people using them.

whoknowsidont · a year ago
>The whole ecosystem around it is an example of Conways law

This is such an inaccurate take.

>we need to stop using containers as bags

Containers and container orchestration are a NEEDED and REQUIRED piece of the technology stack in the current reality. That doesn't mean people need to be ignorant of the details that make them work.

As someone who's been around for a while, I'd say we are in a better place than when we had "dedicated operations people" who just gatekept everything.

What you hate are teams just parroting what other people are doing with the tech, not the tech itself. But let me tell you, if you're going to have teams of people pretend to know what they're doing wrapping it in a standard "bag" sure does make it a hell of a lot fucking easier to unfuck when things go wrong.

milkglass · a year ago
Where's all the good operations people at nowadays?

I've worked with numerous cloud-native engineers who do not have good foundational knowledge.

28304283409234 · a year ago
Many things are 'great pieces of [software] engineering'. That does not mean they are a fit solution for every problem. Many times a script, a simple Ansible playbook, or docker-compose is just a better fit.

The_Colonel · a year ago
> The cause of its death was a critical misunderstanding over what was causing software to be hard to write. The belief was by removing barriers to deployment, more software would get deployed and things would be easier and better. Effectively that the issue was that developers and operations teams were being held back by ridiculous process and coordination.

So many arguments are based on strawmen...

I like devops / daily deploys, because they're part of the puzzle leading to higher-quality code being deployed to production, with less associated stress.

The point is (for any individual developer) not to actually deploy their progress every day on prod, but to have the option to do so. This leads to code going to prod when it's ready, but no sooner. If the problem is more difficult than anticipated, or the code still sucks and needs refactoring, well, you're just going to work on it as long as it needs and deploy it only then.

Meanwhile, if you have, let's say, monthly releases, you will get the death marches, because a delay of one day can mean a delay of one month / quarter / whatever. Everyone feels the pressure to deliver, leading to suboptimal choices, bad code being approved, etc.

solatic · a year ago
The main thing the author gets wrong is that it's now much better understood amongst engineering leadership that development teams need at least one person with ops/infra skills. Development teams shouldn't wait for a centralized DBA team to pick up their schema change request, but neither does it make sense to ask frontend developers to learn all the ins and outs of running databases. Teams do need somebody to specialize in that skillset. This person with ops/infra skills is the modern Site Reliability Engineer (i.e. for most companies, a term that was inspired by Google's book, but distinct from Google's implementation of the concept).

As startups grow into enterprises, eventually there are benefits to be had from getting all the different SREs on the same page and working according to the same standard (e.g. compliance, security, FinOps...). Then, instead of each SRE building on top of the cloud provider directly, each SRE builds on top of the internal platform instead.

ianbutler · a year ago
The successor, the platform team, is also really only accessible to enterprise companies.

Hiring an entire team to build great dev tooling, deployments, monitoring, application templates, org-level dependency management, etc. is just too much to swallow for any medium-sized or smaller business, so in that reality you wind up with a few heavily overworked devops folks who take up unhealthy habits to cope with the associated stress and risk.

In my 10-year career thus far, none of the startups I worked for, even well-capitalized ones, had what this article, and I, would consider to be a platform team. I only saw my first platform team when I stepped into a role at a 6000+ person company.

It's effectively an underserved (and under-appreciated imo) area and responsible for a lot of pain and land-mine decisions companies make around their software product.

osigurdson · a year ago
>> The User is the Tester

If you can afford to make the user the tester, you should. There is no moral hazard, only an economic one. If you have 5 million customers paying $1 / year, make the user do the testing via canary deployments, metrics, etc. If you have 5 customers each paying $1M / year, be sure to test it yourself.

The problem seems to be that people forget which regime they are operating in.
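
A toy sketch of that canary-plus-metrics version of "the user is the tester" (percentages and thresholds are invented for illustration):

    import random

    CANARY_PCT = 5  # send a small slice of real traffic to the new build

    def pick_backend() -> str:
        return "canary" if random.random() * 100 < CANARY_PCT else "stable"

    def should_roll_back(canary_error_rate: float, stable_error_rate: float) -> bool:
        # Abort the rollout if the canary is meaningfully worse than stable.
        return canary_error_rate > stable_error_rate * 1.5 + 0.01

    print(pick_backend())
    print(should_roll_back(canary_error_rate=0.08, stable_error_rate=0.02))  # True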

chefandy · a year ago
> There is no moral hazard, only an economic one.

Er... No? If you take someone's money in exchange for goods and services, you have a moral duty to give them what you said you would. Not a broken version of it: what they bought. If you explicitly state they're getting an unstable product, then sure. If you actually do your best within reason and your service is broken, shit happens. Nobody is perfect, but you made a good faith effort to deliver on your promise. But if you don't, and deliberately don't bother checking if it actually works while happily pocketing people's cash, that's unambiguously negligent. Ripping off small groups of customers is not morally different from ripping off all of your customers: it's the same immorality at a smaller scale, so you're more likely to get away with it.

osigurdson · a year ago
Is your argument that an amount of money approaching zero in the limit is morally distinct from zero, triggering a step change in behavior?

photonthug · a year ago
2 observations, first the cynical one, but the second is optimistic.

For leadership, the whole idea of "breaking down silos" is almost always lip service, and to the extent that it is/was a core mission of DevOps, it was always doomed. Responsibility without power doesn't work, so it's pointless unless the very top wants to see it happen. Strong CTOs with vision are pretty rare, and the reality is that the next tier of department heads from QA/Engineering/DataScience/Product are very often rivals for budgets and attention.

People who get to this level of management usually love building kingdoms and see most things as zero-sum, so they are careful to never appear actually uncooperative, but they also don't really want collaboration. Collaboration effectively increases accountability and spreads out power. If you're in the business of breaking down silos, almost everyone will be trying to undermine you as soon as they think you're threatening them with any kind of oversight, regardless of how badly they know they need process changes.

Anyway, the best devops people are usually excited to code themselves out of a job. To a large extent, that's what has happened. We're out of the research phase of looking for approaches that work. For any specific problem in this domain we've mostly got tools that work well and scale well. The tools are documented, mature, and most even permit a healthy choice among alternatives. The landscape of this tooling is generally hospitable, not what you'd call a desert or a jungle, and it's not as much of a moving target to learn the tech involved as it used to be.

Not saying every dev needs to be a Kubernetes admin... but a dev refusing to learn anything about Kubernetes in 2024 is starting to look more like a developer who doesn't know Linux command-line basics. Beyond the basics, platform teams are fine... they are just the subset of people with previous DevOps titles who can actually write code, further weeding out the old-school DBAs / sysadmins, bolstered by a few even stronger coders who are good with cloud APIs but don't understand ELBs / VPCs.