CSMastermind · 3 years ago
Coherence is obviously trying to sell you their product here but I'm not sure preview environments and staging environments are opposed.

The way I've set things up at my current company is this:

When a developer creates a pull request an ephemeral (preview) environment is generated.

When that code is merged it automatically gets deployed first to our staging environment and then immediately to production.

Staging gets a sanitized copy of production's data every night so both the code and the data are close mirrors of production and that's what developers (and those preview environments) integrate with when they're calling services other than their own.

If you were to get rid of staging would you stand up your entire backend every time you made a preview environment? Does that include replicating databases or would you just use seed data that may or may not resemble what's in production?

It seems like the solution they're pitching works great if you have a small team and only a few services/sites.

hangonhn · 3 years ago
Yeah. And it's also not that impressive. We've had what they call "preview environments" for a very long time, at my previous companies too. With GitHub and AWS/Cloud it's trivially easy to do.

And like you, we have both "preview environments" and staging, for the exact reason you've highlighted. I'm rather skeptical of the company behind this product now since they don't seem to understand the difference and are really overselling something that's quite common.

echelon · 3 years ago
I have spent a lot of time thinking about this problem. I would gladly pay for a product that could make a real, honest to god preview environment that you could spin up on-demand and populate it with all of the data you need.

This isn't it, and I think it's pretty much an impossible problem to solve without an entirely new engineering culture around testing.

How do you duplicate the behavior of the 10 direct data dependencies of your app and the constellation of transitive microservice dependencies? Wingman, Galactus, etc. At a company with 1000+ microservices?

And how do you populate it with actual data so you can log in and perform actions? Every single microservice probably needs test data, and it would take a lot of engineering forethought to populate the entire graph.

It gets gnarly. I've been on teams at the center of it all - user accounts and login, representation of the core business entities that the rest of the company is built upon. We couldn't even solve the problem for ourselves. There were so many permutations of account states that creating a dedicated API to generate test data would deviate from the first-class API. And how do you coordinate with downstream services to skip IDV (matching SSN), compromised-password checks, etc.?

Perhaps you only bring online a subset of services and fake out the rest. At what level can you fake stuff out? And how do you ensure that the test data works with whatever subset you don't fake? If a single service responds in an unexpected way, it could break or corrupt the state you're relying upon to test.

Maybe "preview" works for frontend. Backend, not so much. This is such a massive problem, and I'm only scratching the surface with my description of the kinds of issues you have to solve.

phendrenad2 · 3 years ago
Yes, this is one of those things that was easier before kubernetes.

Before: Your developers have a shared Linux box that serves the preview environment. Your ruby app has a tiny loader in front of it that loads the app code from a specific directory based on subdomain.

After: You need to route subdomains to different kubernetes containers, and handle deploying new containers when the developer opens a PR (because it's much more work than just copying files onto a server), and you need to handle destroying old containers (because it's much more work than just deleting files on a server).

Not to mention, usually kubernetes is managed by "the ops" who don't know anything about the app. Much more difficult to interface with them when it's not "oh one server, and the developers figure it out, and if they royally fuck it up we'll rescue them".

talove · 3 years ago
The reason every complex application I've worked on has had a staging environment is that you do need to test production deploys in an environment that mirrors the production dataset and infrastructure, especially with data migrations and distributed databases. That is prohibitively expensive and not feasible to run in n+1 envs.
jcraft · 3 years ago
Big if, but if you can use database containers it's relatively low cost to spin them up in a namespace, load data from a database snapshot, run the migration, and then tear down the container or even the entire namespace.
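A rough sketch of that flow, driving kubectl from Python. All names here (namespace prefix, image, migration tool, snapshot path) are placeholders, and waiting for pod readiness / getting the snapshot file into the pod is glossed over:

```python
# Hypothetical sketch: spin up a throwaway Postgres in its own Kubernetes
# namespace, restore a snapshot, run the migration under test, then drop
# the namespace to reclaim everything.
import subprocess

def migration_test_commands(pr_id: str, snapshot: str) -> list[list[str]]:
    """Build the shell commands for one ephemeral migration test."""
    ns = f"migrate-test-{pr_id}"
    return [
        ["kubectl", "create", "namespace", ns],
        ["kubectl", "-n", ns, "run", "db", "--image=postgres:16"],
        # Restore the snapshot into the throwaway database.
        ["kubectl", "-n", ns, "exec", "db", "--",
         "pg_restore", "-d", "app", snapshot],
        # Apply the migration under test (migration tool is an assumption).
        ["kubectl", "-n", ns, "exec", "db", "--",
         "migrate", "-path", "/migrations", "up"],
        # Dropping the namespace tears down the container and its storage.
        ["kubectl", "delete", "namespace", ns],
    ]

def run_migration_test(pr_id: str, snapshot: str) -> None:
    for cmd in migration_test_commands(pr_id, snapshot):
        subprocess.run(cmd, check=True)
```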
andrewmutz · 3 years ago
> If you were to get rid of staging would you stand up your entire backend every time you made a preview environment? Does that include replicating databases or would you just use seed data that may or may not resemble what's in production?

What I've seen done is that the staging environment is skipped, but instead concentric releases are performed on the customers. So released code goes first to a small group of users, and then widens from there if everything looks good.
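A minimal sketch of that widening rollout, assuming hash-based cohorts (the function and field names are illustrative, not any particular vendor's flag system):

```python
# Deterministically bucket users so a release can start with a small cohort
# and widen over time just by raising the percentage.
import hashlib

def in_rollout(user_id: str, feature: str, percent: int) -> bool:
    """True if this user falls inside the current rollout percentage.

    Hashing user_id together with the feature name keeps bucketing stable
    across requests but independent between features.
    """
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in [0, 100)
    return bucket < percent
```

Because buckets are stable, users who saw the release at 5% are still inside it at 20%; widening never flaps anyone out.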

klooney · 3 years ago
Schema changes are hard to roll out that way though.
oasisbob · 3 years ago
This technique doesn't seem especially relevant for many kinds of backend work.
scubbo · 3 years ago
> What I've seen done is that the staging environment is skipped, but instead concentric releases are performed on the customers. So released code goes first to a small group of users, and then widens from there if everything looks good.

If you really want to invest in extensive testing, you can do both! Hell, _and_ you can have a load testing environment as well.

colpabar · 3 years ago
How long do your PRs stay open? Could you explain a bit more about what happens between opening the PR and merging it, like automated/manual testing? At my company we do something similar, except we trigger production deployments off tags, not merged PRs. Merging goes to staging only.
CSMastermind · 3 years ago
It really depends on the PR. Most do get merged fairly quickly (same day) but some where there's more discussion can stay open for several days or even a week.

All of our codebases have some level of automated testing that runs when the PR is created. That could be unit tests of a single function or Playwright tests which exercise a UI flow. What those tests are depends on the type of software and the team building it.

The preview environment links that are generated when the PR is opened are given to QA, Design, and Product to take a look at. Not every one of those roles reviews every PR; it really depends on the task. But if their feedback is required, they'll give it at this stage.

Probably worth mentioning at this point that we have separate repos for each web application and service. It's service-oriented but not microservices; services at our company encapsulate a relatively large domain. It's similar with web applications: right now each web application has its own subdomain.

So PR gets merged, the pipeline will deploy that code to staging. At this point the service owners have the option to define a set of integration tests, not every repo has them but it is an option to run them. If those tests fail we do a rollback. If those tests pass (or if they didn't have any defined) then a second deployment to production is triggered.

All of our deploys are blue-green, so no downtime except for the occasional blocking database migration, for which we have to schedule downtime. User-facing features on the web applications are all required to be released first behind a feature flag. For those there will be several releases behind a feature flag, then we'll flip the flag first in staging. We'll have Product and QA do a complete run through and make sure they're okay with everything, then the "real" release happens when we flip the feature flag in production (no deployment needed).

Every night we clone the production database, sanitize it (swapping out things like names, phone numbers, etc.) then point staging to that clone. Then we run an end to end regression test that goes through all of the happy path flows for all of our apps.
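A toy version of that sanitize step might look like this (the column names and hashing scheme are my own illustration, not their actual pipeline):

```python
# Replace direct identifiers in each row with deterministic fakes, so the
# staging clone keeps production's shape without exposing real names or
# phone numbers.
import hashlib

PII_COLUMNS = {"name", "email", "phone"}

def fake_value(column: str, value: str) -> str:
    # Deterministic: the same input always maps to the same fake, so
    # joins and lookups on these fields still line up after sanitizing.
    token = hashlib.sha256(f"{column}:{value}".encode()).hexdigest()[:8]
    return f"{column}_{token}"

def sanitize_row(row: dict) -> dict:
    return {
        col: fake_value(col, str(val)) if col in PII_COLUMNS else val
        for col, val in row.items()
    }
```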

ArjenM · 3 years ago
>a sanitized copy

Sanitized copies take on quite some importance given the constant wave of changing code in development on certain projects.

Staging seems like a more code-locked, stable area than all these previews, one that can be polished via quality practices and keen eyes.

Now it seems a little fleeting to me as well to have the shock of previews enter into this instead.

Let the paint dry adequately before taking it outside.

sander1095 · 3 years ago
Setting up preview environments automatically (in Kubernetes or Azure) with sanitized production data is something that always sounds amazing to me, but I have always had difficulty coming up with ways to implement it. Every company/team uses different tools and ways to implement this, so there isn't a one-size-fits-all solution.

Do you have any pointers?

candiddevmike · 3 years ago
How do you sanitize data between prd and stg?
jcraft · 3 years ago
My company can anonymize production data while preserving the utility of the data. You get privacy-safe data that looks just like production data with the same distributions. It's real data.

The typical workflow is to anonymize production data, snapshot it, and make it available to preview and testing environments as replicas. It's pretty fast.

champagnepapi · 3 years ago
You can use something like this https://www.replibyte.com/docs/introduction/
ffggffggj · 3 years ago
This is not a great technical article. I don’t find it makes me curious. It’s entirely content marketing, zero technical content, clearly aimed at people who manage engineers but aren’t engineers. It doesn’t discuss drawbacks (doesn’t really define anything with enough clarity for there to be drawbacks anyway) and ends with a sales pitch.

The obvious problem here is that this approach is far too expensive for any org that isn’t a tiny startup whose production system fits on a half dozen hosts. Consider a column store DB that’s configured in a multi region manner. Is every engineer bringing that up in their preview environment? If they aren’t, it isn’t faithful to a critical production performance constraint. If they are, the company is probably paying twice the cloud costs of their competitors at least, and they don’t get the capacity planning benefits of a fixed staging deployment.

andrewstuart2 · 3 years ago
Why would you configure your column database as multiregion in every environment? You can run 99% of your tests, iteration, validation, etc, against a single shard. And even the same instance if you don't need to make any schema changes. You don't need a faithful production performance match until you want to validate performance characteristics, which often comes long after you've validated other correctness criteria that doesn't need prod-scale infra.

So once you've validated everything, you maybe stand up the prod-scale stack for less than an hour to run your at-scale tests, then tear it down again. Definitely not doubling your cost because you're not keeping anything around, and you're only using it when needed. Most places in my experience actually don't do this, and instead have a copy stood up somewhere just wasting money as you've described.

ffggffggj · 3 years ago
One reason is that transaction performance could be very different, even regardless of data volume.

Your points are all correct of course, but then that wouldn’t be a preview environment, would it? It would be shared database state in a database staging environment.

Which, for what it’s worth, is also how I have seen this problem managed when it came up.

As for costs, maybe true, but it could also take a lot longer than an hour to bring up a large deployment. And if every engineer is doing this for one hour a week, and you have a few dozen engineers, there’s a 2x cost increase with a tougher capacity planning problem since your load is now tied to your hiring plans.

kypro · 3 years ago
Maybe I'm missing something here, but I haven't worked anywhere for years that's had a "staging" environment, at least not as this article describes it.

Everywhere I've worked either has spin up environments, a developer issued test environment or various lower testing environments where we can deploy individual features for testing and QA.

But more to the point, everywhere I've worked has also combined these lower-level testing environments with some kind of pre-production environment which features more data (typically a clone of prod). On this environment a further layer of tests is often run and new features are manually reviewed one last time before being deployed to production.

Where does a "staging" environment fit in this? Unless I'm misunderstanding, the article seems to suggest code goes dev -> staging -> prod, but aside from one developer role I had in the early 00s it's always been dev -> test env -> pre-prod -> prod.

Is the article suggesting you can just do away with pre-prod environments in favor of "preview" environments? And why would you even want to do that? As I understand it, one of the features of a pre-prod environment is that it persists and isn't a clean slate each time. Any mess that can accumulate in prod can accumulate in a pre-prod environment.

I guess I'm not following what this article is advocating for.

ezekg · 3 years ago
> But more to the point, everywhere I've worked has also combined these lower-level testing environments with some kind of pre-production environment which features more data (typically a clone of prod). On this environment a further layer of tests is often run and new features are manually reviewed one last time before being deployed to production.

Sounds like staging to me...

maerF0x0 · 3 years ago
fwiw it's generally not compliant to copy prod data into an alternative env without doing major scrubbing. Else your preprod env now has to have SOC compliance things like auditing of access to DBs, GDPR deletion etc.
phphphphp · 3 years ago
"Staging" means "an environment where multiple unreleased pieces of work are tested together before being released to production". Pre-prod and staging are synonymous in most organisations.

A preview environment is an ephemeral environment for testing a single piece of unreleased work. Framing it according to the terms you've used, the article is advocating for replacing "dev -> test env -> pre-prod -> production" with "dev -> test env -> production".

kypro · 3 years ago
Thanks for clarifying.

In that case what they're advocating for seems like a really bad idea. One nice thing about having a staging / pre-prod environment is that it can replicate prod architecture, content, and config (for the most part anyway), which allows for things like performance checks to be run after code has been tested for functionality in lower environments. Something I've also seen is that prod websites typically load various analytics and marketing scripts which might not be included on dev environments. Even if those scripts are not maintained by the dev team, you probably still want to be checking that any code being deployed isn't going to break an analytics script because it's expecting links to have a certain class, etc.

pmontra · 3 years ago
The article is about spinning up multiple staging servers, maybe one per branch/feature, and possibly leaving them there for hours or days, to be inspected at each one's own URL until somebody confirms that the implementation is OK. All of this is automated (a git push?)

The usual staging and/or preproduction environments would be the preview of their own branches.

It sounds nice. The only problems I can see are:

1) The cost, because each one of those instances could have its own queues, databases, connections with third party APIs, etc.

2) Seeding the database with a meaningful amount of data. I often see preproduction databases built up by testers month by month after a minimal seeding. Each instance here has to be seeded with all that data plus what's required to demo the new feature. That data must be saved before destroying the database, reused for further branches, and merged with the data for other features.

jcraft · 3 years ago
There are ways to anonymize production data and make it available in preview and testing environments. My company has been doing that for customers and it has been effective. You get privacy-safe data that is still very close to production data.

Depending on the application architecture, you can spin up database containers and seed them with replicas of the anonymized data.

ryanSrich · 3 years ago
> dev -> staging -> prod

> dev -> test env -> pre-prod -> prod

You're just using a different word and adding one additional environment. You swapped out the word staging and replaced it with pre-prod.

nunez · 3 years ago
staging goes by a lot of different names, most commonly UAT, pre-prod or Acceptance

it's the place where more biz-facing people can tell you the color's off by like half a shade just before a big release :)

pbalau · 3 years ago
pre-prod is staging.
pmontra · 3 years ago
Not always. Not every company likes continuous deployment. Sometimes staging is the server that demonstrates the feature of the day or of the week. Preprod is where all the software goes after getting a green light in staging, waiting to go in production maybe a few times per month or even per year.
btown · 3 years ago
In general, it's super useful to be able to have non-technical members of the team play with a specific feature in development, with a simulacrum of live data, independent of other ongoing projects. There may be significant problems that need to be addressed, including data loss problems that make it so you can't just uncheck the feature flag in prod, and blocking staging (or worse, prod) until things can be rolled back creates a significant bottleneck.

We've been experimenting with a pretty reliable way to do this in-house, though. The key is that a single Kubernetes namespace per preview app iteration contains everything your preview app needs. We have a Github action, triggered by a "preview" label on a PR and any subsequent commits, that:

- builds and pushes a docker image with a dedicated tag

- within the specific preview namespace named after the PR ID, spins up and seeds with test data (in our case, a sanitized subset of production) a dedicated database statefulset

- runs the Helm chart that we use for production, but against this specific preview namespace, with this specific image tag, and overriding our normal ingress domain with one specific to this preview app

Then when the label is removed, we drop the namespace and reclaim all resources. "Rebasing" off later data, or if reviving a stale PR, is as simple as removing and re-adding the label.
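Condensed into the commands such a job might run, this looks roughly like the following (chart name, registry, and domain are placeholders, not the poster's actual setup, and the database seeding step is elided):

```python
# Build the per-PR commands for the label-triggered preview workflow:
# build/push an image, create a namespace named after the PR, and deploy
# the production Helm chart into it with preview-specific overrides.
def preview_commands(pr_id: int, teardown: bool = False) -> list[list[str]]:
    ns = f"preview-{pr_id}"
    tag = f"registry.example.com/app:pr-{pr_id}"
    if teardown:
        # Label removed: drop the namespace and reclaim everything in it.
        return [["kubectl", "delete", "namespace", ns]]
    return [
        ["docker", "build", "-t", tag, "."],
        ["docker", "push", tag],
        ["kubectl", "create", "namespace", ns],
        # (Seed a dedicated database statefulset here - omitted for brevity.)
        ["helm", "upgrade", "--install", "app", "./chart",
         "--namespace", ns,
         "--set", f"image.tag=pr-{pr_id}",
         "--set", f"ingress.host={ns}.preview.example.com"],
    ]
```

Because everything keys off the PR ID, "rebasing" is just a teardown followed by a fresh run of the same commands.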

jcraft · 3 years ago
We've seen customers do something very similar and we've helped them get privacy-safe data into each preview. Handwaving a lot here, but the steps are 1) anonymize production data to a destination, 2) snapshot the anonymized data, and 3) replicate the anonymized data to all of the previews.
dawnerd · 3 years ago
The rise of preview environments? They've been popular for years. We were using them casually 10 years ago and it didn't feel like we were on the bleeding edge. Kind of a weird marketing post.
codalan · 3 years ago
I think their target audience isn't the typical HN reader, but the rest of the industry, where multiple staging environments are still a common thing (including my current employer).

The biggest issue we had with preview/review environments was resources not being cleaned up properly after destroying the environment (along with the associated costs), but that was more an indictment of our IaC code than anything else. It also didn't eliminate staged environments for things like the non-standard DB that we were using. As long as the changes on the DB side were net-new, it wasn't really an issue, though.

jeremy_k · 3 years ago
You're 100% correct that the target audience of this isn't HN readers. Having worked in this space for a number of years and having written this same fluff article in 2020, this type of content is marketed directly to engineering managers and VP types who are looking for solutions to issues around their SDLC or developer productivity.

The technical implementation part can be interesting to readers and potential engineers who will use the platform and is worthwhile to have available, but this content is purely meant to cast a wide net.

toolslive · 3 years ago
Zope had this in 2000.
kyrra · 3 years ago
Agreed here. I haven't actually used a staging env as described here (a single instance where you can test). We have used what are called preview instances for quite a while. We tend to keep 2 hot, and they get recycled every X hours and new binaries are installed. These allow a single person to check one out and use it (checking one out causes a new instance to be spun up).

We still have a sandbox instance, and that is hooked up to the rest of the stack.

It works pretty well if you need to iterate on something quickly that you are unable to do with a test.

nunez · 3 years ago
The article correctly outlines the benefits of ephemeral envs but glosses over the elbow grease needed to get there.

Going from static staging envs to ephemeral envs is a lot of work for non-startups with traditional release processes.

Data is an example of a challenging hurdle. It is (somewhat) straightforward to Terraform production and modularize it to make it repeatable. But what do you do when your most current tables have customer PII or other sensitive data in them and migrations are done manually during release? Now you need to audit the entire database for fields where PII might exist so that automation can be written to dump those databases and sanitize that data.
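That audit step could start as simply as flagging suspicious column names (the pattern list below is illustrative, not exhaustive, and real PII can hide in columns no regex will catch):

```python
# Flag columns whose names look like they hold PII, so the dump-and-sanitize
# automation knows which fields to scrub.
import re

PII_PATTERNS = [
    re.compile(p, re.IGNORECASE)
    for p in [r"name", r"e?mail", r"phone", r"ssn", r"address", r"dob|birth"]
]

def flag_pii_columns(schema: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each table to the columns that match a PII pattern."""
    flagged = {}
    for table, columns in schema.items():
        hits = [c for c in columns
                if any(p.search(c) for p in PII_PATTERNS)]
        if hits:
            flagged[table] = hits
    return flagged
```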

Then there's scavenging the envs. Ephemeral envs can cost A LOT of money. Many places also do not apply as rigorous reporting on resourcing in those envs as they do in production. So what happens when you get a new CEO who wants to cut infra cost by an insanely high figure and you start with the preview envs, but you have no idea which are for active PRs, which were created out of band, which are prod-in-all-but-name, etc?

It's definitely worth the work, as I believe that ephemeral envs lead to safer, faster releases, but getting leadership to buy in and invest capital is often a requirement.

IMO the REAL challenge is creating quality, ephemeral LOCAL environments. So many apps need a crap ton of infra to stand up for no other reason than "we use a crap ton of infra to build and test our app". Like, if I NEED to create a VPC to run an app as a Lambda function because the author tests by shipping directly to Lambda from their IDE or whatever, that's a waste of money and productivity. No reason why we can't run that in Docker locally and mock external dependencies as needed.
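For instance, a minimal sketch of that local approach: run the handler in-process and inject a fake cloud client instead of standing up a VPC (the handler and the boto3-style client shape are invented for illustration):

```python
# Stub the external dependency so the handler runs locally with no infra.
from unittest.mock import MagicMock

def handler(event, s3_client):
    # App code takes its client as a parameter, so tests can inject a fake.
    body = s3_client.get_object(Bucket="uploads", Key=event["key"])
    return {"status": 200, "size": len(body)}

def test_handler_locally():
    fake_s3 = MagicMock()
    fake_s3.get_object.return_value = b"hello"
    result = handler({"key": "a.txt"}, fake_s3)
    assert result == {"status": 200, "size": 5}
    fake_s3.get_object.assert_called_once_with(Bucket="uploads", Key="a.txt")
```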

maerF0x0 · 3 years ago
In my experience reliance on these kinds of "test" environments stems from a massively overestimated ROI on manual testing. (Remember, to compete you can't just have positive ROI, you must have optimal ROI or you'll be out-capitalized.)

There are plenty of automated and technology driven ways to assert the quality of a system, which are far more reliable, faster, widespread (across more of the system) than preview envs. A lot of testing involves creating chains of logic to make resultant assertions. If A works, and B works when A works, then B works.

And things like compile-time checkers (eg types), linters, and TLA+ so I've heard, are examples of things that are better than humans, rapid once set up, and in my experience give massive ROI.

I had a manager the other day say every PR needed a manual "smoke" test before release. I challenged him why would I spend 15-60 minutes manual testing one PR when I could spend that time writing tests that check _every PR_ ? He essentially told me to just fall in line and do as I was told. (No rational rebuttal)