Readit News
fasteddie31003 · 4 years ago
I was interviewing for a platform engineer role at a company whose application, from the outside, doesn't seem too complex. Through the interview process, I found out that you cannot run much of their app locally to develop against, and their staging environment does not reflect their production environment well. I asked how they develop new features if they cannot run the whole app locally, and whether they introduce many bugs without testing locally. Their response was along the lines that they hired good engineers who could understand all the complexity before deploying new features.

I believe having a solid development environment where you can develop and test the whole stack is key to making reliable software. Is this common at other companies that you cannot run the whole app locally to develop and test against?

bob1029 · 4 years ago
> Is this common at other companies that you cannot run the whole app locally to develop and test against?

Yes. I work in B2B banking software, so having an "authentic" environment to test against is extraordinarily complex. Even the big vendors in the space seem unable to maintain accurate facsimiles of production with their own decades-old core software, so smaller vendors stand no chance unless they adjust their entire strategy.

What we ultimately had to do was negotiate a way to test in the production environment in such a way that all the activity could be tracked and backed out afterwards. A small group of important customer stakeholders were given access to this pre-production environment for purposes of validating things.

This is something that has never sat well with me. But the way the business works, the alternatives are usually worse (if you want to survive long-term, that is). We wasted 18 months waiting for a customer to have a test environment installed before we learned our lesson about how bad those environments are in reality.

gumby · 4 years ago
My gf works in UX for a major bank and they simply issued her a bank account in her name (with her SSN etc. -- it's a real account, with ATM and debit cards and the like). If she and some co-workers want to test something out relating to accounts, they actually use these accounts to minimize this kind of problem.

They must have some special code that says these accounts can have their fees zeroed, and they're probably subject to some special out-of-band audits. I don't really know the details of course, but I was actually kind of impressed.

(no she is not actually a customer of her employer)

invalidname · 4 years ago
I worked at banks too but mostly segregated from the core business. Today I see observability tools as the solution for problems like that. Specifically tools like Lightrun where you can observe production, mitigate and fix directly from the IDE.

I wonder if other developers are familiar with this approach?

acwan93 · 4 years ago
Same here in e-commerce. We end up testing in production environments with an element somewhere that's in a development environment that can be tracked and actively watched. Amazon notoriously does not have a sandbox environment, and eBay's is just unusable.
nijave · 4 years ago
Worked with banking software before; it can be incredibly complicated. I worked on proof-of-concept infrastructure for a WebSphere monolith that ran on AIX servers. It had at least 10-15 services it connected to, and each of those was a huge Java monolith, too. Some of those apps had 8+ copies of /each/ environment (8x dev, 8x test, 4-8x stage/uat, 1-2x prod) so you could do integrated tests without stepping on other people or other software (software B might have a dedicated test environment of software A).
tluyben2 · 4 years ago
Yep, work in b2b banking as well and there is a huge issue with banking ‘test’ environments. Our partners (100s of companies, mostly banks/brokers/insurers/etc) mostly don’t even have test envs and if they do, they don’t represent production at all (sometimes the api is a completely different version etc). Our software allows to spin up a test/dev env locally or where ever but nothing much works without the partners and to dev/test those, those, you will have to dev/test against their production systems with real money.

In the beginning (more than 20 years ago), I insisted that we implement test systems ourselves so we didn’t have to test in production; that was naive: it was expensive, and there is so much complexity that it does not work well.

john_the_writer · 4 years ago
I'm in that boat. We have a "microservice" code base, and spinning up the whole "app" is a Herculean task.

Even if you do manage to get it all running, getting the multitude of databases populated with usable data adds an odyssey to your day.

We have some parts automated, but ops cannot keep up with all the changes in the micro apps.

Back when I worked on a mono-Rails app with a React front end, getting things up and running was as simple as running the JS server, rake db:seed, and then rails s. We used Puppeteer to test the FE, and RSpec for the BE. Two test suites, but we knew if something was wrong.

That same place sometimes had a net outage, and I wouldn't even notice because I was running everything locally.

Now I can only get everything running locally with confidence if I block out an hour or so.

Honestly... microservices are great in theory, but make dev a huge pain. Docker does not mitigate the pain.

Nextgrid · 4 years ago
That's the reason why I walk away as soon as I hear "microservices". It's cargo-cult mentality where complexity and the issues you speak of above are considered a feature.
jamesfinlayson · 4 years ago
Yep I've had the same thing happen - a company I worked for went from having a monolith that could run locally, then everyone drunk the microservice Kool-Aid and we ended up with no way to run the microservices locally in a way that integrated with the monolith. There were staging sites with everything but there was one staging site per team.
danjac · 4 years ago
The worst cases are startups adopting microservices. I get where they are a solution to a large company with multiple teams hitting the pain of monolithic development, but a team of half a dozen adopting microservices is a big red flag.
mbrodersen · 4 years ago
Microservices are not even great in theory if you spend just a bit of time deeply thinking about the complexities that microservices introduce compared with simply using libraries/modules to organise your system.
bri3d · 4 years ago
Quite common.

Usually someone develops a remote or mixed local/remote development environment which is easier to use than configuring the full stack to run locally, and then the ability to ever do so again atrophies until nobody can figure out how anymore.

In some companies I've worked at this is fine and not a major setback. In others, it's been a mess. I don't think that "can run the stack locally" is a good litmus test for developer experience - the overall development experience is a lot more nuanced than that. "How long does it take to get going again if you replace your computer" and "what's the turnaround time to alter a single line of code" are usually better questions.

clepto · 4 years ago
I work at a VOIP services provider and we operate multiple brands, which are companies we’ve acquired over several years. One thing that has rung true of every company we’ve acquired is that there is typically barely even a way to run one piece locally, let alone the whole thing.

For some of these, when I took over, there wasn’t even version control, and they were being developed by FTPing files to the production server or even editing them live with vim or something. Certainly no reproducible containerization or even VMs to speak of.

I agree completely with you about having a solid development environment. I’ve spent the better part of a year creating such a thing for some of the brands, and it has probably cut our time to ship features and fixes ten-fold.

temp8964 · 4 years ago
> they were being developed by FTPing files to production server or even editing them live with vim or something.

Many editors / IDEs (e.g. Notepad++) can connect directly to an FTP server, so working on an FTP server looks almost the same as working in a local folder. You just make changes to the files and then refresh the web browser to see the changes.

asd88 · 4 years ago
I work at a company that has similar practices to what you describe (we don’t run our services locally). While it’s hard to get used to at first, I think the silver lining is that it forces everyone to write good unit tests, since it’s the only efficient way to run their code before publishing a PR. Having worked on the opposite end (at a team where we mostly tested locally and had almost no automated tests), I much prefer this environment.
fasteddie31003 · 4 years ago
My last job was working at a company where you could not bring up the app locally. It didn't start that way. When the company was smaller I could develop a whole feature from the backend to the frontend and see the whole flow. As we grew it became increasingly difficult to run the app locally. Then it became impossible. Bugs grew and development time increased honstly 4x to do the same feature I could have done 3 years ago. I now think a good clean developer experience is key to a stable app.
raffraffraff · 4 years ago
I'm at a company that works like this, and it's awful. There are also anti-patterns around git branching and container deployment. The staging environment is broken every other day and production deployment requires the full time effort of a very senior engineer who cherry-picks through git commits from one branch to the production branch (there are thousands of differences between these branches). I'm honestly done with them. They don't even know what sensible dev/test/deploy looks like.
sage76 · 4 years ago
I'm sure that, paradoxically, your company must have sky-high hiring standards, with interviewers particularly grilling candidates about best practices?

This is not even sarcasm.

ingenieroariel · 4 years ago
I saw a project get itself into that position by using a lot of AWS services. At some point it seems they had a dev environment, but it quickly faded.

Their solution was to have an anonymized copy of prod in a few environments (like 3 for 20 engineers) where features could be tested. Engineers usually paired in order to use them concurrently for unrelated features, and they were very expensive.

I did not stay long enough to see the on-demand environments arrive; devops was always too busy with operations tickets to continue building them.

matt_heimer · 4 years ago
A lot of it depends on the complexity and the technology stack.

With a microservices-based architecture with massive numbers of microservices and other service dependencies, it's a pain to do more than local unit testing and mocks.

Are you using FaaS or a lot of other cloud features? Also difficult to fully test locally. Extract as much logic as possible out of the cloud-dependent code.

Often there will be sub-sections of large apps that can be run locally.

And if the only way to test is in production, hopefully they practice some form of limited rollout so you can control who adopts the changes, or similarly some form of optional feature flags that are used to enable the new behavior until it can become a new default. If their release model is "We are so good it goes live for everyone all at once", then that tends to make for stressful release days.
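The limited-rollout gating described above can be as simple as deterministic hashing, so each user gets a stable answer as the percentage ramps up. A minimal Python sketch (the flag and user names are made up):

```python
import hashlib

def is_enabled(flag_name: str, user_id: str, rollout_percent: int) -> bool:
    """Deterministically bucket a user into 0-99 based on a hash of
    (flag, user), so the same user always sees the same behavior."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < rollout_percent

# A 0% rollout enables nobody; 100% enables everyone.
assert not is_enabled("new-checkout", "user-42", 0)
assert is_enabled("new-checkout", "user-42", 100)
```

Ramping from 5% to 100% then just means changing one number, and a bad release only hits the early buckets.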

kcartlidge · 4 years ago
> Microservices based architecture with massive amounts of microservices and other service dependencies, its a pain to do more than local unit testing and mocks.

For microservices, if you use something like Consul for service discovery, then you can hit them with, e.g., Postman and change your local Consul registrations so everything goes to a main environment except the handful of services you're working on - which you redirect to your local machine for debugging.
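For illustration, redirecting one service to your machine boils down to re-registering it with the local Consul agent. A sketch of building that registration payload in Python (the service name and port are hypothetical; the endpoint named in the docstring is Consul's real agent API):

```python
import json

def local_override(service: str, port: int) -> str:
    """Build the JSON body you would PUT to
    http://127.0.0.1:8500/v1/agent/service/register so that `service`
    resolves to your own machine instead of the shared environment."""
    return json.dumps({"Name": service, "Address": "127.0.0.1", "Port": port})

body = json.loads(local_override("billing-api", 8080))
assert body["Address"] == "127.0.0.1" and body["Port"] == 8080
```

Everything else keeps resolving to the main environment; only the overridden service hits your debugger.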

roflyear · 4 years ago
Extremely common. For some reason it is cool to put stuff in lambda or azure functions or whatever. Or to use 11 different stacks.
vbezhenar · 4 years ago
It is the case at my company. Production and staging are a bunch of docker-compose setups scattered across several servers. We had to create another staging environment, and I think it took several days to configure. People run services locally, but not the entire stack; that takes too much effort.

Right now I’m trying to understand and deploy Kubernetes. Scalability and availability are good, but one of the reasons is that I want to make it possible to roll out the entire stack locally.

Basically it’s a lack of devops. We don’t have a separate role, and developers do the minimal job of keeping production running well enough; they have other tasks to do. There should be someone working on improving those things full time.

raffraffraff · 4 years ago
This: "Basically it's lack of devops". Companies think they can avoid hirig any infra/devops people and make all of that stuff the job of junior developers. They should contract someone it to set things up, at the very least.
hatware · 4 years ago
You can find developer-first roles where the company puts a lot of time and energy into making developers the most efficient workers, but most companies are a cluster-fuck of resource management and you'll be lucky to find a healthy company that also puts developers first in this way.

In my experience, recreating staging as "production junior with caveats" comes down to cost and living with bad architecture/design. I'm not defending the practice, but I do think it's incredibly common.

ma2rten · 4 years ago
At Google, it is very team-specific, but it is generally not possible to run servers locally for testing and there generally is no staging environment, though it might be possible to bring up some part of the stack on Google's clusters. Those servers may read production data or some test data. Other than that, there are static checks, unit tests, code reviews and the release process to ensure reliability.
fasteddie31003 · 4 years ago
How do you develop features if you can't run the server locally? Do you just write code, get it approved in a code review and push it to production?
jamesfinlayson · 4 years ago
I've been at a place where the whole stack couldn't be run locally at all but everything is separated by well-defined API boundaries and there are mature mock servers where required.

The architecture was probably more complicated than it needed to be but it was also decades old.

nijave · 4 years ago
Yeah, it gets hard once you have more than a small handful of dependencies (SaaS, databases, microservices). Some of the stuff I worked on had built-in stubbing/mocking so you can run parts of the stack locally, and the test suite also uses the mocks/stubs.
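The built-in stubbing approach looks roughly like this with Python's stdlib (PaymentsClient is a hypothetical stand-in for a remote dependency you can't run locally):

```python
from unittest import mock

# Hypothetical client for a remote service only reachable in the real environment.
class PaymentsClient:
    def charge(self, user_id: str, cents: int) -> dict:
        raise RuntimeError("only reachable from the real environment")

def checkout(client: PaymentsClient, user_id: str) -> str:
    result = client.charge(user_id, 1999)
    return "ok" if result["status"] == "succeeded" else "failed"

# The test suite swaps in an autospec'd stub with a canned response,
# so the code under test never touches the unreachable dependency.
stub = mock.create_autospec(PaymentsClient, instance=True)
stub.charge.return_value = {"status": "succeeded"}
assert checkout(stub, "user-1") == "ok"
stub.charge.assert_called_once_with("user-1", 1999)
```

`create_autospec` keeps the stub's signature in sync with the real client, so tests fail if the interface drifts.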
icedchai · 4 years ago
In my experience, good local development environments are relatively rare. We had them at a couple of small <10 person B2B, SaaS startups I was involved with. In larger companies, there were just too many pieces to run everything locally. Or in companies with software development, but not focused on software, local environments seemed more of an afterthought. With increased dependencies on cloud services, "serverless" environments, etc. it can also be painful to run stuff locally. Obviously you can build around this (and should...) but it requires thinking ahead a bit, not just uploading your lambdas to AWS and editing them there...
inferiorhuman · 4 years ago
Yaknow I actually transitioned from doing dev stuff to ops at a smallish startup. I was brought in to help with a green field rewrite. Eventually it became clear just how much of a mess the deployment environments were so I cleaned everything up and parameterized things so that when we went to stand up the new production environment it was quick and painless.

I got hired on at megacorp to do pretty much that with a suite of green field apps. And so I did, across a few apps and a zillion different environments because all of a sudden spinning up a new integrated environment was easy. The devs ran a lot of stuff locally but still had cloud based playgrounds to use before they hit the staging environments. Perhaps the best part though was having the CI stuff hooked into appropriate integrations, so even if you couldn't run it locally you sure as shit tested it before it hit staging. SOC2 compliance was painful for the company but not so much for us.

If you're sensing a theme with "green field" you'd be on to something. Even in a largeish company it's not so much the complexity as it is the legacy stuff that'll get you. Some legacy stuff will just never be flexible (e.g. the 32-bit Windows crap we begged Amazon to let us keep running) and even the most flexible software will have enough inertia to make change painful.

Tangentially this is also why I get apprehensive when I see stuff like that linux-only redis competitor.

exdsq · 4 years ago
Yea, it’s common that one doesn’t exist. I work as a test engineer and am usually the one to build and support those environments, I’ve yet to find a company that has it all working well before I join. I actually made a killing building integration testing environments of blockchains and sticking them in CI too.
avg_dev · 4 years ago
Can you elaborate on the blockchain testing environments/CIs thing please?
bin_bash · 4 years ago
Local dev is great until production gets complex enough that you lose parity. That'll happen even earlier now that we run x86 on the server but ARM locally.

At my company we don't support local dev nor have any sort of staging environment. We have development linux servers connected to prod services and dbs. There are no pragmas or build flags to perform things differently in "dev". Things are gated with feature flags.

It scared me at first but now I think it makes sense: staging is always a pain and doing it this way we avoid parity problems between staging and prod. Local development would be impossible for a system at our scale but I think even a staging setup would result in more defects—not fewer.

roflyear · 4 years ago
Local dev isn't the end, it is the beginning. It allows you to get further than you would otherwise - and most importantly, when (not if) you discover a problem in a lower environment (or prod), it gives you a place to try to replicate it. You run everything locally, so hopefully you can make a change to mirror the problem... Not a guarantee, but at least you have a good starting point.

Also your local dev should try and mirror prod as much as possible.

andyjohnson0 · 4 years ago
In my experience, yes.

Often it's due to complexity. Systems accrete stuff (features, subsystems, dependencies, whatever) over time due to changing business requirements. The more this happens, the harder it becomes to integrate them properly without breaking other parts of the system. And the system is running the business, often 24 hours a day, which makes migrations hard to do. So you end up with something that's basically too complex to run locally.

And particularly in regulated industries like banking you have the security teams locking everything down hard. Problems accessing systems and getting realistic test data, etc.

DeathArrow · 4 years ago
>Is this common at other companies that you cannot run the whole app locally to develop and test against?

It depends what you mean by the whole app. Reflecting 100% of the conditions of the big microservice app I am working on locally is very hard, as I would have to replicate multiple Kubernetes environments, Kibana, Datadog, Elasticsearch, Consul and several databases and data stores.

What I can do is run one or more microservices locally, connected to remote development databases and talking to other microservices and apps in some development Kubernetes environments.

xputer · 4 years ago
I'm sad to say this even happens at FAANG. So many engineers seem to either not want to or not know how to optimize their own workflow as they're building it. It's really frustrating when you want to contribute something to their project and the development workflow for verifying the application still starts after making any changes is "upload the 5GB deployment bundle to AWS". Like come on, you can't seriously expect people to be productive that way.
voidfunc · 4 years ago
> Is this common at other companies that you cannot run the whole app locally to develop and test against?

Extremely. We can't replicate our entire infrastructure in development - it's simply too complex and expensive. We use a combination of dev, integration, staging, and canary environments plus pilot environments to gauge quality before a full worldwide rollout.

grp000 · 4 years ago
lol. It sounds like they hire smart people. But smart people =/= good engineers. Good engineers keep things as simple as they can.
Gigachad · 4 years ago
Simple is easy on a small product. Your simple SQL setup doesn’t work so well when you have terabytes of data and need to run tens of millions of jobs per day.
scarface74 · 4 years ago
Why is it a big deal to run your app locally? You just give each developer access to a cloud account (either one account or one per developer) with wide enough guard rails.

Heck, when I am in a place with bad internet, I spin up a Cloud9 (Linux) or Amazon WorkSpaces (Windows) account and do everything remotely.

I’m not wasting my time trying to use LocalStack or SAM local.

jamesfinlayson · 4 years ago
It would be quicker for me to run things locally - in my team we all have access to an AWS account each but to test my code remotely I need to compile code, build my container, push my container to ECR then deploy the container to ECS - that last step takes minutes.
taylodl · 4 years ago
> "So unless your product legitimately only sells to the likes of Google, Facebook, and Apple, you probably don’t need to be 'enterprise ready'"

What in the actual fsck? Check out this list of companies ranked by the number of employees they have - https://companiesmarketcap.com/largest-companies-by-number-o.... Okay, Amazon is #2 - but notice the top 100 doesn't include Apple. There's a lot more to business than FAANG.

onion2k · 4 years ago
If you're selling enterprise software then the important number is not "number of employees" but "number of employees who would use your software". In large retail, manufacturing, oil etc companies they have a lot of software of course, but each person is typically accessing 4 or 5 apps at most (email, office suite, HR, etc). In FAANG companies each employee involved in software development is often using closer to 30 apps (project management, comms, source control, bug tracking, telemetry, server management, i18n, etc). It makes sense to focus on FAANG companies if you're producing enterprise software simply because they buy more of it. That obviously doesn't mean there isn't a huge market outside of that as well.
fnordpiglet · 4 years ago
You’re better off focusing on banks, oil and gas, etc. Anything that uses a lot of technology to get its job done and operates at a massive scale. FAANG are a very small part of the ecosystem and tend to build vs buy.

A major bank employs between 10,000 and 50,000 software engineers and is in the 50,000-250,000 employee range. Netflix has only 11,000 employees of all types.

Spooky23 · 4 years ago
Task workers use 5 apps, but if you’re one of them, that might be 10,000 people.

I knew a guy who sold a $2M solution to track MSDS sheets in a prison system. Stupid simple application.

nijave · 4 years ago
>The buyer is typically a VP, a director, possibly C suite, depending on what you’re selling

If a company has VPs buying things, surely they have some sort of internal vendor management that usually involves accounting/legal/compliance reviews. Being enterprise-ready means you have the roles/documentation/processes/procedures to answer those questions.

throwaway892238 · 4 years ago
Although each of those companies is made up of 5 to 50 mini-companies, each with more business units/products/services. You have to understand which of them you're selling to, what they need, how they will use your product. Are two completely different tiny departments gonna hem and haw over trying your product, or will the whole org use your software? You might end up supporting a dozen users or 500,000. Will you still have to support the same features, or can the tiny teams get away with less? Is it even worth supporting the enterprise features if your product fit is smaller teams and you wouldn't get a large-scale contract anyway?
scarface74 · 4 years ago
I agree. The last three companies I worked for were large health care companies. They spend millions on SaaS products.
Avshalom · 4 years ago
right like any software deployed by Zebra is probably going to get used by every retail company and warehouse in the USA.

Now I don't know how much of the software on my TC26 actually comes from Zebra but, pretty sure, it's not zero.

ASSA ABLOY software is probably everywhere there's a door these days.

I don't know who the players in HVAC are but considering that a lot of HVAC/lighting control for like Target/Walmart are remotely controlled there's definitely somebody with tens of thousands of deployments.

Not to mention EdTech shit like BlackBoard, that gets deployed to every god damned school in the country

polote · 4 years ago
And good luck selling enterprise software to Apple, Google and Facebook; they all already have custom internal software for everything.
scarface74 · 4 years ago
No. They use ADP, Concur, WordPress (check out AWS official blogs), Salesforce and third party SaaS products just like everyone else.
mbrodersen · 4 years ago
Nope. Microsoft (for example) uses SAP. It would be crazy for them to spend millions building something similar to SAP.
wereHamster · 4 years ago
I'm working on a project where the backend devs decided to use Cassandra to manage a few hundred entities. Due to different access modes they had to denormalize the data into multiple tables and we're constantly running into consistency issues. A fucking sqlite db with a few tables and indexes would do just fine. Also, the tool has maybe a dozen DAUs. But it's web scale, so, yeah…
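For scale, the SQLite alternative really is this small: one normalized table plus an index per access pattern, instead of a denormalized copy per query. A sketch with Python's stdlib (the columns are hypothetical):

```python
import sqlite3

# A few hundred entities fit comfortably in one SQLite database.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE entities (id INTEGER PRIMARY KEY, owner TEXT, status TEXT)")
# A second "access mode" gets an index, not a second denormalized table.
con.execute("CREATE INDEX idx_owner_status ON entities (owner, status)")
con.executemany(
    "INSERT INTO entities (owner, status) VALUES (?, ?)",
    [("alice", "active"), ("alice", "archived"), ("bob", "active")],
)
rows = con.execute(
    "SELECT id FROM entities WHERE owner = ? AND status = ?",
    ("alice", "active"),
).fetchall()
assert len(rows) == 1  # one source of truth, so no cross-table consistency drift
```

Because there is a single copy of each row, the consistency issues that come with denormalized Cassandra tables simply can't occur.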
twblalock · 4 years ago
That's not a scale problem. That's a problem of database choice. Cassandra is not good at that sort of query pattern at any scale.
jsiaajdsdaa · 4 years ago
Why would a team decide to use nosql to manage "a few hundred" entities?

Nosql databases excel at simple lookups and gets by ID, rather than trying to sort and join pretty much ad hoc.

nkuttler · 4 years ago
Resume driven design
maccard · 4 years ago
At a few hundred entities, it doesn't really matter what tech you choose, so you should probably choose whatever is closest to hand.
winrid · 4 years ago
NoSQL might not be the right term here. Maybe you mean "columnar databases"?

E.g., Mongo sorts just fine, and can optimize sorting with indexes.

lmm · 4 years ago
Why would you ever use SQL? The SQL transaction model doesn't make a lot of sense for most use cases (including essentially anything that uses the internet), and trivial usability things like being able to have collection columns without having to define a separate table all add up.
bee_rider · 4 years ago
This is completely tangential, but any time I see someone talking about normalizing a database, I briefly think they are talking about doing something like:

x/|x|

Ya know, like normalizing a vector. Of course this brief confusion doesn't matter, I sort it out quickly, and it is totally clear language to other database people -- it is just a funny quirk of overloaded lingo.

But I wonder if big data and machine learning folks (since there's some overlap with linear algebra there) ever get their lingo mixed up.

"To predict where the ridesharing customers will want to go, we will apply the taxicab norm to our database."

adepressedthrow · 4 years ago
While the overloading is annoying, database normalization actually has a strict definition that is not unlike the standard mathematical use; that is, to imply consistency (to make normal): https://en.wikipedia.org/wiki/Database_normalization

I would also point out that normalization within mathematics does not have a single concrete definition either, and it depends on your domain and desired property to stabilize: https://en.wikipedia.org/wiki/Normalization_(statistics)

mbrodersen · 4 years ago
You are absolutely right. 95% of problems in software development are created by the software developers themselves. The fastest/smartest developers I have worked with simply make fewer dumb mistakes. They keep it simple. That’s all they have to do to be 10x more productive.
bigtones · 4 years ago
You actually do need to have Enterprise security and authentication features if you intend to sell to Enterprise customers. (SOC2, OAuth, SAML etc)
bob1029 · 4 years ago
I am on calls all week with our banking clients about getting SAML/SSO squared away once and for all.

We are having a lot more nervous takes around legacy LDAP/AD auth mechanisms these days.

dt3ft · 4 years ago
I developed SAML/SSO integration for an application used by the Swiss UBS which initially relied on individual accounts, not even AD. Tricky stuff.
JacobThreeThree · 4 years ago
It's true. These types of security certifications like SOC2 are quickly becoming a minimum requirement.
pid-1 · 4 years ago
You actually need OAuth / SAML if you give 2 shits about your customers.
GauntletWizard · 4 years ago
You need OAuth, and not SAML. SAML is insecure by design: https://joonas.fi/2021/08/saml-is-insecure-by-design/
ozim · 4 years ago
I agree with the article, because you don't need to have it to "start talking" with Enterprise customers.

You probably have to prove that you have these on your roadmap and that you have an engineering team that can implement them in the not-so-distant future. That you know these things exist and have the will to invest in them in the future.

haswell · 4 years ago
These capabilities are increasingly required just to get in the door, depending on your target customer.

The alternative is that your product becomes a security exception, which infosec teams are more and more unwilling to grant, for pretty good reasons.

The good news for product teams is that there are more and more off-the-shelf options that can be quickly integrated vs. having to start from scratch.

But the main point of this comment is: I think you’re right that “near term roadmap” was good enough a few years ago, but less so now, and continuing to trend towards “not acceptable” if selling to enterprise customers.

(Observations as a recent/former auth PM at a SaaS/PaaS that sold primarily to enterprise customers).

akullpp · 4 years ago
Yep, I'm working at a Series B startup and I can tell you, you absolutely need SOC 2 or ISO and/or GDPR compliance based on your market. It's not optional, you will lose business if you don't.
cubes · 4 years ago
Hard agree.

History is littered with dead startups that designed for scale before they had enough usage to justify it. Within reason, having users knock your site over resulting in failures like the Twitter Fail Whale is a good problem to have.

With that said, you need to be prepared to scale up quickly once you have this problem. There's a reason Facebook counts its users in the billions, and Friendster is a footnote in internet history.

s1k3s · 4 years ago
Yes, but... I am a software architect. ~90% of the job offers I get are from CEOs/CTOs who want me to join their small company to help them refactor because they can’t grow anymore. Tech debt kills as well.
dgb23 · 4 years ago
Is that a "but" or is that how things should work? You solve a problem when you have it or can reasonably predict it (in technical/engineering terms). I also think it's sensible to look for experts when you need them right?
icedchai · 4 years ago
For every company that has this problem, another several never made it far enough for it to matter.
noodle · 4 years ago
If they have the money to hire you and they have pent-up growth that refactoring will unlock, that really sounds like close to an ideal path for a startup to take. They're literally paying down their tech debt by paying to hire you. Maybe ideal would be a little earlier so it doesn't block growth, but yeah. In my experience, way more startups have died from premature optimization.
xboxnolifes · 4 years ago
If this pattern is successful, it's not killing them.
raffraffraff · 4 years ago
So how do you square the circle of "don't design to scale, but scale up quickly when you need to"? You can't rewrite your stack just because you took on a new customer that is suddenly killing you.

I don't think that it's stupidly hard to get basic things set up for scaling at the start. These don't apply to everyone but:

- use a cloud provider, they let you grow

- write infrastructure as code (terraform, pulumi, cdk), it's not that hard and it makes building your 2nd, 3rd data center easier

- for the love of God think for 10 minutes about a scalable naming convention for data centers, infrastructure components, subdomains, IP ranges etc. This type of stuff causes amazing headaches if you get it wrong, and all you have to do is consider it at the start. Many AWS resources can't be renamed non-destructively, and are globally namespaced. Be aware of this. Don't paint yourself into a really stupid corner.

- even if you're not doing microservices, consider containers and an orchestrator that has scalability baked in (like Kubernetes). If you use a managed Kubernetes with managed node groups or fargate, you can basically forget about compute infrastructure from this point on

- deploy Prometheus, Grafana, Loki etc. on day one and build basic dashboards. With Kubernetes, you can get this installed quickly and within a day you'll at least be able to graph request/error counts and parse logs.

- deploy an environment for developers to deploy to and use the product (even load test it)
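On the naming-convention point above: one way to keep names from going ad hoc is to derive every resource name from a single canonical function. This is a hypothetical sketch (the `org/env/region/component` scheme and the function name are my invention, not from the comment), but it illustrates why ten minutes of thought up front pays off when names can't be changed non-destructively later:

```python
# Hypothetical sketch: derive every cloud resource name from one
# canonical tuple, e.g. "acme-prod-euw1-api". Names built this way
# sort, grep, and scale cleanly across new environments and regions.
def resource_name(org: str, env: str, region: str, component: str) -> str:
    """Build a predictable, lowercase, hyphen-delimited resource name."""
    parts = [org, env, region, component]
    for p in parts:
        # Enforce the convention early; renaming later is often destructive.
        if not p.islower() or not p.replace("-", "").isalnum():
            raise ValueError(f"use lowercase alphanumerics only: {p!r}")
    return "-".join(parts)

print(resource_name("acme", "prod", "euw1", "api"))  # acme-prod-euw1-api
```

Adding a second region or environment then means changing one tuple element, not inventing a new name.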

The development cycle is now set up for building what you need to build. You can easily find developers at any level who have experience with containers, Kubernetes, Grafana so they can onboard quickly, be productive and troubleshoot their own stuff. They can refactor the monolith without having to invent service discovery, load balancing, ingress etc.

My personal experience with Kubernetes began at a start-up with a tiny team, none of whom had prior experience with it. We had started building a single monolithic app, but within a short time we figured out how to make a helm chart, add ingress, and boom, we were "online". Every time someone was about to spend time and effort inventing something, we would quickly scan through the Kubernetes docs and CKAD material to see if we could avoid having to build it at all. We leveraged cloud stuff too, where it was cheap and easy and scalable (like AWS SQS/SES/SNS/CodeArtifact/ECR/Parameter Store).

iovrthoughtthis · 4 years ago
I didn't see the words "it depends" anywhere in this.

That's the whole point of engineering: picking/building the right solutions for the problems and context.

Some stuff just needs a VPS, a PHP file and an SQLite DB, and it will scale.

Other stuff might need container orchestration or anything in between (or beyond). It should depend on the context though, and we should be prepared to change strategy as needed.
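For the "one file and an SQLite DB" end of the spectrum, here is a minimal sketch (in Python as a stand-in for the PHP file mentioned above; the table and handler names are made up). The point is how little machinery that end of the spectrum actually needs:

```python
# Minimal sketch: one table, one handler, no orchestration.
# SQLite comfortably serves thousands of simple requests per second
# for read-heavy workloads like this.
import sqlite3

def init(db: sqlite3.Connection) -> None:
    db.execute("CREATE TABLE IF NOT EXISTS hits (path TEXT PRIMARY KEY, n INTEGER)")

def handle(db: sqlite3.Connection, path: str) -> int:
    # Upsert a per-path hit counter (SQLite >= 3.24 ON CONFLICT syntax).
    db.execute(
        "INSERT INTO hits (path, n) VALUES (?, 1) "
        "ON CONFLICT(path) DO UPDATE SET n = n + 1",
        (path,),
    )
    db.commit()
    return db.execute("SELECT n FROM hits WHERE path = ?", (path,)).fetchone()[0]

db = sqlite3.connect(":memory:")
init(db)
handle(db, "/")
print(handle(db, "/"))  # 2
```

Swap `:memory:` for a file path and put it behind any web server, and that's the whole stack.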

bryanrasmussen · 4 years ago
>With that said, you need to be prepared to scale up quickly once you have this problem.

See I interpret that to mean that you should design for scale, but not implement or invest in scale until needed.

travisgriggs · 4 years ago
Isn't one of the problems, though, that early audiences appreciate this kind of over-engineering? If your investors/buy-in folks are being shown early demos and you mention "and this thing is gonna scale seamlessly up to 1,000,000 users because of tech words A, B, and C", then they get excited. They don't understand A, B, or C. But what they just heard is that you're planning for Big. And that stokes their aspirations.
roflyear · 4 years ago
Facebook is a great example. They did tons of things not traditionally known as highly scalable, like one repo with the project coded in PHP, which was largely a monolith.
cutler · 4 years ago
And an imperative codebase which looked like something out of Matt's Script Archive. I often wonder whether a startup today could pull off something similar.
throwaway892238 · 4 years ago
Well it depends on what you're selling! Design for what you truly want to sell, and that will end up however it needs to end up, whether that's enterprise or not. This random blogger, and our random comments, can't tell you what you need. It has to emerge from the product development process like a sculpture from a block of marble.
ketzo · 4 years ago
I know, this is overly-general advice to the point of silliness.

It's a little like telling a construction worker "You don't need a super-powerful 'jackhammer' to do your work. A simple claw hammer is all you actually need."

Well... sure, that's definitely true if you're building a house.

But you might have a problem ripping up a sidewalk...

There are plenty of companies building houses, and plenty who are working with concrete! Neither one of them are "the big kids."

They're not even necessarily different-size markets, either -- you can make a lot of money building houses or making sidewalks -- it's just that different problems require different solutions, and different software companies are going to have different challenges to the problem of "get bigger."

tnr23 · 4 years ago
Exactly this. It sounds easy, but the type of creator who can do this is very rare, and usually has a high success rate. This should be the definition of the "10X dev".
FpUser · 4 years ago
I've delivered custom backend business servers to various businesses of healthy size. Almost all were deployed on rented multicore servers with gobs of RAM from Hetzner / OVH / etc. Some have been running for many years already. Not a single one required to be "enterprise scaled".

Granted, all servers are C++ and talking to a local DB. No problems processing thousands of requests per second without breaking much of a sweat.

winrid · 4 years ago
Sounds interesting. What's the DB? What framework/type of API are you exposing on the C++ side?
FpUser · 4 years ago
PostgreSQL. No framework (well, there is a rather low-level one of my own, but it is fluid). Using 4 libs for JSON, Postgres, HTTP, logging. I expose basically JSON-based RPC. Peers / clients post JSON commands and receive JSON replies.

Upon startup, all data (except a couple of giant tables) is sucked from the DB into RAM into highly efficient data structures, hence most read requests are handled in microseconds. Those 2 giant tables are loaded into the same structs, but partially. This is usually enough, as RAM keeps the last few years' worth of data, and requests that need something older come just a couple of times per month. Only writes touch the database, and they are immediately reflected in RAM as well.
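The pattern described here (read from RAM, write through to the database and the in-memory copy together) can be sketched roughly like this. This is my own illustration, not FpUser's code, and it uses SQLite and a plain dict as stand-ins for PostgreSQL and their "highly efficient data structures":

```python
# Hedged sketch of the read-from-RAM / write-through pattern:
# load everything at startup, serve reads from memory, and apply
# writes to the DB and the in-memory copy in the same operation.
import sqlite3

class Store:
    def __init__(self, db: sqlite3.Connection):
        self.db = db
        # Startup: pull the whole table into RAM once.
        self.cache = dict(db.execute("SELECT key, value FROM kv"))

    def read(self, key):
        # Reads never touch the DB, so they take microseconds.
        return self.cache.get(key)

    def write(self, key, value):
        # Write-through: persist first, then reflect in RAM.
        self.db.execute(
            "INSERT INTO kv (key, value) VALUES (?, ?) "
            "ON CONFLICT(key) DO UPDATE SET value = excluded.value",
            (key, value),
        )
        self.db.commit()
        self.cache[key] = value

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE kv (key TEXT PRIMARY KEY, value TEXT)")
db.execute("INSERT INTO kv VALUES ('a', '1')")
s = Store(db)
s.write("b", "2")
print(s.read("a"), s.read("b"))  # 1 2
```

The trade-off, as the comment notes, is that the working set has to fit in RAM; anything older falls back to the database on the rare request that needs it.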

This is just an example. I write game like servers as well and those use different approach. Generally I treat each project / product individually.

kodah · 4 years ago
I like the article, but overall I think it's the same advice as "start small, limit scope, be ready to pivot" with a dash of "don't make enterprises an early customer". I obviously paraphrased quite a bit.

What is glaringly obvious from this article is that "enterprise" in this industry has lost much of its meaning. It's almost synonymous with "medium+", which I think is a self-defeating definition that starts to make sense only when you look at what features are commonly gatekept behind enterprise distributions of software.