That's a very interesting set of findings. What is important to realize when reading this is that it is a case of survivorship bias. The startups that were audited were obviously still alive; any whose flaws were fatal had most likely already left the pool.
In 15 years of doing technical due diligence (200+ jobs) I have yet to come across a company where the tech was what eventually killed them. But business-case problems, for instance being unaware of the true cost of fielding a product and losing money on every transaction, are extremely common. Improper cost allocation, product-market mismatch, wishful thinking, founder conflicts, founder-investor conflicts, relying on non-existent technology while faking it for the time being, and so on have all killed quite a few companies.
Tech can be fixed, and if everything else is working fine there will be budget to do so. These other issues usually can't be fixed, no matter what the budget.
I've found that slowdown from tech debt has killed as many companies as any other issue. It usually starts with business owners constantly pivoting; being too slow on the pivot and too slow to bring customer wishes to fruition (due to poor technical decisions and tech debt) is probably one of the top 5 causes of company death I've seen.
That's a good point, tech debt can be a killer. But the more common pattern I've seen is that companies that accumulate tech debt but are doing well commercially eventually gather enough funds to address it, while companies that try to get the tech 'perfect' the first time out of the gate lose a lot more time and so run a much larger chance of dying. The optimum is to allow some tech debt to accumulate but to address it periodically, either by abandoning those parts through small, local rewrites (never the whole thing all at once, that is bound to fail spectacularly) or by having time marked out for refactoring.
The teams that drown in tech debt tend to have roadmaps that are strictly customer-facing work. That can get you very far, but in the end you'll stagnate in ways that are not easy to fix; technical work aimed at doing things right, once you know exactly what you need, pays off.
I’m not sure I agree. Technical debt is a symptom, it’s the consequence of bad management that leads to working on the wrong things.
If you’re running a startup and haven’t yet found your feet in terms of a product offering, and you’re building your product(s) in such a way that technical debt builds up through continuously layering half-baked on half-baked, it’s indicative that you’re not actually pivoting and not actually evolving, you’re just adding new half-baked ideas to a half-baked system… and being able to do that at twice the speed isn’t going to address the real problem: half-baked ideas don’t make a product, whether that’s 10 half-baked ideas or 100.
My experience is that any company in which evolution/experiments/pivoting is constrained within the boundaries of what already exists because of the sunk cost fallacy has made a grave error at a leadership level, not at a code level. If you can’t validate something without mashing more code into your system, that’s the problem to address.
I’ve seen companies with horrendous tech debt die, and you could certainly frame their death as being a consequence of the tech debt (“if they had just got the perfect system…”) but that assumes the perfect system would somehow prevent them from making the mistakes that got them there in the first place. It wouldn’t. The technical debt is an expression of their mistakes, not the cause. You could dump the perfect system at their feet and they’d be surrounded by garbage again a few years from now.
What killed them was they never found PMF. Eventually the tech debt slowed them down so they couldn't take as many swings at finding PMF.
But in the counterfactual, if they'd tried really hard to avoid tech debt, that would have slowed them down at the beginning. Not to mention there are plenty of organizations that write very complex, abstract code to avoid tech debt and end up making the code base incredibly painful to work with. So overall, did they get fewer swings?
I've worked on a lot of old code bases and the biggest issues I've run into, issues that crippled development velocity, were 95% boneheaded decisions and overengineering. And never the types of code quality issues someone like Uncle Bob talks about in Clean Code.
And keeping 100% of all features, instead of removing the least-used ones as you add new ones, lets tech debt grow indefinitely until it reaches a point where new features take months to ship.
> What is important to realize when reading this that it is a case of survivorship bias.
This is totally true, but taken too seriously it leads to inability to learn anything from almost any information whatsoever. What’s more, whatever you do (whether you take the advice of those who have gone before or not), you will not be able to decide whether you made good decisions or merely “survived”.
How does one proceed when anything can be survivorship bias, and determining cause and effect for large-scale operations like running a business is essentially impossible?
(When I say “anything can be survivorship bias” I specifically mean that no matter the cohort you cannot decide whether you’ve accidentally excluded unknown failures, and hence you have no assurance of the actual strength of any analysis you do).
> In 15 years of doing technical due diligence (200+ jobs) I have yet to come across a company where the tech was what eventually killed them.
Not my experience..
What does a “tech failure” look like? Do the servers catch fire? Is the web site down? Maybe people are unable to login to their stations?
Hi-tech business is “Tech”, so the failure of the business is in fact the tech failing. More specifically, the business was unable to direct the tech to solve real problems and solve them well enough.. New hires took too long to onboard.. Engineers were only superficially productive.. Communication between the stakeholders and engineers was lacking.. etc.. etc..
Take note that in all scenarios above “work” is being done, “progress” is being made.. ceremonies are everywhere and success is seemingly around the corner.. Or is it?
It’s just very hard to see these issues, they are hidden under layers of meetings, firings, hiring, pivots, milestones with little progress in actual business value.
I think the harder you scrutinize the distinction between a tech problem and a business problem, the harder it becomes to find it.
When there appears to be such a distinction, that's usually a manifestation of something like Conway's law, a symptom that there exists an unhealthy organizational divide between business and technology.
Obviously not. It's a problem to decide to fake non-existent tech, but that is more a management decision than a problem with the technology itself. There is an infinite number of things that don't exist, no matter how much you want them to, and if you are not capable of coming up with a working solution and rely on the world around you moving fast enough to bail you out, then I would say that is a psychological problem more than anything else.
A common theme right now is 'AAI', using people to fake an AI that may not come into being at all, let alone before your runway (inevitably) runs out.
I saw one "secretary AI" that schedules meetings over email in your calendar. Just cc it to start using it (once you've signed up). The idea seemingly being: fake it with low-cost outsourcing to prove there's demand for this, and then build it.
The developers you'd hire to make it an actual AI and the developers you'd hire to make it a Mechanical Turk are very different skill sets.
I was in a startup that failed in part due to tech issues. The AI model just didn’t work. There were a lot of other problems but if the tech worked, they could have easily gotten paying customers.
> yet to come across a company where the tech was what eventually killed them
I would think that a poor quality product, or one not as good as competitors would be a big killer. Google, Facebook, Amazon have amazingly superior products. I think you're missing something.
> Simple Outperformed Smart. As a self-admitted elitist, it pains me to say this, but it’s true: the startups we audited that are now doing the best usually had an almost brazenly ‘Keep It Simple’ approach to engineering.
I've written about this before: as an industry, we have made writing software complex for complexity's sake.
> imagine, in a world where there wasn't a concept of clean code but to write code as simply as possible. not having thousands of classes with indirection.
what if your code logic was simple functions and a bunch of if statements. not clever right, but it would work.
what if your hiring process was not optimizing for algorithm efficiency but that something simply works reliably.
imagine a world where the tooling used by software engineers wasn't fragile but simple to use and learn. oh the world would be a wonderful place, but the thing is most people don't know how to craft software. but here we're building software on a house of cards [0]
Hot take: The current trend of writing code, AND hiring engineers, is the way it is because everyone thinks they're gonna be the next FAANG-sized company, and need to be able to write FAANG-quality code and engineer FAANG-quality architecture from the start - with respect to scalability.
Have you seen the personal blogs of devs today? What should be a simple HTML + CSS website with the simplest hosting option possible, is now written in a framework with thousands of dependencies, containerized, hosted on some enterprise level cloud service using k8s.
That's great and all if you suddenly need to scale your blog to LARGE N number of readers, but the mentality is still persistent - when one should be focused on core features and functionality - in the simplest way possible, you're bogged down with trying to configure and glue together enterprise-level software.
Maybe it's a bit unfair to put it that way - a lot of engineers know the various systems and services in and out, and prefer to do even the simplest things that way. But I've lost count of how many times I've encountered devs that BY DEFAULT start with the highest level of complexity to solve the simplest problems, for no other reason than "but what if" and "it feels wrong that it should be that easy".
I don't know that this is a fair comparison, because side projects can and are often a way to explore ideas, understand tech, play around, etc. So I don't know that I'd agree that it's a great extrapolation to the way an engineer works based on side projects or a blog that may have different objectives.
I do agree with the sentiment though, that we want to be watching for indicators to how a team member approaches problems.
> the way it is because everyone thinks they're gonna be the next FAANG-sized company, and need to be able to write FAANG-quality code and engineer FAANG-quality architecture from the start
I don't know that it's fair to say everyone, but it is something I agree companies, especially startups, should filter for. When I acted as hiring manager and was trying to build SRE, as an example, I would remind candidates and the team continuously that we're not Google. So while we want to bring in ideas and approaches from what Google has published as "SRE", we do need to consciously leave out the large parts that aren't appropriate to our needs and stage of maturity.
I disagree that they go down the path they do because they think they’re going to be FAANG-sized, but rather it’s a case of cargo-culting, “we’ll use these tools/architectures because the best companies use them, therefore they must be the best tools/architectures.”
I think your general point is true but the personal blogs of devs angle is maybe not the most illustrative one.
We tend to apply industrial strength tools to our personal projects because it's some combination of what we already know, or we're trying to learn or refine an unfamiliar skill.
If you just gave me a linux shell I would not be able to confidently provision a secure webserver for static hosting. But I do know how to write cloudformation and deploy it. Sure this is a personal moral weakness by the standards of HN whatever, but it's where my career has led me so these are the tools I have.
side note: FAANG is an obsolete acronym according to The Economist, it's now Microsoft - Amazon - Meta - Apple - Alphabet, leading to the new acronym, MAMAA, which has the nice result that we can now talk about the outsized influence of Big MAMAA in the tech world.
I never understand this idea of picking an arbitrary set of language features and saying, “what if your code logic was simple functions and a bunch of if statements”. The complexity won't magically go away, it'll just appear in a different set of problems.
I think it's helpful to divide complexity into complexity in the business logic / problem you're trying to solve, which cannot be eliminated from a pure technical perspective (you should still try to simplify it through discussions with stakeholders though!), and complexity that isn't necessary to solve the problem.
Oftentimes the latter category could be necessary if you were at much higher scale, or if the business evolved in some way, etc., which is where this sort of stuff tends to originate. Just yesterday we were talking at my company about extracting a service in Go, since it's very high scale, very simple, and doesn't change much. On one hand, it's pretty likely we'll need to do that at some point, but on the other, it's not causing any issues right now, so there's not much point in doing it at the moment. Had we gone forward, that would have added complexity for a theoretical concern that may or may not happen in the future.
if this was the case we wouldn't have the problem of AbstractFactory that has plagued the Java ecosystem.
if this was the case Golang wouldn't be here seeking to simplify things by not having classes, and having err handling like it does.
it's not pretty but it works.
I pick on Java because its ecosystem is broad. However, the over-engineered complexity that resides there makes you wanna stay away.
Everyone does lip service to simplicity, but in reality simplicity is really difficult.
If you have seven conditions driving a decision, a bunch of if's might be the simplest implementation. If you have hundreds of conditions, a tree of if's becomes impenetrable. There is no one-size-fits-all when it comes to simplicity.
Some problems are inherently complex. You can't design a payroll system or tax calculation system which is simpler than the set of rules and regulations it has to implement.
Even in that case, a tree of if's isn't that bad (it's not great), but far worse is when you have the same set of if statements copied and pasted in dozens of places, because you will forget to update one of them at some point.
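A hedged sketch of one way out of the copied-if-statements trap (the shipping rules and field names here are invented purely for illustration): put the conditions in a single data-driven rule table evaluated in order, so they live in exactly one place and can't drift apart across copies.

```python
# Hypothetical shipping-class decision, expressed as one ordered rule table
# instead of the same if-chain pasted into a dozen call sites.

RULES = [
    (lambda o: o["country"] != "US", "international"),
    (lambda o: o["total"] >= 100, "free_shipping"),
    (lambda o: o["expedited"], "express"),
]

def shipping_class(order):
    # First matching predicate wins; the fallthrough is the default class.
    for predicate, outcome in RULES:
        if predicate(order):
            return outcome
    return "standard"

print(shipping_class({"country": "US", "total": 120, "expedited": False}))  # free_shipping
```

Adding or changing a condition then means editing one entry in `RULES`, not hunting down every pasted copy.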
Thousands of classes with indirection is not clean code, and "write code as simply as possible" is a tautology. Of course it should be as simple as possible. The interesting question is what counts as simple.
Setting that aside though, the author seemed to mostly be talking about architectural simplicity in the article. He specifically called out "premature move to microservices, architectures that relied on distributed computing, and messaging-heavy designs" which I think is spot on. Distributed systems are fundamentally hard and involve a lot of difficult tradeoffs. But somehow we have convinced ourselves as a profession that distributed systems are somehow easier.
The hardest job as a software engineer is to come up with simple and obvious solutions to a hard problem.
Or you can stitch together eight different cloud services and let someone else debug that crap in prod. Not to mention subpar performance and an astronomical cloud bill.
What? That has nothing whatsoever to do with tautology. It's just a statement you agree with. If everyone else agreed with it, it might at most be a truism or an uninteresting statement, but evidently they do not. (They might claim to, but reality shows they optimise for other things - in my experience the simplest work, which does not always mean the simplest code, especially when you're accustomed to the mystic rituals of the Javanese tribes.)
PipeWire is an example of building a Linux audio daemon on "microservices, architectures that relied on distributed computing, and messaging-heavy designs":
- It takes 3 processes to send sound from Firefox to speakers (pipewire-pulse to accept PulseAudio streams from Firefox, pipewire to send audio to speakers, and wireplumber to detect speakers, expose them to pipewire, and route apps to the default audio device).
- pipewire and pipewire-pulse's functionality is solely composed of plugins (SPA) specified in a config file and glued together by an event loop calling functions, which call other functions through dynamic dispatch through C macros (#define spa_...). This makes reading the source less than helpful to understand control flow, and since the source code is essentially undocumented, I've resorted to breakpointing pipewire in gdb to observe its dynamic behavior (oddly I can breakpoint file:line but not the names of static TU-local functions). In fact I've heard you can run both services in a single daemon by merging their config files, though I haven't tried.
- wireplumber's functionality is driven by a Lua interpreter (its design was driven by the complex demands of automotive audio routing, which is overkill on desktops and makes stack traces less than helpful when debugging infinite-loop bugs).
- Apps are identified by numeric ids, and PipeWire (struct pw_map, not to be confused with struct spa_dict) immediately reuses the IDs of closed apps. Until recently rapidly closing and reopening audio streams caused countless race conditions in pipewire, pipewire-pulse, wireplumber, and client apps like plasmashell's "apps playing audio" list. (I'm not actually sure how they resolved this bug, perhaps with a monotonic counter ID alongside reused IDs?)
I feel a good deal of this complexity is incidental (queues pushed to in one function and popped synchronously on the same thread and event callback, where perhaps there's a valid reason or it could be removed in refactoring; both I and IDEs are worse at navigating macro-based dynamic dispatch than C++ virtual functions; perhaps there's a way to get mixed Lua-C stack traces from wireplumber). I think both the multi-process architecture and the ID reuse could have been avoided without losing functionality. Building core functionality on a plugin system rather than a statically traceable main loop may be part of the intrinsic complexity of an arbitrarily extensible audio daemon, but I would prefer a simpler architecture with constrained functionality, either replacing pipewire, or as a more learnable project (closer to jack2) alongside pipewire.
> we have made writing software complex for complexity's sake.
I think it’s rather that complexity naturally expands to fill up the available capacity (of complexity-handling ability). That is, unless conscious and continuous effort is spent to contain and reduce complexity, it will naturally grow up to the limit where it causes too many problems to remain viable (like a virus killing its host and thus preventing further spread). This, in turn, means that the software industry tends to continually live on the edge of the maximum complexity its members can (barely) handle.
> I think it’s rather that complexity naturally expands to fill up the available capacity (of complexity-handling ability). That is, unless conscious and continuous effort is spent to contain and reduce complexity, it will naturally grow up to the limit where it causes too many problems to remain viable
I disagree that this is something that "naturally" happens. A lot of this thread is about how adding complexity is either a deliberate choice made by software developers or just that the developer simply was never taught how to do it the simple way--both of which illustrate a gap in software development education. When the tutorial about How To Create a TODO App starts with "Step 1: Install Kubernetes", I'd argue we have an education problem.
The problem is that "simple" means a different thing in a small codebase than in a big one.
A bunch of if statements in code small enough to understand in full is okay, but when it becomes big it's hard to follow the flow of data.
I do favor simple code, but some complexity/abstraction is needed to make it easier to understand.
But picking the right abstractions that aren't leaky in any of the aspects you really care about is critical, hard to measure (leakiness isn't obvious, nor what kind of aspects you care about), hard to get right, and hard to maintain (because your abstraction may need to evolve, which is extra tricky).
Obviously, getting that right makes subsequent developments much, much easier, but it's hardly a simple route to success.
> there wasn't a concept of clean code but to write code as simply as possible
Sounds good "on paper" - in fact, is tautologically true - but it's hard to find two people who agree on the definition of "simple". You say "not having thousands of classes with indirection", and I've definitely seen that over-design of class hierarchies create an untouchable mess, but I've seen designs in the other direction (one giant main routine with duplicated code instead of reusing code) that were defended as "simple".
A lot of complexity comes from premature scaling due to cargo cult or ergonomics.
But I argue a lot of complexity and bugs comes from poor/unclear/conflicting thinking. Especially when it crosses boundaries between multiple developers who had to modify it but didn't truly internalize that part/design of the code.
I've seen most of the architectural problems in consulting - it's amazing how a team of clever engineers can take a simple thing and make it sooo convoluted.
Microservices, craptons of cloud-only dependencies, no way to easily create environments, ORMs and tooling that wraps databases creating crazy schemas... The list goes on and on; if you're early, you could do a lot worse than create a sensible monolith in a monorepo that uses just postgres, redis, and nginx and deploys using 'git pull' ...
The worst architecture I ever saw came from consultants, who built the initial bits of a startup I was hired into. It was nice to have a no-longer-present scapegoat to shake fists at when frustrated, but over time I came to realize their most maddening choices were at the behest of one of our founders, who had no software experience.
I saw the same thing. Founders asking the world of consultants who would try to deliver and then fail to be a responsible engineer. I started my previous job by telling the founders they were asking for the wrong things and the consultants work needed to be thrown out. Thankfully they listened and we ended up with a TypeScript monorepo monolith deployed to Heroku.
That’s just not true as a categorical statement. Performance aside, redis has all sorts of interesting data types, operations and primitives that pg doesn’t, which you might want to leverage. It fulfills a different role.
A Spring Boot service doesn't have to be a microservice - you can happily fatten it up into a monolith. Cloud-only dependencies come into play for Spring Cloud (or something using cloud-specific features); for a "vanilla" CRUD app they are not needed. Creating virtual/physical environments is out of Spring Boot's scope and better left to external tools, though it has support for separate environments via profiles. ORMs/tooling that wraps the database don't have to be part of Spring Boot either - using Hibernate/JPA isn't mandatory; plain JdbcTemplate with hand-written SQL works fine.
>>> Business logic flaws were rare, but when we found one they tended to be epically bad.
oh yes ...
I always bang on to my junior staff that their job is known as "analyst programmer" for a reason. The analyst part probably matters even more than the programmer part. In large companies, just discovering what needs to be coded is 90% of the job (securely coding it within the constraints of the enterprise is the other 90%, while the final 90% is marketing your solution internally).
> In large companies just discovering what needs to be coded is 90% of the job
Yes, but that is a pretty massive dysfunction in those companies. Meaning, we can yell at analyst-programmers as much as we want; what really needs to be fixed is the process that makes finding out requirements so ridiculously hard.
And yes, I work in one of those companies, it very clearly is dysfunction.
> discovering what needs to be coded is 90% of the job,
But you still have to predict based on a two-sentence description in a JIRA ticket how many "story points" it's going to take with 95% accuracy a dozen times within the span of a single "sprint planning session" every two weeks.
This goes with doing the first 90% of the work, then the second 90% of the work then the last 90% of the work.
And engineers multiplying their initial estimate by 3, the project manager then multiplying that by 3 and rounding it up to be ten times more than the initial estimate.
I suspect it's a British (perhaps commonwealth) colloquialism - 'to bang on [about something]' is to go on and on and on talking about it, with some implication of 'too much' or obsessiveness.
(Also, notice it's 'bang on to' the staff, not 'bang on' them. That is, the staff are the indirect object; the thing which is being said - banged on about - is the direct object.)
> Generally, the major foot-gun that got a lot of places in trouble was the premature move to microservices, architectures that relied on distributed computing, and messaging-heavy designs.
It is interesting. I've been at a company for a few years now, and we've been slowly trying to break apart the two monoliths that most of the revenue-generating traffic passes through. I am on board with the move to microservices. It's been a lot of fun, but also a crazy amount of work; a lot of time and effort has been spent on it.
I've pondered both sides of this argument. On one hand, if this move had been done earlier it might not have been as difficult a multi-year project. On the other hand, when I look at the Rails application in particular, it was coded SO poorly that if it had just been written better initially, it wouldn't even need to be deconstructed at this point. Also, if the same engineers who wrote that Rails app had tried to write a bunch of distributed, event-driven microservices instead, we would probably be in even worse shape. (ᵔ́∀ᵔ̀)
Usually a link to a humorous YT video would be inappropriate on HN, but this classic and brief satire of microservices is actually quite on point about precisely what is so dangerous about a microservices architecture run amok.
Summary: really trivial sounding feature requests like displaying a single new field on a page can become extremely difficult to implement and worse, hard to explain to stakeholders why.
This was 100% true for that startup I worked for as a side job. They would have been so much better off just building a standard java, PHP or .NET back end and calling it a day.
The head engineer (who had known the guy funding the thing since childhood) had no clue how node, stateless architecture, or asynchronous code worked. He had somehow figured out how to get access to one particular worker of a node instance, through some internal ID or something, and used that to make stateful exchanges where one worker in some microservice was pinned to another. Which goes against everything node and microservices is about.
I tried to talk some sense into them but they didn’t want to hear it. So for the last six months I just did my part and drained the big guy’s money like everyone else. I hate doing that - way more stressful than working your ass off.
It's kind of discouraging to see the part where he says almost no one gets web tokens right the first time. Working on projects as someone entering the industry, it's pretty clear that security is the most important part of a web app, and it's seemingly so easy to get woefully wrong, especially if you're learning this stuff on your own and trying to build working CRUD projects.
It's a chicken egg problem. Developers use JWTs because it's what they think they know. Companies build libraries to support what developers are using. Security engineers say JWTs are easy to screw up [1]. Newer frameworks offer ways to move off of JWTs. New programming language comes out. New frameworks built for that programming language. What is someone most likely to build first as an integration? What developers are using. JWTs become defacto for a new framework. Security engineers report the same bugs they've seen. Even more languages and frameworks come out. Rinse. Lather. Repeat. Write up the same OAuth bug for the 15th time.
Edit: I was actually writing this code tonight myself for a project instead of it already being baked into the platform framework because SSO is only available as an "enterprise" feature and it's $150 a month for static shared password authentication. So market forces incentivize diverging standards.
That flow chart in the shared link is very funny! Just this year, I was forced to migrate to a new internal authentication framework that... drumroll... uses JWTs for session management. Google tells me that it was already discussed on HN here: https://news.ycombinator.com/item?id=18353874
JWTs solve problems about statelessness. Most companies don’t have these problems and are better off with stateful basic auth tokens/cookies that are widely understood and supported and can be easily revoked.
Also, signed and/or encrypted communication is usually easier to implement without involving JWTs.
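As a hedged sketch of that stateful alternative (the in-memory dict here stands in for a real session store such as a database or Redis, and the names are invented for illustration): the token is just an opaque random key into server-side state, nothing is decoded client-side, and revocation is a simple delete.

```python
import secrets

SESSIONS = {}  # token -> session data; a DB/Redis table in a real system

def create_session(user_id):
    # Unguessable opaque token; carries no claims, so there is nothing to forge.
    token = secrets.token_urlsafe(32)
    SESSIONS[token] = {"user_id": user_id}
    return token

def authenticate(token):
    # One lookup per request; returns None for unknown or revoked tokens.
    return SESSIONS.get(token)

def revoke(token):
    # Immediate revocation, unlike a signed JWT that stays valid until expiry.
    SESSIONS.pop(token, None)
```

The trade-off is exactly the one named above: you pay one store lookup per request, and in exchange the whole class of token-parsing and signature-validation bugs disappears.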
Best thing to do in security is to not roll your own and instead use trusted libraries that have industry-reviewed sane defaults. One way to check: look at the issues and PRs in the public repo and see if security-focused issues are promptly addressed, especially including keeping docs up-to-date. Security professionals are pedantic (for good reason).
Asymmetric cryptography solves problems of statelessness: i.e. encrypt your sensitive|read-only data with your public key, decrypt it with your private key, beep boop, you can now use your client as a database. JWTs are a whole other unnecessary lasagne of complexity – not good complexity but random complexity, like the human appendix – which invites bugs and adds nothing above the former in most implementations. (Hell, my current company generates JWTs and then uses them as plain old 'random' keys to look up the session data in a database. It's hilarious but also awful.)
Can anyone cite a single real world example of a fully stateless system being run for the purpose of business? I ask this every time JWTs come up and no one can answer it.
As soon as you tap the database on a request for any reason, whether it's for authorization or anything else, you might as well kiss JWTs goodbye.
Then again, just don't use them anyway, because they have no benefit. Zero. Disagree? Prove it. I'm sure there's some infinitesimally small benefit if you could measure it, but the reality is that JWTs are saving you from an operation that databases and computers themselves are designed to be extremely good at.
Don't use JWTs. They're academic flim-flam being used to sell services like Auth0.
It's nuts to me that so many companies have moved off cookies for web app auth state. They're simple, they're well supported, they require very little work on the browser side, and the abstractions around them on the server side are basically bulletproof at this point.
I see all this talk about authentication, and it's just literally never been a problem or concern for my company.
Why not look into an open source auth solution such as SuperTokens? It's almost free and you can self-host. That way you implement your own auth system, but the security issues are mostly dealt with by them.
Yesterday I was working on updating code that implements Microsoft OpenID Connect (which produces a JWT).
Their documentation [1] is exceptional - all the gotchas and reasons for practices are clearly explained and there’s a first class library to handle token validation for you. I even ended up using the same library to validate tokens from Google.
Perhaps not all vendors produce equally well written documentation but I think it’s a lot easier to get it right today than it was 5 years ago.
That's usually because security is a bolt-on instead of bake-in within the control and data structures themselves. Too many people interpret "Make It Work Make It Right Make It Fast" to mean security is implemented at the "Make It Right" stage, when it should be at the "Make It Work" stage. That's if they're the lucky ones who get security designed in from the beginning into the architecture.
We're paying for the sins of that in Unix these days; the kernel attack surface is infeasibly large to remediate to correctness anytime soon (if ever?).
I think there is still more to it than just not taking it seriously or planning for it.
JWT in particular has weird quirks you need to know about to prevent algorithm confusion attacks (e.g. "alg: none" or an RS256 token downgraded to HS256), and I'm sure there are more traps I myself am not aware of. At this point I think security can be seen on the same plane as legal: assuming a random dev will be able to plan for and navigate all the issues by sheer common sense hasn't been a viable approach for a long time now.
That's how a software implementation by a newbie works.
You can't expect a newbie to take security into account before the software is implemented.
Instead, there should be a practice of rectifying all the security errors at the end, before the software is pushed to the server.
The only question I have is around your point on monorepos - every monorepo I’ve seen has been a discoverability nightmare with bespoke configurations and archaic incantations (and sometimes installing Java!) necessary to even figure out what plugs in to what.
How do you reason about a monorepo with new eyeballs? Do you get read in on things from an existing engineer? I struggle to understand how they’d make the job of auditing a software stack easier, except for maybe 3rd party dependency versions if they’re pulled in and shared.
Monorepos do require upkeep beyond that of single-product repositories. You need some form of discipline for how code is organized (is it by Java package? by product? etc). You need to decide how ownership works. You need to decide on (and implement) a common way to set up local environments. Crucially, you need to reevaluate all these decisions periodically and make changes.
On the other hand... this is all work you'd have to do anyways with multiple repositories. In the multi-repo scenario, it's even tougher to coordinate the dev environment, ownership, and organization principles - but the work isn't immediately obvious on checkout, so people don't always consider it.
Regarding auditing, I have always found that having all the code in one place is tremendously useful in terms of discoverability! Want to know where that class comes from? Guaranteed if it's not third-party, you know where it is.
Not to minimize the pain of poorly-managed monorepos - it's not a one-size-fits-all solution, and can definitely go sideways if left untended.
> 2) It's easy to get out of sync with what version of software corresponds to what branch/tag in each repo.
I'd like to hear how others solve this. The way I've addressed it is to bake into the build pipeline a step that dumps to a text file all the version-control metadata I could ever want in order to re-build the software from scratch. That text file is then embedded into the software's primary executable itself, in some platform-compatible manner. Finally, I make sure the support team has the tooling to retrieve it trivially, whether via a token-authenticated curl call over a REST API or what have you. This goes well beyond the version number the users see, and supports detailed per-client patching information for temporary client-specific branches until they can be merged back into main, without exposing those hairy details in the version number.
While this works for me and no support team has yet come to me with problems using this approach, it strikes me as inelegant, and for some reason I'm dissatisfied with "it ain't broke so don't fix it".
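For what it's worth, the dump step described above can be sketched in a few lines. The field names here are illustrative, not any standard:

```python
import json
import subprocess


def _git(*args: str) -> str:
    """Best-effort git query; falls back to 'unknown' outside a repo."""
    try:
        out = subprocess.run(["git", *args], capture_output=True,
                             text=True, check=True)
        return out.stdout.strip() or "unknown"
    except (OSError, subprocess.CalledProcessError):
        return "unknown"


def build_metadata() -> dict:
    # Everything needed to locate the exact sources this artifact was built from.
    return {
        "commit": _git("rev-parse", "HEAD"),
        "branch": _git("rev-parse", "--abbrev-ref", "HEAD"),
        "describe": _git("describe", "--tags", "--always", "--dirty"),
        "remote": _git("config", "--get", "remote.origin.url"),
    }


if __name__ == "__main__":
    # At build time, write this JSON to a file that gets embedded into the
    # executable and exposed to support tooling (e.g. an authenticated endpoint).
    print(json.dumps(build_metadata(), indent=2))
```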
Once I worked on a team where none of the engineers knew that the JWT payload was readable on the frontend. They were in shock when I extracted the payload and started asking questions about the data structure.
I mean, I'd be rather surprised too. What were you using JWTs for, if not asymmetric crypto? Presumably you weren't using it to sign the tokens, if they were surprised the client could access them? And I can't see many contexts where you would use it with a shared secret, where just sending JSON over HTTPS wouldn't suffice. (I'm assuming 'frontend' here denotes a client on the other side of the trust boundary.)
I'm not getting your comment. The payload is not encrypted; I think you're referring to the signature. The payload can always be decoded: it's just JSON encoded as base64url.
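A few lines make the point: anyone holding the token can read the payload, no secret required. Sketch in Python:

```python
import base64
import json


def jwt_payload(token: str) -> dict:
    """Read a JWT's payload WITHOUT any key or verification.

    The payload segment is base64url-encoded JSON, not ciphertext,
    so this works for anyone who holds the token.
    """
    payload_b64 = token.split(".")[1]
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)  # restore stripped '='
    return json.loads(base64.urlsafe_b64decode(padded))
```

The signature only proves who minted the token; it does nothing to hide the contents.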
For SSO? The biggest advantage (besides being stateless) of a JWT is that it is signed with an asymmetric key, so the client can validate the authenticity of the content. You can encrypt the content of the token, but that does not make much sense (because the client needs to decrypt it anyway).
If you’re running a startup and haven’t yet found your feet in terms of a product offering, and you’re building your product(s) in such a way that technical debt builds up through continuously layering half-baked on half-baked, it’s indicative that you’re not actually pivoting and not actually evolving, you’re just adding new half-baked ideas to a half-baked system… and being able to do that at twice the speed isn’t going to address the real problem: half-baked ideas don’t make a product, whether that’s 10 half-baked ideas or 100.
My experience is that any company in which evolution/experiments/pivoting is constrained within the boundaries of what already exists because of the sunk cost fallacy has made a grave error at a leadership level, not at a code level. If you can’t validate something without mashing more code into your system, that’s the problem to address.
I’ve seen companies with horrendous tech debt die, and you could certainly frame their death as being a consequence of the tech debt (“if they had just got the perfect system…”) but that assumes the perfect system would somehow prevent them from making the mistakes that got them there in the first place. It wouldn’t. The technical debt is an expression of their mistakes, not the cause. You could dump the perfect system at their feet and they’d be surrounded by garbage again a few years from now.
But in the counterfactual, if they'd tried really hard to avoid tech debt, that would have slowed them down at the beginning. Not to mention there are plenty of organizations that write very complex, abstract code to avoid tech debt and end up making the code base incredibly painful to work with. So overall, did they get fewer swings?
I've worked on a lot of old code bases and the biggest issues I've run into, issues that crippled development velocity, were 95% boneheaded decisions and overengineering. And never the types of code quality issues someone like Uncle Bob talks about in Clean Code.
This is totally true, but taken too seriously it leads to inability to learn anything from almost any information whatsoever. What’s more, whatever you do (whether you take the advice of those who have gone before or not), you will not be able to decide whether you made good decisions or merely “survived”.
How does one proceed when anything can be survivorship bias and determining cause and effect for large-scale operations like running a business is essentially impossible?
(When I say “anything can be survivorship bias” I specifically mean that no matter the cohort you cannot decide whether you’ve accidentally excluded unknown failures, and hence you have no assurance of the actual strength of any analysis you do).
Not my experience..
What does a “tech failure” look like? Do the servers catch fire? Is the web site down? Maybe people are unable to login to their stations?
Hi-tech business is “Tech”, so the failure of the business is in fact the tech failing. More specifically, the business was unable to direct the tech to solve real problems and solve them well enough.. New hires took too long to onboard.. Engineers were only superficially productive.. Communication between the stakeholders and engineers was lacking.. etc.. etc..
Take note that in all scenarios above “work” is being done, “progress” is being made.. ceremonies are everywhere and success is seemingly around the corner.. Or is it?
It’s just very hard to see these issues, they are hidden under layers of meetings, firings, hiring, pivots, milestones with little progress in actual business value.
When there appears to be such a distinction, that's usually a manifestation of something like Conway's law, a symptom that there exists an unhealthy organizational divide between business and technology.
This doesn’t count as a tech problem?
A common theme right now is 'AAI', using people to fake an AI that may not come into being at all, let alone before your runway (inevitably) runs out.
The developers you'd hire to make it an actual AI and the developers you'd hire to make it a Mechanical Turk are very different skill sets.
I would think that a poor quality product, or one not as good as competitors would be a big killer. Google, Facebook, Amazon have amazingly superior products. I think you're missing something.
How about the cases where failed security compliance caused fines that didn’t help the situation? Thinking of fintech companies especially.
I've written about this before: as an industry, we have made writing software complex for complexity's sake.
> imagine, in a world where there wasn't a concept of clean code but to write code as simply as possible. not having thousands of classes with indirection.
> what if your code logic was simple functions and a bunch of if statements. not clever right, but it would work.
> what if your hiring process was not optimizing for algorithm efficiency but that something simply works reliably.
> imagine a world where the tooling used by software engineers wasn't fragile but simple to use and learn. oh the world would be a wonderful place, but the thing is most people don't know how to craft software. but here we're building software on a house of cards [0]
[0] - https://news.ycombinator.com/item?id=30166677
Have you seen the personal blogs of devs today? What should be a simple HTML + CSS website with the simplest hosting option possible, is now written in a framework with thousands of dependencies, containerized, hosted on some enterprise level cloud service using k8s.
That's great and all if you suddenly need to scale your blog to LARGE N number of readers, but the mentality is still persistent - when one should be focused on core features and functionality - in the simplest way possible, you're bogged down with trying to configure and glue together enterprise-level software.
Maybe it's a bit unfair to put it that way: a lot of engineers know the various systems and services in and out, and prefer to do even the simplest things that way. But I've lost count of how many times I've encountered devs that BY DEFAULT start with the highest level of complexity to solve the simplest problems, for no other reason than "but what if" and "it feels wrong that it should be that easy".
> Have you seen the personal blogs of devs today?
I don't know that this is a fair comparison, because side projects can and are often a way to explore ideas, understand tech, play around, etc. So I don't know that I'd agree that it's a great extrapolation to the way an engineer works based on side projects or a blog that may have different objectives.
I do agree with the sentiment though, that we want to be watching for indicators to how a team member approaches problems.
> the way it is because everyone thinks they're gonna be the next FAANG-sized company, and need to be able to write FAAANG-quality code and engineer FAANG-quality architecture from the start
I don't know that it's fair to say everyone, but it is something I agree companies, especially startups, should filter for. When I acted as hiring manager and was trying to build SRE, as an example, I would remind candidates and the team continuously that we're not Google. So while we want to bring in ideas and approaches from what Google has published as "SRE", we do need to consciously leave out large parts, keeping what's appropriate to our needs and stage of maturity.
We tend to apply industrial strength tools to our personal projects because it's some combination of what we already know, or we're trying to learn or refine an unfamiliar skill.
If you just gave me a linux shell I would not be able to confidently provision a secure webserver for static hosting. But I do know how to write cloudformation and deploy it. Sure this is a personal moral weakness by the standards of HN whatever, but it's where my career has led me so these are the tools I have.
Oftentimes the latter category could be necessary if you were at much higher scale, or if the business evolved in some way, etc., which is where this sort of stuff tends to originate. Just yesterday we were talking at my company about extracting a service in Go, since it's very high scale, very simple, and doesn't change much. On one hand, it's pretty likely we'll need to do that at some point, but on the other, it's not causing any issues right now, so there's not much point in doing it at the moment. Had we gone forward, that would have added complexity for a theoretical concern that may or may not happen in the future.
If you have seven conditions driving a decision, a bunch if's might be the simplest implementation. If you have hundreds of conditions, a tree of if's becomes impenetrable. There is no one-size-fits-all when it comes to simplicity.
Some problems are inherently complex. You can't design a payroll system or tax calculation system which is simpler than the set of rules and regulations it has to implement.
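One hedged illustration of that tipping point, using made-up shipping-fee rules (the domain and numbers are hypothetical): a handful of conditions reads fine as plain ifs, but the same logic as a first-match-wins rule table scales to hundreds of rows without nesting:

```python
def fee_if_chain(country: str, weight_kg: float) -> float:
    # Perfectly readable at this size.
    if country == "US":
        if weight_kg <= 1:
            return 5.0
        return 9.0
    if weight_kg <= 1:
        return 12.0
    return 20.0


# The same logic as a first-match-wins table of (predicate, outcome) pairs;
# each rule is one independently readable and testable row.
FEE_RULES = [
    (lambda c, w: c == "US" and w <= 1, 5.0),
    (lambda c, w: c == "US", 9.0),
    (lambda c, w: w <= 1, 12.0),
    (lambda c, w: True, 20.0),  # catch-all
]


def fee_table(country: str, weight_kg: float) -> float:
    return next(out for pred, out in FEE_RULES
                if pred(country, weight_kg))
```

Neither form is "the simple one" in general; the right choice depends on how many rules you have and how often they change.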
I mean, it worked for Amazon. I saw the code.
Even in that case, a tree of if's isn't that bad (it's not great), but far worse is when you have the same set of if statements copied and pasted around dozens of places. Because you will forget to update one of them at some point.
Setting that aside though, the author seemed to mostly be talking about architectural simplicity in the article. He specifically called out "premature move to microservices, architectures that relied on distributed computing, and messaging-heavy designs" which I think is spot on. Distributed systems are fundamentally hard and involve a lot of difficult tradeoffs. But somehow we have convinced ourselves as a profession that distributed systems are somehow easier.
Or you can stitch together eight different cloud services and let someone else debug that crap in prod. Not to mention subpar performance and an astronomical cloud bill.
What? That has nothing whatsoever to do with tautology. It's just a statement you agree with. If everyone else agreed with it, it might at most be a truism or an uninteresting statement, but evidently they do not. (They might claim to, but reality shows they optimise for other things - in my experience the simplest work, which does not always mean the simplest code, especially when you're accustomed to the mystic rituals of the Javanese tribes.)
- It takes 3 processes to send sound from Firefox to speakers (pipewire-pulse to accept PulseAudio streams from Firefox, pipewire to send audio to speakers, and wireplumber to detect speakers, expose them to pipewire, and route apps to the default audio device).
- pipewire and pipewire-pulse's functionality is solely composed of plugins (SPA) specified in a config file and glued together by an event loop calling functions, which call other functions through dynamic dispatch through C macros (#define spa_...). This makes reading the source less than helpful to understand control flow, and since the source code is essentially undocumented, I've resorted to breakpointing pipewire in gdb to observe its dynamic behavior (oddly I can breakpoint file:line but not the names of static TU-local functions). In fact I've heard you can run both services in a single daemon by merging their config files, though I haven't tried.
- wireplumber's functionality is driven by a Lua interpreter (its design was driven by the complex demands of automotive audio routing, which is overkill on desktops and makes stack traces less than helpful when debugging infinite-loop bugs).
- Apps are identified by numeric ids, and PipeWire (struct pw_map, not to be confused with struct spa_dict) immediately reuses the IDs of closed apps. Until recently rapidly closing and reopening audio streams caused countless race conditions in pipewire, pipewire-pulse, wireplumber, and client apps like plasmashell's "apps playing audio" list. (I'm not actually sure how they resolved this bug, perhaps with a monotonic counter ID alongside reused IDs?)
I feel a good deal of this complexity is incidental (queues pushed to in one function and popped from synchronously on the same thread and event callback, perhaps there's a valid reason or it could be removed in refactoring; me and IDEs are worse at navigating around macro-based dynamic dispatch than C++ virtual functions; perhaps there's a way to get mixed Lua-C stacktraces from wireplumber). I think both the multi-process architecture and ID reuse could've been avoided without losing functionality. Building core functionality using a plugin system rather than a statically traceable main loop may have been part of the intrinsic complexity of building an arbitrarily-extensible audio daemon, but I would prefer a simpler architecture with constrained functionality, either replacing pipewire, or as a more learnable project (closer to jack2) alongside pipewire.
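The "monotonic counter alongside reused IDs" guess above is a well-known pattern (generational indices, as in Rust's slotmap). A hedged sketch of how it would close that reuse race, with hypothetical names:

```python
class GenMap:
    """Reuse small integer slots (as pw_map does), but pair each slot with a
    generation counter so a stale handle to a closed entry is detected
    instead of silently aliasing its replacement."""

    def __init__(self):
        self.slots = []   # list of (generation, value-or-None)
        self.free = []    # reusable slot indices

    def insert(self, value):
        if self.free:
            idx = self.free.pop()
            gen, _ = self.slots[idx]
            self.slots[idx] = (gen, value)
        else:
            idx = len(self.slots)
            gen = 0
            self.slots.append((gen, value))
        return (idx, self.slots[idx][0])   # handle = (slot, generation)

    def remove(self, handle):
        idx, gen = handle
        cur_gen, _ = self.slots[idx]
        if cur_gen != gen:
            raise KeyError("stale handle")
        self.slots[idx] = (cur_gen + 1, None)  # bump generation on free
        self.free.append(idx)

    def get(self, handle):
        idx, gen = handle
        cur_gen, value = self.slots[idx]
        if cur_gen != gen or value is None:
            raise KeyError("stale handle")
        return value
```

Whether PipeWire actually fixed it this way I can't say, but the pattern is cheap: one extra integer per slot.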
I think it’s rather that complexity naturally expands to fill up the available capacity (of complexity-handling ability). That is, unless conscious and continuous effort is spent to contain and reduce complexity, it will naturally grow up to the limit where it causes too many problems to be viable anymore (like a virus killing its host and thus preventing further spread). This, in turn, means that the software industry tends to continually live on the edge of the maximum complexity its members can (barely) handle.
I disagree that this is something that "naturally" happens. A lot of this thread is about how adding complexity is either a deliberate choice made by software developers or just that the developer simply was never taught how to do it the simple way--both of which illustrate a gap in software development education. When the tutorial about How To Create a TODO App starts with "Step 1: Install Kubernetes", I'd argue we have an education problem.
I do favor simple code, but some complexity/abstraction is needed to make it easier to understand.
Obviously, getting that right makes subsequent developments much, much easier, but it's hardly a simple route to success.
Sounds good "on paper" - in fact, is tautologically true - but it's hard to find two people who agree on the definition of "simple". You say "not having thousands of classes with indirection", and I've definitely seen that over-design of class hierarchies create an untouchable mess, but I've seen designs in the other direction (one giant main routine with duplicated code instead of reusing code) that were defended as "simple".
But I argue a lot of complexity and bugs comes from poor/unclear/conflicting thinking. Especially when it crosses boundaries between multiple developers who had to modify it but didn't truly internalize that part/design of the code.
Microservices, craptons of cloud-only dependencies, no way to easily create environments, ORMs and tooling that wraps databases creating crazy schemas... The list goes on and on; if you're early, you could do a lot worse than create a sensible monolith in a monorepo that uses just postgres, redis, and nginx and deploys using 'git pull' ...
So, Spring Boot you mean?
oh yes ...
I always bang on to my junior staff that their job was known as "analyst programmer" for a reason. The analyst part matters probably even more than the programmer part. In large companies, just discovering what needs to be coded is 90% of the job (securely coding it within the constraints of the enterprise is the other 90%, while the final 90% is marketing your solution internally).
Anyway .. yes
Yes, but that is a quite massive dysfunction of those companies. Meaning, we can yell at analyst-programmers as much as we want; what really needs to be fixed is the process that makes finding out requirements so ridiculously hard.
And yes, I work in one of those companies, it very clearly is dysfunction.
I don't mean expert programmers, but at least being able to read basic pseudocode algorithms.
It's hard to describe a problem if you don't even understand any language.
But you still have to predict based on a two-sentence description in a JIRA ticket how many "story points" it's going to take with 95% accuracy a dozen times within the span of a single "sprint planning session" every two weeks.
And engineers multiplying their initial estimate by 3, the project manager then multiplying that by 3 and rounding it up to be ten times more than the initial estimate.
I can't help but think about Tobias Fünke. Especially with you banging on your junior staff.
(Also, notice it's 'bang on to' the staff, not 'bang on' them. That is, the staff are the indirect object; the thing which is being said - banged on about - is the direct object.)
Absolutely. The tech part is relatively easy. Deciding what to build, that's where the friction and magic happens.
Are senior staff also analysts? Why or why not?
Finally, someone said it
I've pondered both sides of this argument. On one hand, if this move had been made earlier it might not have been as difficult a multi-year project. On the other hand, when I look at the Rails application in particular, it was coded SO poorly that if it had just been written better initially, it wouldn't even need to be deconstructed at this point. Also, if the same engineers that wrote that Rails app had tried to write a bunch of distributed, event-driven microservices instead, we would probably be in even worse shape. (ᵔ́∀ᵔ̀)
I mean, just start with a cleanup session and proceed from there. Work on one bit at a time and don't get too far from a working system.
https://www.youtube.com/watch?v=y8OnoxKotPQ
Summary: really trivial sounding feature requests like displaying a single new field on a page can become extremely difficult to implement and worse, hard to explain to stakeholders why.
The head engineer (who had known the guy funding the thing since childhood) had no clue how node, stateless architecture, or asynchronous code worked. He had somehow figured out how to get access to one particular worker of a node instance, through some internal ID or something, and used that to make stateful exchanges where one worker in some microservice was pinned to another. Which goes against everything node and microservices is about.
I tried to talk some sense into them but they didn’t want to hear it. So for the last six months I just did my part and drained the big guy’s money like everyone else. I hate doing that - way more stressful than working your ass off.
[1] http://cryto.net/~joepie91/blog/2016/06/19/stop-using-jwt-fo...
Edit: I was actually writing this code tonight myself for a project instead of it already being baked into the platform framework because SSO is only available as an "enterprise" feature and it's $150 a month for static shared password authentication. So market forces incentivize diverging standards.
1. https://docs.microsoft.com/en-us/azure/active-directory/deve...
1) It's easy to miss a repo, if you don't have a list of them all somewhere.
2) It's easy to get out of sync with what version of your software corresponds to what branch/tag in each repo.
That's what the `[dependencies] my-lib = "1.0"` was supposed to solve.
I need to write that down somewhere: the ordering/matching of branches.
You should use opaque tokens instead if you don't want the frontend or other services that have access to the token to read it.
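A minimal sketch of the opaque-token approach (the in-memory dict stands in for your real session store, e.g. a database or Redis):

```python
import secrets
from typing import Optional

SESSIONS: dict = {}  # stand-in for a real server-side session store


def create_session(user_id: str) -> str:
    token = secrets.token_urlsafe(32)        # ~256 random bits; no readable structure
    SESSIONS[token] = {"user_id": user_id}   # all session data stays server-side
    return token


def resolve(token: str) -> Optional[dict]:
    return SESSIONS.get(token)               # one indexed lookup per request


def revoke(token: str) -> None:
    SESSIONS.pop(token, None)                # instant revocation, unlike a JWT
```

The client can't learn or tamper with anything from the token itself, and revocation takes effect on the very next request.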