This is a technical question really. With such deep layoffs, I think a lot of people expected that Twitter would fall apart as a service pretty soon. That hasn't happened, and the site and app, at least to me, seem to be running perfectly fine. World Cup coverage has been totally normal.
Could some engineering folks chime in about how this is possible?
Back in the 1990s my wife worked at the ag school and they had a moment of panic when they realized they had no idea where the web server was. Turned out they had a tiny little HP PA-RISC machine in a closet covered in dust bunnies that had been running for two years without anybody thinking about it.
Last night I wanted to create a webhook and decided to use AWS Lambda. I have a few things in AWS including Lambda functions. I figured I'd look at my old ones as a reference for my new one, but I was shocked to realize I had things that had been running for five years without any intervention at all.
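For what it's worth, the Lambda webhook in question is the kind of thing that fits in a few lines, which is part of why such functions can run untouched for years. This is only a minimal sketch of a generic webhook handler, assuming an HTTP trigger (API Gateway or a function URL); the names and response shape are illustrative, not the actual function:

```python
import json

def handler(event, context):
    # Parse the incoming webhook payload from the HTTP request body.
    # event.get("body") is how API Gateway proxy integrations deliver it.
    body = json.loads(event.get("body") or "{}")

    # ...react to the payload here (enqueue work, call another service, etc.)...

    # Return an HTTP-style response so the caller knows we accepted it.
    return {"statusCode": 200, "body": json.dumps({"received": True})}
```

Once deployed, AWS handles the underlying hosts, patching, and scaling, which is exactly why there was no intervention needed for five years.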
In both of my cases you have middling software and negligent management but the underlying hardware or services are reliable and high quality. It's not like the entrepreneur I knew who was always finding web hosting that was a lot cheaper than anyone else with the downside that every few months we had to move to another data center in a hurry.
A growing company is frequently changing. A company that launches new features is changing. A company trying to fix its architecture is changing. The large workforces many valley companies maintain are built around, and justified by, this growth and change.
The change that Twitter will likely experience now is machine failure (probably around 3 per 1,000 machines a day), hard drive expiration, potentially database promotions, and failures of cache machines.
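To put that failure rate in perspective, here is a back-of-envelope calculation. The fleet size is a made-up round number for illustration, not Twitter's actual server count:

```python
# Expected daily machine failures, given the ~3-per-1,000 daily
# failure rate mentioned above. Fleet size is hypothetical.
fleet_size = 100_000            # illustrative server count
daily_failure_rate = 3 / 1000   # ~0.3% of machines fail per day

expected_failures_per_day = fleet_size * daily_failure_rate
print(expected_failures_per_day)  # -> 300.0
```

Hundreds of machines a day is routine with good automation (failed hosts drained and replaced without a human in the loop), but it becomes a grind for a skeleton crew doing it by hand.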
Automation can drive a lot of these to very small workloads, but capacity management is a potentially existential crisis looming over all tech companies.
Then you get to the real problems that Twitter faces: political change, security change, and workforce rot.
Political/regulatory change poses a problem because it often requires changes to infrastructure. This creates the type of change that can result in failure.
Security change can be supply chain problems or bug reports. Maybe keys need to get rotated, new encryption added, software updated. All of these are change. All can result in failure, and potentially catastrophic failure.
Lastly, the largest existential problem is that the engineers left at Twitter are likely not its best, and many of them are probably coerced into staying by H-1B regulation. Now you run into a problem of attrition and replacing that attrition. When your good engineers leave (or are overworked), it's harder to hire good engineers. The difference between a good engineer and a bad engineer is their `complexity to result` ratio. Good engineers can create simple solutions, while bad engineers create complex solutions, even though both might produce the same end result.
Failure is also proportional to complexity and outage duration is most impacted by complexity.
No serious engineer likes complexity for the sake of complexity. This may only apply to juniors practicing RDD (Resume-Driven Development).
There are times when a simple solution is not obvious even to the seniors, but these are generally very rare cases.
However, people-driven functions like moderation, sales, etc. are a different issue. Degradation, if any, is likely to be in quality, not system crashes.
It's not easy to tell which people are really pulling their weight, particularly when you have people in operations who are doing things that are essential but not flashy and people in development and marketing who are doing things that are flashy but not essential.
When there are mass layoffs often the best people jump ship early figuring they'll have an easy time getting work elsewhere and an even easier time if they are the first to go. Some of the people who stay are the people who don't feel they have a choice.
I am not a fan of OKRs, stack ranking, and other practices that create arcane, "high stakes" processes for measuring value, because a pathological narcissist's core competence is convincing management that their glass is 70% full and that your glass is 30% empty.
Unlike your Volkswagen CEO with a dim view of software in the past, Elon M understands software and packet switching at millisecond resolution and has demonstrated experience debugging complexity at that scale, equipping him to make informed and impactful decisions.
Less is more.
If you fire 80% of the operations staff you better fire 80% of the eng staff too.
As changes get made and released I expect to see some more outages.
Also good staff made the plans and laid the groundwork for resiliency. That work continues to provide benefits after they’ve gone.