Between the engineering staff and the warehouse workers, I wonder how long it will be until they have already fired everyone who ever would have been willing to work there.
Even with candidate pools of hundreds of thousands of H1-B engineers and tens of millions of illegal immigrant warehouse workers, there still comes a point where such a big company firing so many people so quickly exhausts all their options.
It reminds me of the Robot Chicken Sketch where Imperial Officers aboard the Death Star all pretend to be force choked to death by Darth Vader so they can avoid getting killed by lightsaber, then come back in under different names in different jobs. It's worse though for Amazon: nobody wants to come back.
It seems Amazon itself is aware of this issue. The linked Engadget article even mentions this:
> "The rate at which Amazon has burned through the American working-age populace led to another piece of internal research, obtained this summer by Recode, which cautioned that the company might “deplete the available labor supply in the US” in certain metro regions within a few years."
>“deplete the available labor supply in the US” in certain metro regions within a few years.
AWS manager interpretation: "A few years? Not my problem."
So I'm guessing things will get worse. It took a long time, but I remember when "cloud" started getting big, lots of people voiced concern about being at the whim of Amazon/Bezos for your business-critical infrastructure. It took longer than most people thought, but we are getting there.
Edit: I see it's from 2022, so maybe it is the end stage?
There seems to be some sort of imputation that, just because someone is on an H-1B, they are not a good engineer.
I used to be on an H-1B and gladly came back home to India. I run my own business now. And yes, I'm ex-Amazon. It was a tough place to work, but circa the mid-nineties, the stock options made it worth working for them.
I'm willing to bet I'll outcode a significant fraction of the audience on this site. And I'm not even close to the best developer around. Some of the smartest people I've met have been on an H-1B visa. Please consider not letting prejudice affect your view. You'll do yourself a disservice by underestimating your competition.
The only thing I implied is that workers with fewer rights than a U.S. citizen are easier to exploit and abuse.
If I refuse to take a work-related call at 3am, the worst that can happen is that I get fired, and spend months looking for a new job.
If you refuse to take a work-related call at 3am, you get fired and lose your ability to stay in a place you have lived for 5+ years and made your home.
That's BS, and I hate it almost as much as you do. You can be blackmailed with deportation, and I can be replaced with someone who can be blackmailed with deportation. We're both getting screwed in this current arrangement.
I'm interpreting this as mid-1990s, in which case I very much believe in your technical ability. My dad came over late-1990s and he worked at mid-sized companies ever since. Even then, he and his H1B peers were decently intelligent.
I would caution your defense of today's H1B/L1s/OPT workers; I'd say the quality of Indian engineers in the US has halved every 10 years.
Today's Indian engineers come to the US because they can't enroll in a decent college in India and/or obtain an upper-middle-class salary from a job. It is an entirely different mechanism by which people are migrating over. It used to be brain drain, now it is sewage drain.
The H1Bs in the big tech companies are maybe 50/50 technically decent, but everywhere else, they are just taking contracting spots. It is a very corrupt and bloated system that has to go because they are not providing valuable work.
I think you’re not considering the other side of that perspective. I am sure you are very happy for your fortune to have been plucked out of India and been given the opportunity to work at Amazon and presumably live in America, which put you in the place you described that seems to be in a really good position today. The issue is that not only was the H1-B meant for highly specialized people that cannot be found in the USA, it has very long been absolutely abused by American corporations and politicians that have been betraying their own people for several decades now by engaging and ignoring this abuse that was really just about undermining salaries of Americans by giving the opportunity to you rather than Americans, while it was really mostly about enriching the rich. You were essentially just a method for the rich to get richer.
I am sure you are a wonderful person, but it’s simply an unjust treatment of Americans, even if you personally had nothing directly or reasonably to do with it. The betrayal and abuse was perpetrated by the “Americans” that led the corporations and paid off the politicians, and also the American citizens that were distracted and careless about their own politics and government and future for their own children. I doubt you would be ok with your own ruling class and rich to betray your children and the future of India, would you? It’s crazy, but America’s people largely and for a long time absolutely betrayed their own people.
I would not wish it on any society, even though it has been pervasive all over the “West”, where the rich, corporate captains, and politicians betray their own people. Imagine if your Indian politicians were to sell out India to the West or maybe import Africans or something similar, I would hope that the Indian people would make it absolutely clear to the politicians and rich that they are staring down a loaded gun and it’s not their finger on the trigger. So do I also wish it for the people of all of the western countries that they retake their sovereignty and self determination away from the rather parasitic oligarchy that has unconscionably been betraying its own people out of undeterred greed and crime against the very people that allowed making them rich and powerful in the first place.
It is not a personal thing, I think it’s just that people are recently getting a lot more angry about things because the American empire is hitting a rough patch that it has not experienced in anyone’s living memory and as it is said, (adapted) the naked people start getting angry when the tide goes out and there aren’t enough jobs to also be super generous by giving them away to Indians benefiting from the abusive systems of the parasitic cabal of the ruling class.
What you may also not be totally aware of, is that H1-B is only one of many different systems and programs that have been abused and quite literally benefit and profit foreigners over Americans. Imagine if that existed in India; where I go to India, make 2-3x what the average Indian makes, the government gives me free housing, my children get free education and free healthcare, and I get extremely beneficial government secured loan terms on business loans and get grants to start a business and free consulting and services, and I get to bring dozens of my friends and family into India to work in my business, and I also get beneficial home loans to buy up houses and drive up prices, and my foreign children get preferential treatment in Indian universities (…while local Indians don’t get those things) and I run for office while all the foreigners I and my advantaged community brought over to India start getting our people into the government and we start taking over Indian institutions and government offices.
I combined and crossed things a bit because it is a bit more complicated and nuanced of course, and many Americans aren’t even aware of just how many programs and states are in place that advantage foreigners and disadvantage native Americans, who could even very well be the descendants of the founders of America. That’s why things have gotten rather tense and as it looks, unfortunately, it will likely only get worse from here; especially as BRICS builds out more of their alternative fiscal, monetary, economic, geopolitical structures; and the same traitors that control the USA will/are starting to get very nervous and borderline panicky. It seems Thucydides Trap is in full effect.
I like to think I'm halfway decent at my job, and I wouldn't work there once. During undergrad, my landlord working for AMZN on the opposite end of the country offered me an interview, but it was during final exam week.
I asked if I could schedule the interview after my final exams, and his arrogance really showed when not only did he refuse, but then insisted my exams don't even register on the same scale of importance as the opportunity to work for Amazon.
Somewhat related: a recruiter at Google cold-called me a couple months into my first job out of undergrad back in 2016 and was similarly condescending about "the chance" to work for Google compared to everything else. I already had a low opinion of them when they gave my then-girlfriend an introductory O'Reilly book on Java after she failed their interview.
I regret being born too late to work somewhere like Bell Labs, SGI, or Sun. I had a ton of graybeard wizard coworkers from these places, and they were all a pleasure to learn from and even better friends. For the first 2 years of my first job, every day of work was like walking into the Shire and talking magic spells with 20 Gandalfs.
That job was great until I got put on a team with a guy who was a former middle manager at some IBM-like company and went from being surrounded by people lightyears ahead of me to being surrounded by Dilbert characters. The messed-up part was that it wasn't even punishment. I was rewarded after completing a project with my choice of which team I joined next, and I joined the wrong one. I assumed that joining a new team to utilize this newfangled "cloud computing" thing would be trailblazing, and I didn't do any diligence on who I would work with.
To this day, I still regret not rejoining the first team I worked for, otherwise I would still be at that company and happy about it. Then again, the boredom and discontent while being on that sucky team is the reason I started investing, and now I can buy a house in cash and fund myself to do whatever I want for at least a decade. Hard to complain about the way things turned out.
Hi, I am a half decent engineer. I say that as objectively as one can say something like this about themselves.
I worked at Amazon. Twice. In total about a decade as a Principal Engineer. I left voluntarily a few months ago.
I have zero regrets about my time at Amazon. I learned lots, worked with some incredible people, and had fun doing it.
And the culture? It was life changing for me, especially when I first joined. In all the best ways.
And Amazon today? All I’ll say is that at their size, maintaining solid culture is damn hard. The hiring spree peri-Covid definitely added unimaginable stress to maintaining the culture the company was built on.
They’re a big company, and thus a big target. It’s easy, cheap, and even lazy to kick them with stuff like this.
The truth is that while it’s changed a lot over time, anyone fortunate enough to work there should embrace it.
We do a lot of business with them and we have workshops with them sometimes, and the one thing I notice is how they're all so evangelical. They wouldn't say a bad thing about their company. I couldn't be like that; when I work somewhere I sell my knowledge but not my soul. I'd always speak freely (not always appreciated but usually it's not a problem if it's true).
But that company culture leaves me with a very low opinion of them and very little trust. Even Microsoft engineers are less brainwashed. I've had several that just told me the truth about services.
Maybe it depends on the country but it feels like this is just their culture.
I know some people who are fine working there. No one seems thrilled but if you're an above average engineer who is just getting by at 140k a year and suddenly you're looking at 350k a year as an SDEIII or something, that can be a life changing amount of money.
However, I think the question is, what percentage of engineers can pass the amazon interview but not the Apple/Databricks/Uber/Google/Meta ones. Because no one is picking amazon over the aforementioned companies.
However, maybe there's an opening at Amazon and not the other companies, or maybe that's your only offer. I certainly think it might be worth it for a few years.
I've worked there since 2015, and this simply isn't true.
There's a lot wrong with AWS (and it's got a lot worse in the last 3 years), but there's also a lot right, and there are some really, really smart people there, several of which have boomeranged (people who left and came back).
The formula is usually more money and the ability to work on a special team isolated from the usual toxic orgs. I think A9 was probably somewhat like that, and AWS probably used to be at some point long ago.
Yep, everyone is excited until they leave after 1 1/2 or 2 years. There are always outliers but my personal experience is that the churn rate is incredibly high.
Every single engineer I know who went to Amazon except one lasted under 3 years and to this day, often ten+ years later, they all will mention how much they hated it.
The one exception is an engineer who stopped engineering, switched into product, and transferred to China to hit on the women there.
Some Amazon practices actually sound great to me (short documents, read before the meeting) but so many things just sound needlessly, relentlessly cheap.
I was on one of the core AWS teams. I lasted 3 years and 1 month, to your point hah. I left about a year ago. My stress levels were through the roof during the time I was there. It truly was one of the most toxic stressful places I've ever worked, second only to Intel.
The largest contributor of stress being on-call rotations where getting paged between 12am-6am each night was basically a guarantee. God help you if it was a holiday and you got a high sev page, where the people that you really need are all out of pocket. The many many many instances of their security "regime" relentlessly paging us in the middle of the night for things like having an S3 bucket for static website assets; despite numerous exceptions given by L7+ leadership.
I disagree with the notion around "short documents", not only were they quite lengthy at times, but they actually made the process of "busywork" worse by adding more overhead to trivial matters.
Add on the layoffs and "return to office" horse-shit excuses and it's no wonder nobody wants to go back.
I know of 1 tech person at Amazon that claims to have liked it there; the husband of a co-worker (albeit 6 years ago). He was in some in-house consultant-type role though, and flew all over the world to help the internal teams straighten out whatever AWS mess they'd gotten into, so that's not quite the role that people think about when talking in a FAANG context.
I got a job at AWS/EFS from a post here on hacker news. Stayed there almost 2 years until RTO took its toll (left early 2024). If not for that, I'd still be there... and I went in with full knowledge of all the horror stories. Perhaps the EFS org was just a diamond in the rough, but it was honestly one of the best jobs I've had. Even the on call wasn't so bad, with management taking an extremely hands-on and proactive approach to reducing operational burden. Extremely high technical bar which taught me a ton about building and operating large distributed systems. I do wonder if EFS is still run so well.
I've since been at Oracle/OCI (absolute dog shit with the worst on call I've ever seen, and I've been in the military lol), and now at Microsoft/Azure, which so far seems like a decent workplace.
Are there really many illegal aliens working in the warehouses? I know that Amazon does verify employment eligibility and checks documents. There may be some committing identity theft but I doubt that it's a large proportion.
I would never work for AWS, given what I've heard, and consistently, of their internal culture.
Also, everything I've seen while working with internal staff makes me feel there's a culture of obfuscating all weaknesses from customers, practically to the point of deceit.
I still occasionally get them even though I literally was one of the people who left after they tried to make us go in office (I don't like to use "RTO" because no one on my team had actually worked out of an office for Amazon before since the project we were on was fairly new). My wife (at the time fiancee) has an autoimmune issue that makes it much safer for me not to commute, and although my manager suggested I could get an exemption, he didn't actually know what the process was because it all happened so quickly that no one seemed to have actually defined what that process was up front. I had a little less than a month to figure out what to do and get that exemption before they expected me to either have that exemption, be in an office in another city three days a week, or transfer to a local team and be in the office in my current city three days a week. I ended up deciding that it wasn't worth the effort to try to figure out how to convince them to let me stay.
From what folks at AWS tell me that’s basically already happened. The best and brightest won’t even apply to work there anymore. For many key functions they’ve legit run out of people to recruit and thus have to go down market compared to competitors. That’s very much true in hot sectors like AI where AWS has “C” and at best “B” team players and leadership.
I’ve also heard key pieces of their infrastructure have sloppily written code and that communication between teams is horrible. Even with their insane salary offers, most people don’t think it’s worth it. Especially given their 6% “unregretted attrition”[1]
The claim is not that Amazon would be using illegal warehouse workers today, but that there is theoretically a pool of tens of millions of people available. Which is still kind of dubious.
Not totally on topic, but I recently passed the tipping point with Amazon shopping. I now go to Temu. They have US warehouses shipping in a couple days which was the only thing keeping me on Amazon. Plus everything on Amazon is basically the same stuff on Temu but with a markup!
Amazon’s not the only company on the planet that pays well. While they’re above average they’re far from top of the market. If you’re talented enough you can make a lot of money and skip the toxic culture. Double win.
Even L6 managers feel this, but it becomes more clear as one goes up in levels. Recruiting is job one.
If Amazon runs out of recruitable engineers (unlikely, they are one of the most prestigious firms in the world) then they will simply lower the bar. HC must be filled.
I seem to remember an internal memo being leaked in which middle management was complaining that they would eventually/soon burn through all the laborers in that region, after which they'd suffer immense difficulty in staffing that warehouse.
There was another comment that pointed out how unlikely this is to happen because Amazon is just too big to bring the law down on now.
When you're the, what. Second? Third? Largest employer in the US, enforcing the law now becomes a meaningful hit to economic velocity. And as much as Trump hates brown people, his administration has begrudgingly revealed that there are moves that his billionaire buddies Will Not Allow.
I'm no fan of ICE or this administration's deportation strategy, but it's a serious problem that even enforcing the law on Amazon is now an economic liability, so much so that nobody dares to try.
Given their self-imposed labor supply issues, notably the awful working conditions, I hope the workforce there figures out how to effectively organize against this cesspool of a corporation.
Look, not to defend anything Amazon is doing, but this causal chain seems rather pareidolic and under-evidenced. You could spin some kind of crazy narrative about any major outage based on changes in policy that happened just before. But this isn't nearly the first AWS outage, and most of them happened before the recent RTO changes. It needs more evidence at best.
The article wasn't about the outage happening, it was about the amount of time it took to even discover what the problem was. Seems logical to assume that could be because there aren't many people left who know how all the systems connect.
> Seems logical to assume that could be because there aren't many people left who know how all the systems connect.
It's only logical presupposing a lot of other conditions, each of which is worthy of healthy skepticism. And even then, it's only a hypothesis. You need evidence to go from "this could have contributed to the problem" to "this caused the problem."
Based on what little is given in the article, it seems to go strongly against this hypothesis. For example it links to multiple past findings that Amazon's notification times need improvement going back to 2017. If something has been a problem for nearly a decade, it's hard to imagine it is a result of any recent personnel changes.
TFA does not establish how many AWS workers have left or been laid off, let alone how many of those were actually undesirable losses of highly skilled individuals. Even if we take it on faith that a large number of such individuals were lost, it is another bridge further to claim that there was no redundancy in that skillset among those who remained, and that the vacancies have been left unfilled since.
No evidence is given that indicates that if a more experienced team were working on the problem it would have been identified and resolved faster. The article even states something to the opposite effect:
> AWS is very, very good at infrastructure. You can tell this is a true statement by the fact that a single one of their 38 regions going down (albeit a very important region!) causes this kind of attention, as opposed to it being "just another Monday outage." At AWS's scale, all of their issues are complex; this isn't going to be a simple issue that someone should have caught, just because they've already hit similar issues years ago and ironed out the kinks in their resilience story.
Indeed, the article doesn't even provide evidence that the response was unreasonably slow. There is no comparison to similar outages, either from AWS in the past, before the hypothesized brain drain, or from competitors. Note that the author has no idea what the problem actually was, or what AWS had to do to diagnose the issue.
Twice I've had to deal with outages where the root cause took a long time to find because there were several distinct root causes interacting in ways that made it difficult or impossible to reproduce the problem in an isolated way, or to even reason about the problem until we started figuring out that there were multiple unrelated root causes. All other outages I've dealt with were the sort where experienced engineers and institutional knowledge were sufficient to quickly find the cause and fix it.
Which is to say: it's entirely possible that the inferences drawn by TFA are just wrong. And it's also possible that TFA is wrong but also right to express concern with how Amazon manages talent.
It's about the time between the announcements about finding the cause. I find that to be thin evidence. There are far too many alternate explanations. It's not even that I find the idea to be implausible, but I don't think the article's doom-saying confidence level is warranted.
Indeed. No disrespect to Justin (great person) or any of the engineers who were sacked, but Corey's post here is basically "here's someone who was sacked, and here are several other layoff news items". AWS is a really big organization. Several orders of magnitude bigger than the group of people who were remote/refused to RTO. Organizations like this survive these brain drains.
Internal documents reportedly say that Amazon suffers from 69 percent to 81 percent regretted attrition across all employment levels. In other words, "people quitting who we wish didn't."
> Organizations like this survive these brain drains.
True, that's the other thing. Even if it's true that brain drain directly caused/exacerbated this event, big companies have a lot of momentum. Money can paper over a terrifying range and magnitude of folly. Amazon won't die quickly.
This is the time to accept that the path forward is keeping people and giving them the best tools you possibly can to do their work. That is, the same as has been true for decades remains so.
Yes, development tools are better every day. Yes, you can downsize. No it won’t be felt immediately. Yes, it mortgages the future and at a painfully high interest rate.
Suspending disbelief won’t make downsizing work better.
Seems like it worked fine. They laid off a quarter of their junior principal engineers, the stock went up. They had a massive outage a few months later, the stock went up again. Everything's working out fine for their strategy so far.
I remember comments saying the stock went up because the average joe didn't realize how much of the internet was powered by AWS until all their day to day apps started failing. To most people Amazon is an online shopping site.
You would think this would eventually show up on the balance sheets, right? Presumably a lot of their big customers have SLAs with money penalties, so maybe next quarter earnings? Or quarter after that?
Where are the young companies trying to replace them? There are all the AI companies, but Google and Meta both have competitive chatbots, and OpenAI is signing weird deals that don't make it look like a long-term player.
I don't think there's ignorance of the fact that turnover is bad, I think the field is being designed to homogenize staff and favor uniform mediocrity so that employees truly do become interchangeable. We're so close to just plain talent being likened to cowboyism.
I wish to understand the virtue of Amazon culture.
It seems that at L6 and below workers are a Taylorism-style fungible widget driven to convert salary into work product, guided to create the most output for the longest time before mentally breaking down, then being swiftly replaced, with L7 and above being so incredibly political that keeping the snakes and vultures from eating your team is a full time job at every level of senior management.
It never made sense to me how such a ruthless and inhumane culture is sustainable in the long run.
I would love to hear positive counter perspectives from Amazonians because the anecdotes from my L6-L10 friends describe what sounds like an inhumane hell on earth.
> It never made sense to me how such a ruthless and inhumane culture is sustainable in the long run.
It’s pretty simple, actually. Once such a dominant market position is achieved, you can get away with almost anything, whether with customers or employees. This is true of all the BigTech companies.
I think there's more to it. When you're dominant, you make money whatever. Think of Amazon et al. as huge spigots of money. Now, it becomes optimal to fight for more of that money coming your way. It's like the resource curse for countries. Nobody gains from growing the pie; they gain from stealing the pie. At some point, parasites and parasitic behaviours invade.
> It never made sense to me how such a ruthless and inhumane culture is sustainable in the long run.
It doesn't need to be sustainable in the long run: just needs to get to the next quarter and there continues to be enough desperate people in the US or India willing to be ground up in the machine for a chance to buy a house in a major metro
(Source: I was at Amazon for 10 years, finally quit last month)
I think it comes down to demand and supply for jobs.
The only time Amazon was forced to change its ways was during the Covid hiring boom, when they couldn't compete in the talent market. They were forced to increase their salary bands and the culture was also a bit easier during that time. But starting mid 2022 it's been an employer's market and Amazon is making sure to juice every bit out of its employees while it can.
It's not as conscious as that, it's an emergent outcome of the snake pit.
Engineers have to spend an inordinate amount of time on "managing up", which means they have very little time and attention to do what would otherwise be a reasonable workload. Additionally, good engineers hate and despise this so it contributes a lot to the burnout.
It was Diwali vacation in India. It looks like the managers were not able to force everyone to walk around with their laptops and pagers hanging from their necks and waists, respectively, which they normally do.
If there's one thing I have learned from my Amazon mates, it is that they never have true time off. Hills, beaches, a marriage in the family: no exceptions. It's so pervasive that I can't really imagine it to be voluntary, and my friends' answers on this topic have never been clear.
It was certainly suspicious that actual progress on the outage seemed to start right around U.S. west coast start of day. Updates before that were largely generic "we're monitoring and mitigating" with nothing of substance.
[09:13 AM PDT] We have taken additional mitigation steps to aid the recovery of the underlying internal subsystem responsible for monitoring the health of our network load balancers and are now seeing connectivity and API recovery for AWS services. We have also identified and are applying next steps to mitigate throttling of new EC2 instance launches. We will provide an update by 10:00 AM PDT.
[08:43 AM PDT] We have narrowed down the source of the network connectivity issues that impacted AWS Services...
[08:04 AM PDT] We continue to investigate the root cause for the network connectivity issues...
[12:11 AM PDT] <declared outage>
They claim not to have known the root cause for ~8hr
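The ~8hr figure can be checked against the timestamps quoted above. A minimal sketch, assuming all four updates fall on the same calendar day; the date in the code is a placeholder of mine, not something from the status page, and only the times of day come from the quoted updates:

```python
from datetime import datetime

# Times of day copied from the status updates quoted above (all PDT).
# DAY is a placeholder assumption; only the time differences matter.
DAY = "2000-01-01"
FMT = "%Y-%m-%d %I:%M %p"


def parse(t: str) -> datetime:
    """Parse a 12-hour clock time such as '08:04 AM' on the placeholder day."""
    return datetime.strptime(f"{DAY} {t}", FMT)


def elapsed_minutes(start: str, end: str) -> int:
    """Whole minutes between two same-day clock times."""
    return int((parse(end) - parse(start)).total_seconds() // 60)


declared = "12:11 AM"
updates = {
    "still investigating root cause": "08:04 AM",
    "source narrowed down": "08:43 AM",
    "recovery observed": "09:13 AM",
}

for label, t in updates.items():
    m = elapsed_minutes(declared, t)
    print(f"{label}: {m // 60}h {m % 60:02d}m after the outage was declared")
```

By this arithmetic the "continue to investigate" update lands just under 8 hours after the declaration and the "narrowed down" update about 8.5 hours after, which is consistent with the ~8hr estimate.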
I noticed that too. I think tech culture has to change a bit. Silicon Valley is a great location if you're making hardware or prepackaged software. If you have to support a real economy that is mostly on the East Coast you need a presence there.
> one really gets the sense that it took them 75 minutes to go from "things are breaking" to "we've narrowed it down to a single service endpoint, but are still researching," which is something of a bitter pill to swallow
Is 75 minutes really considered that long of a time? I don't do my day-job in webdev, so maybe I'm just naive. But being able to diagnose the single service endpoint in 75 minutes seems pretty good to me. When I worked on firmware we frequently spent _weeks_ trying to diagnose what part of the firmware was broken.
> Is 75 minutes really considered that long of a time? [...] When I worked on firmware we frequently spent _weeks_ trying to diagnose what part of the firmware was broken.
One might spend weeks diagnosing a problem if the problem only happens 0.01% of the time, correlated with nothing, goes away when retried, and nobody can reproduce it in a test environment.
But 0.01%-and-it-goes-away-when-retried does not make a high priority incident. High priority incidents tend to be repeatable problems that weren't there an hour ago.
Generally a well designed, properly resourced business critical system will be simple enough and well enough monitored that problems can be diagnosed in a good deal less than 75 minutes - even if rolling out a full fix takes longer.
Of course, I don't know how common well designed, properly resourced business critical systems are.
A few years back I was working at a software company that provided on-site sensor networks to hospitals, pharmacies, etc. Our product required them to physically install a server on-site, but we were starting to get disrupted by cloud-based solutions. Essentially what we did was alert medical staff when refrigeration temperatures for blood, organs, etc. went out of range. If the right people did not get notified of these issues in time, people would die. It's not hyperbole: you have to wait years for a liver transplant. There aren't just new livers available for everyone if a handful of them spoil.
With that being said, the problem here isn't that it took 75 minutes to find the root cause, but rather that the fix took hours to propagate through the us-east-1 data center network. Which is completely unacceptable for industries like healthcare where even small disruptions are a matter of life and death.
>Is 75 minutes really considered that long of a time?
From my experience in setting up and running support services, not really. It's actually pretty darn quick.
First, the issue is reported to level 1 support, which is a bunch of juniors/drones on call, often offshore (depending on time of the day), who'll run through their scripts and, having determined that it's not in there, escalate to level 2.
Level 2 would be a more experienced developer/support tech, who's seen a thing or two and dealt with serious issues. It will take time to get them online as they're on call but not online at 3am EST, as they have to get their cup of joe, turn on the laptop etc. It would take them a bit to realize that the fecal matter made contact with the rotating blades and escalate to level 3.
Which involves setting up the bridge, waking up the decision makers (in my case it was director and VP level), and finally waking up the guy who either a) wrote all this or b) is one of 5 or 6 people on the planet capable of understanding and troubleshooting the tangled mess.
I do realize that AWS support might be structured quite a bit differently, but still... 75 minutes is pretty good.
Edit: That is not to say that AWS doesn't have a problem with turnover. I'm well aware of their policies and tendency to get rid of people in 2/3 years, partially due to compensation structures where there's a significant bump in compensation - and vesting - once you reach that timeframe.
But in this particular case I don't think support should take much of a blame. The overall architecture on the other hand...
Sorry, are you saying you worked at Amazon and this is how they handle major outages? Just snooze and wait for a ticket to make its way up from end user support? No monitoring? No global time zone coverage?
Because if so, this seems like about the most damning thing I could learn from this incident.
Wholly inaccurate. AWS Systems Engineers would have been paged by automated monitoring systems once alert thresholds were breached. No escalation through Support needed.
Quite a few of AWS's more mature customers (including my company) were aware within 15 minutes of the incident that Dynamo was failing and hypothesized that it'd taken other services. Hopefully AWS engineers were at least fast.
75 minutes to make a decision about how to message that outage is not particularly slow though, and my guess is that this is where most of the latency actually came from.
The web operates in a very different world if you've invested in good tooling. I used to be lead on a modestly sized payment processing back end to the tune of about 100 transactions/second (we were essentially Stripe for the client facing apps at the company). In many cases our monitoring and telemetry let us identify root cause in a matter of minutes. Not saying that is or should be the norm for all web apps, but what we had was not too far off from a read-only debugger view of the back end app's state throughout the request and it was very powerful. Of course for us more often than not the root cause was "the bank we depend on is having a problem" so our knowledge couldn't do much other than help the company shape customer communications about the incident.
Also it's pretty likely it took less time than that to get an idea, but generally for public updates you want to be very reserved, otherwise users get the wrong impressions.
For a service like AWS, 75 mins is going to result in a LOT of COEs for people on why it wasn't mitigated quicker. A Sev 1 like this has an SLA of 20 mins to mitigate impact. Writing about these failures will consume a dozen people's time for the next 6 weeks.
I have 10 years of experience at Amazon as an L6/L7 SDM, across 4 teams (Games, logistics, Alexa, Prime video). I have also been on a team that caused a sev 1 in the past.
Amazon is supposed to have the best infrastructure in the business because everyone else runs on it. They should have access to the SRE talent that can quickly mitigate this kind of issue.
It's 75 minutes to _communicate_ the message to customers. Definitely internal teams were ahead of this before it was posted to the AWS Health Dashboard. Status Page posts are lagging indicators of incident progress.
I work in an incident management team where the turnaround from "we've decided to take x action, to y metric shows it is working, to z is posted on the status page" can be 1-2 minutes.
It is possible with professionals, institutional knowledge, drills, and good tools.
Tech will learn like manufacturing folks did that experience is not fungible. You can try to replace someone, but the new guy also needs to accumulate the scars from the system for years before taking over.
You cannot just keep abstracting and chopping systems to smaller and smaller subsystems to make them easy to digest.
At some point someone needs to know how these coordinate and behave under disturbances. At some point someone needs to know at a low level what the hell is going on.
I don't know, manufacturing seems to have learned pretty well that they can ship everything overseas and people will eventually accept products just aren't made the same way they used to be.
If AI is to tech what outsourcing was to manufacturing, then your analogy has me concerned for the future.
Good point. They can start offering 95% availability for services, initially for a better price. Then just bring the market expectation to 95% availability and raise prices.
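To put a number on that, here is a quick sketch of how much downtime a given availability target allows. The function name and the 365-day year are my own assumptions, not anything AWS publishes:

```python
HOURS_PER_YEAR = 365 * 24  # assumption: non-leap year


def allowed_downtime_hours(availability_pct: float,
                           period_hours: float = HOURS_PER_YEAR) -> float:
    """Hours of downtime permitted per period at the given availability."""
    return period_hours * (1 - availability_pct / 100)


if __name__ == "__main__":
    for pct in (95, 99.9, 99.99):
        hours = allowed_downtime_hours(pct)
        print(f"{pct}% -> {hours:.2f} hours/year ({hours / 24:.2f} days)")
```

At 95% that works out to roughly 438 hours a year, a little over 18 days, versus under an hour a year at 99.99%.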
> I don't know, manufacturing seems to have learned pretty well that they can ship everything overseas and people will eventually accept products just aren't made the same way they used to be.
I mean this is true, but we aren't talking about the consumer here. We're talking about the industry which is to say the powerful people who own and run all these companies.
What has happened is that those overseas countries now have all the experienced engineers over there and they know it. So you see things like the Trump admin begging Korean companies to keep their workers in the US because they understand how to actually do these things. And the reason the Trump admin did that is because they owe favors to the rich people who want to profit off of factories in the US.
Even with candidate pools of hundreds of thousands of H1-B engineers and tens of millions of illegal immigrant warehouse workers, there still comes a point where such a big company firing so many people so quickly exhausts all their options.
It reminds me of the Robot Chicken Sketch where Imperial Officers aboard the Death Star all pretend to be force choked to death by Darth Vader so they can avoid getting killed by lightsaber, then come back in under different names in different jobs. It's worse though for Amazon: nobody wants to come back.
https://www.youtube.com/watch?v=fFihTRIxCkg
> "The rate at which Amazon has burned through the American working-age populace led to another piece of internal research, obtained this summer by Recode, which cautioned that the company might “deplete the available labor supply in the US” in certain metro regions within a few years."
This article? https://www.engadget.com/amazon-attrition-leadership-ctsmd-2...
It's from 2022, so it'd be interesting to see an update.
AWS Manager interpretation. "Few years?, not my problem"
So I'm guessing things will get worse. It took a long time, but I remember when "cloud" started getting big lots of people voiced concern about being at the whim of Amazon/Bezos for your business critical infrastructure. Took longer than most people thought, but we are getting there.
Edit: I see it's from 2022, so maybe it is the end stage?
I used to be on an H-1b and gladly came back home to India. I run my own business now. And yes, I'm ex-Amazon. It was a tough place to work, but circa mid-nineties, the stock options made it worth working for them.
I'm willing to bet I'll outcode a significant fraction of the audience on this site. And I'm not even close to the best developer around. Some of the smartest people I've met have been on an H-1b visa. Please consider not letting prejudice affect your view. You'll do yourself a disservice by underestimating your competition.
The only thing I implied is that workers with fewer rights than a U.S. citizen are easier to exploit and abuse.
If I refuse to take a work-related call at 3am, the worst that can happen is that I get fired, and spend months looking for a new job.
If you refuse to take a work-related call at 3am, you get fired and lose your ability to stay in a place you have lived for 5+ years and made your home.
That's BS, and I hate it almost as much as you do. You can be blackmailed with deportation, and I can be replaced with someone who can be blackmailed with deportation. We're both getting screwed in this current arrangement.
I'm interpreting this as mid-1990s, in which case I very much believe in your technical ability. My dad came over late-1990s and he worked at mid-sized companies ever since. Even then, he and his H1B peers were decently intelligent.
I would caution your defense of today's H1B/L1s/OPT workers; I'd say the quality of Indian engineers in the US has halved every 10 years.
Today's Indian engineers come to the US because they can't enroll in a decent college in India and/or obtain a upper-middle class salary from a job. It is an entirely different mechanism for which people are migrating over. It used to be brain drain, now it is sewage drain.
The H1Bs in the big tech companies are maybe 50/50 technically decent, but everywhere else, they are just taking contracting spots. It is a very corrupt and bloated system that has to go because they are not providing valuable work.
I fail to reach this interpretation in this thread.
I am sure you are a wonderful person, but it’s simply an unjust treatment of Americans, even if you personally had nothing directly or reasonably to do with it. The betrayal and abuse was perpetrated by the “Americans” that led the corporations and paid off the politicians, and also the American citizens that were distracted and careless about their own politics and government and future for their own children. I doubt you would be ok with your own ruling class and rich to betray your children and the future of India, would you? It’s crazy, but America’s people largely and for a long time absolutely betrayed their own people.
I would not wish it on any society, even though it has been pervasive all over the “West”, where the rich, corporate captains, and politicians betray their own people. Imagine if your Indian politicians were to sell out India to the West or maybe import Africans or something similar, I would hope that the Indian people would make it absolutely clear to the politicians and rich that they are staring down a loaded gun and it’s not their finger on the trigger. So do I also wish it for the people of all of the western countries that they retake their sovereignty and self determination away from the rather parasitic oligarchy that has unconscionably been betraying its own people out of undeterred greed and crime against the very people that allowed making them rich and powerful in the first place.
It is not a personal thing, I think it’s just that people are recently getting a lot more angry about things because the American empire is hitting a rough patch that it has not experienced in anyone’s living memory and as it is said, (adapted) the naked people start getting angry when the tide goes out and there aren’t enough jobs to also be super generous by giving them away to Indians benefiting from the abusive systems of the parasitic cabal of the ruling class.
What you may also not be totally aware of, is that H1-B is only one of many different systems and programs that have been abused and quite literally benefit and profit foreigners over Americans. Imagine if that existed in India; where I go to India, make 2-3x what the average Indian makes, the government gives me free housing, my children get free education and free healthcare, and I get extremely beneficial government secured loan terms on business loans and get grants to start a business and free consulting and services, and I get to bring dozens of my friends and family into India to work in my business, and I also get beneficial home loans to buy up houses and drive up prices, and my foreign children get preferential treatment in Indian universities (…while local Indians don’t get those things) and I run for office while all the foreigners I and my advantaged community brought over to India start getting our people into the government and we start taking over Indian institutions and government offices.
I combined and crossed things a bit because it is a bit more complicated and nuanced, of course, and many Americans aren’t even aware of just how many programs and states are in place that advantage foreigners and disadvantage native Americans, who could even very well be the descendants of the founders of America. That’s why things have gotten rather tense and as it looks, unfortunately, it will likely only get worse from here; especially as BRICS builds out more of their alternative fiscal, monetary, economic, geopolitical structures; and the same traitors that control the USA will/are starting to get very nervous and borderline panicky. It seems Thucydides Trap is in full effect.
I asked if I could schedule the interview after my final exams, and his arrogance really showed when not only did he refuse, but then insisted my exams don't even register on the same scale of importance as the opportunity to work for Amazon.
Somewhat related: a recruiter at Google cold-called me a couple months into my first job out of undergrad back in 2016 and was similarly condescending about "the chance" to work for Google compared to everything else. I already had a low opinion of them when they gave my then-girlfriend an introductory O'Reilly book on Java after she failed their interview.
I regret being born too late to work somewhere like Bell Labs, SGI, or Sun. I had a ton of graybeard wizard coworkers from these places, and they were all a pleasure to learn from and even better friends. For the first 2 years of my first job, every day of work was like walking into the Shire and talking magic spells with 20 Gandalfs.
That job was great until I got put on a team with a guy who was a former middle manager at some IBM-like company and went from being surrounded by people lightyears ahead of me to being surrounded by Dilbert characters. The messed-up part was that it wasn't even punishment. I was rewarded after completing a project with my choice of which team I joined next, and I joined the wrong one. I assumed that joining a new team to utilize this newfangled "cloud computing" thing would be trailblazing, and I didn't do any diligence on who I would work with.
To this day, I still regret not rejoining the first team I worked for, otherwise I would still be at that company and happy about it. Then again, the boredom and discontent while being on that sucky team is the reason I started investing, and now I can buy a house in cash and fund myself to do whatever I want for at least a decade. Hard to complain about the way things turned out.
I worked at Amazon. Twice. In total about a decade as a Principal Engineer. I left voluntarily a few months ago.
I have zero regrets about my time at Amazon. I learned lots, worked with some incredible people, and had fun doing it.
And the culture? It was life changing for me, especially when I first joined. In all the best ways.
And Amazon today? All I’ll say is that at their size, maintaining solid culture is damn hard. The hiring spree peri-Covid definitely added unimaginable stress to maintaining the culture the company was built on.
They’re a big company, and thus a big target. It’s easy, cheap, and even lazy to kick them with stuff like this.
The truth is that while it’s changed a lot over time, anyone fortunate enough to work there should embrace it.
But that company culture leaves me with a very low opinion of them and very little trust. Even Microsoft engineers are less brainwashed. I've had several that just told me the truth about services.
Maybe it depends on the country but it feels like this is just their culture.
However, I think the question is, what percentage of engineers can pass the amazon interview but not the Apple/Databricks/Uber/Google/Meta ones. Because no one is picking amazon over the aforementioned companies.
However, maybe there's an opening at Amazon and not the other companies, or maybe that's your only offer. I certainly think it might be worth it for a few years.
There's a lot wrong with AWS (and it's got a lot worse in the last 3 years), but there's also a lot right, and there are some really, really smart people there, several of whom have boomeranged (people who left and came back).
The stories I hear from there describe a style of work I'm just not interested in.
They raised worker pay which, unsurprisingly, did not make Wall Street happy in the short term.
However, over the next couple years there were multiple benefits:
- lower turnover
- less employee theft
- cleaner stores
- more same store sales
etc
The one exception is an engineer who stopped engineering, switched into product, and transferred to China to hit on the women there.
Some Amazon practices actually sound great to me (short documents, read before the meeting) but so many things just sound needlessly, relentlessly cheap.
The largest contributor of stress being on-call rotations where getting paged between 12am-6am each night was basically a guarantee. God help you if it was a holiday and you got a high sev page, where the people that you really need are all out of pocket. The many many many instances of their security "regime" relentlessly paging us in the middle of the night for things like having an S3 bucket for static website assets; despite numerous exceptions given by L7+ leadership.
I disagree with the notion around "short documents", not only were they quite lengthy at times, but they actually made the process of "busywork" worse by adding more overhead to trivial matters.
Add on the layoffs and "return to office" horse-shit excuses and it's no wonder nobody wants to go back.
I've since been at Oracle/OCI (absolute dog shit with the worst on call I've ever seen, and I've been in the military lol), and now at Microsoft/Azure, which so far seems like a decent workplace.
I would never work for AWS, given what I've heard, and consistently, of their internal culture.
Also, everything I've seen while working with internal staff makes me feel there's a culture of obfuscating all weaknesses from customers, practically to the point of deceit.
[1] https://fortune.com/2024/03/20/amazon-layoffs-performance-re...
You have a source for that claim?
He's saying even if they had 10m illegal workers they would burn through them all too.
I have a friend who recently started there. He just bought a Mercedes, and a second house, and he’s still in his early thirties.
They keep him busy, though.
Even L6 managers feel this, but it becomes more clear as one goes up in levels. Recruiting is job one.
If Amazon runs out of recruitable engineers (unlikely, they are one of the most prestigious firms in the world) then they will simply lower the bar. HC must be filled.
The company is well structured, it will survive.
When you're the, what. Second? Third? Largest employer in the US, enforcing the law now becomes a meaningful hit to economic velocity. And as much as Trump hates brown people, his administration has begrudgingly revealed that there are moves that his billionaire buddies Will Not Allow.
I'm no fan of ICE or this administration's deportation strategy, but it's a serious problem that even enforcing the law on Amazon is now such an economic liability that nobody dares to try.
Sorry, what?
It's only logical presupposing a lot of other conditions, each of which is worthy of healthy skepticism. And even then, it's only a hypothesis. You need evidence to go from "this could have contributed to the problem" to "this caused the problem."
Based on what little is given in the article, it seems to go strongly against this hypothesis. For example it links to multiple past findings that Amazon's notification times need improvement going back to 2017. If something has been a problem for nearly a decade, it's hard to imagine it is a result of any recent personnel changes.
TFA does not establish how many AWS workers have left or been laid off, let alone how many of those were actually undesirable losses of highly skilled individuals. Even if we take it on faith that a large number of such individuals were lost, it is another bridge further to claim that no redundancy in that skillset remained and that the vacancies have been left unfilled since.
No evidence is given that indicates that if a more experienced team were working on the problem it would have been identified and resolved faster. The article even states something to the opposite effect:
> AWS is very, very good at infrastructure. You can tell this is a true statement by the fact that a single one of their 38 regions going down (albeit a very important region!) causes this kind of attention, as opposed to it being "just another Monday outage." At AWS's scale, all of their issues are complex; this isn't going to be a simple issue that someone should have caught, just because they've already hit similar issues years ago and ironed out the kinks in their resilience story.
Indeed, the article doesn't even provide evidence that the response was unreasonably slow. No comparison is made to similar outages, either from AWS in the past, before the hypothesized brain drain, or from competitors. Note that the author has no idea what the problem actually was, or what AWS had to do to diagnose the issue.
Which is to say: it's entirely possible that the inferences drawn by TFA are just wrong. And it's also possible that TFA is wrong but also right to express concern with how Amazon manages talent.
From TFA.
True, that's the other thing. Even if it's true that brain drain directly caused/exacerbated this event, big companies have a lot of momentum. Money can paper over a terrifying range and magnitude of folly. Amazon won't die quickly.
Yes, development tools are better every day. Yes, you can downsize. No it won’t be felt immediately. Yes, it mortgages the future and at a painfully high interest rate.
Suspending disbelief won’t make downsizing work better.
See: General Electric, RCA, Xerox, GM
Are they?
It seems that at L6 and below workers are a Taylorism-style fungible widget driven to convert salary into work product, guided to create the most output for the longest time before mentally breaking down, then being swiftly replaced, with L7 and above being so incredibly political that keeping the snakes and vultures from eating your team is a full time job at every level of senior management.
It never made sense to me how such a ruthless and inhumane culture is sustainable in the long run.
I would love to hear positive counter perspectives from Amazonians because the anecdotes from my L6-L10 friends describe what sounds like an inhumane hell on earth.
It’s pretty simple, actually. Once such a dominant market position is achieved, you can get away with almost anything, whether with customers or employees. This is true of all the BigTech companies.
It doesn't need to be sustainable in the long run: just needs to get to the next quarter and there continues to be enough desperate people in the US or India willing to be ground up in the machine for a chance to buy a house in a major metro
(Source: I was at Amazon for 10 years, finally quit last month)
The only time Amazon was forced to change its ways was during the Covid hiring boom, when they couldn't compete in the talent market. They were forced to increase their salary bands and the culture was also a bit easier during that time. But since mid 2022 it's been an employer's market and Amazon is making sure to juice every bit out of its employees while it can.
Engineers have to spend an inordinate amount of time on "managing up", which means they have very little time and attention to do what would otherwise be a reasonable workload. Additionally, good engineers hate and despise this so it contributes a lot to the burnout.
If there's one thing I have learned from my Amazon mates, then that is they never have a true time off. Hills, beaches, a marriage in the family— no exceptions. It's so pervasive that I can't really imagine it to be voluntary, and my friends' answers on this topic have never been clear.
Maybe it was still at the end of Indian day but together with the holiday I'd say that makes it more unlikely to be handled there
Alerts and monitoring will result in automatic pages to engineers. There is no human support before it gets escalated.
If an engineer hasn't taken a look within a few minutes, it escalates to their manager, and so on.
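The escalation rule described above can be sketched roughly as follows. This is only an illustration: the chain, the five-minute timeout, and the function name are hypothetical, not Amazon's actual paging configuration.

```python
from typing import Optional

# Hypothetical values. The comment above only says "a few minutes"
# and "their manager, and so on".
ESCALATION_CHAIN = ["on-call engineer", "manager", "director"]
ACK_TIMEOUT_SECONDS = 5 * 60


def current_owner(raised_at: float, now: float,
                  acked_at: Optional[float] = None) -> str:
    """Return who currently holds the page. All times are in seconds."""
    # Escalation stops at the moment someone acknowledges the page.
    stop = now if acked_at is None else min(acked_at, now)
    missed_windows = int((stop - raised_at) // ACK_TIMEOUT_SECONDS)
    # The last person in the chain keeps the page once the chain is exhausted.
    level = min(missed_windows, len(ESCALATION_CHAIN) - 1)
    return ESCALATION_CHAIN[level]


if __name__ == "__main__":
    print(current_owner(raised_at=0, now=60))                 # still within first window
    print(current_owner(raised_at=0, now=400))                # one window missed
    print(current_owner(raised_at=0, now=400, acked_at=120))  # acked in time
```

The point of the design is that nobody has to decide to escalate: an unacknowledged page moves up on a timer, with no human support tier in the loop.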
Just capitalised for emphasis, right?
> COE
Center of Excellence? Council of Europe? Still wondering even after Googling.
> SLA
Service Level Agreement. This I knew beforehand.
> SDM
Service Delivery Manager?
It's good enough, but there's no real evidence it's the best, simply the largest.