Every time a big company screws up, there are two highly informed sets of people who are guaranteed to be lurking, but rarely post, in a thread like this:
1) those directly involved with the incident, or employees of the same company. They have too much to lose by circumventing the PR machine.
2) people at similar companies who operate similar systems with similar scale and risks. Those people know how hard this is and aren’t likely to publicly flog someone doing their same job based on uninformed speculation. They know their own systems are Byzantine and don’t look like what random onlookers think it would look like.
So that leaves the rest, who offer insights based on how stuff works at a small scale, or better yet, pronouncements rooted in “first principles.”
I've noticed this amongst the newer "careerist" sort of software developer, who stumbled into the field for money, as opposed to the obsessive computer geek of yesteryear, who practiced it as a hobby. This archetype transplanted into the field, say, less than five years ago from another, often non-technical discipline, and was taught from overly simplistic materials that dismiss systems programming, networking, and computer science concepts as unnecessary, impractical skills, reducing everything to writing JavaScript glue code between random npm packages found on Google.
Especially in a time where the gates have come crashing down to pronouncements of, "now anybody can learn to code by just using LLMs," there is a shocking tendency to overly simplify and then pontificate upon what are actually bewilderingly complicated systems wrapped up in interfaces, packages, and layers of abstraction that hide away that underlying complexity.
It reminds me of those quantum woo people, or movies like What the Bleep Do We Know!? where a bunch of quacks with no actual background in quantum physics or science reason forth from drastically oversimplified, mathematics-free models of those theories and into utterly absurd conclusions.
Even before LLMs were trendy, during COVID-19 a surprising number of people on social networks became "experts" in virology and genetics.
Completely agreed. There are also former employees who have very educated opinions about what is likely going on, but between NDAs and whatnot there is only so much they are willing to say. It is frustrating for those in the know, but there are lines they can't or won't cross.
Whenever an HN thread covers subjects where I have direct professional experience I have to bite my tongue while people who have no clue can be as assertive and confidently incorrect as their ego allows them to be.
Some people can just let others be wrong and stay silent, but some people can't help themselves. So if you say something really wrong, like "this was caused by Netflix moving to Azure, they should have stayed on AWS!", someone will come along to correct you. If you're looking for the right answer, post the wrong one alongside some provoking statement ("Windows is better than Linux because this works there"), and you'll get the right answer faster than if you'd asked your question directly.
Right? A common complaint by outsiders is that Netflix uses microservices. I'd love to hear exactly how a monolith application is guaranteed to perform better, with details. What is the magic difference that would have ensured the live stream would have been successful?
I am one of the ones who complain about their microservices architecture quite a lot.
This comes from first-hand experience: I've talked with several of their directors when consulted on how to make certain of their systems better.
It's not just a matter of guarantees, it's a matter of complexity.
Like right now Google search is dying and there's nothing that they can do to fix it because they have given up control.
The same thing happened with Netflix where they wanted to push too hard to be a tech company and have their tech blogs filled with interesting things.
On the back end they went too deep on the microservices complexity. And on the front end for a long time they suffered with their whole RxJS problem.
So it's not an objective matter of what's better. It's more a cultural problem at Netflix. Plus the fact that they want to be associated with "FAANG" and yet their product is not really technology based.
I doubt a "microservice" has anything to do with delivering the video frames. There are specific kinds of infrastructure tech that are specifically designed to serve live video to large amounts of clients. If they are in fact using a "microservice" to deliver video frames, then I'd ask them to have their heads examined. Microservices are typically used to do mundane short-lived tasks, not deliver video.
The only time I worked on a project that had a live television launch, it absolutely tipped over within like 2 minutes, and people on HN and Reddit were making fun of it. And I know how hard everyone worked, and how competent they were, so I sympathize with the people in these cases. While the internet was teeing off with easy jokes, engineers were swarming on a problem that was just not resolving, PMs were pacing up and down the hallway, people were getting yelled at by leadership, etc. It's like taking all the stress and complexity of a product launch and multiplying it by 100. And the thing I'm talking about was just a website, not even a live video stream.
Some breakages are just too difficult to predict. For example, I work in ecommerce and we had a page break because the content team pushed too many items into an array, which caused a back-end service to throw errors. Because we were the middle service, taking content from the CMS and making the request to the back end, I'm not sure how we could have seen that issue coming in advance (and no one knew there was a limit).
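In hindsight, about the only generic defense is for the middle service to distrust both sides. A minimal sketch of that idea (the cap, field name, and function are all hypothetical, since nobody knew the real limit existed):

    # Hypothetical guard in the middle service, between the CMS and the back end.
    MAX_ITEMS = 200  # made-up cap; the real back-end limit was undocumented

    def build_backend_request(cms_payload: dict) -> dict:
        """Clamp untrusted CMS content before forwarding it to the back end."""
        items = cms_payload.get("items", [])
        if len(items) > MAX_ITEMS:
            # Degrade gracefully instead of letting the back end error the page;
            # truncate (or reject) and log it so the content team hears about it.
            print(f"warning: CMS sent {len(items)} items, truncating to {MAX_ITEMS}")
            items = items[:MAX_ITEMS]
        return {"items": items}

Of course this only helps once you suspect a limit exists somewhere; before that, the best you can do is clamp to something generous and alert loudly when you hit it.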
I’m just pointing out that there are Netflix engineers reading all these words.
For every thread like this, there are likely people who are readers but cannot be writers, even though they know a lot. That means the active posters exclude that group, by definition.
These threads often have interesting and insightful comments, so that’s cool.
At the scale that Netflix just dealt with? Yeah I honestly think this is a case where less than 5000 people in the world are really qualified to comment.
3) the people supplying 1) and 2) with tools (hard- or software)
We (yep) don't know the exact details, but we do get sent snapshots of full configs and deployments to debug things... we might not see exact load patterns, but it's enough to know. And of course we can't tell, due to NDAs.
That is, if the NFL decides to keep Netflix for that. The bandwidth for that fight was rookie numbers, and after that fiasco, why would the NFL not break its contract and choose someone with a proven track record of doing bigger live events, like the World Cup?
I'm sure 2) can post. But it won't be popular, so you'll need to dig to find it.
Most people are consumers, and at the end of the day, their ability to consume a (boring) match was disrupted. If this was PPV (I don't think it was), they paid extra and didn't get the quality of product they expected. I'm not surprised they dominate the conversation.
You may have belonged to one of those groups in the past, or maybe you will someday. I certainly have. Many of the more seasoned folks on HN have.
Stuff goes wrong, random internet people jump on the opportunity to speculate and say wildly off-the-mark comments, and the engineers trying to keep the ship from sinking have to sit quietly for fear of making the PR backlash worse.
For an event like this, there already exists an architecture that can handle boundless scale: torrents.
If you code it to utilize high-bandwidth users' upload, the service becomes more available as more users are watching -- not less available.
It becomes less expensive with scale, more available, more stable.
To be more specific: if you encode the video in blocks, with each new block hash being broadcast across the network, just managing the overhead of the block order, it should be pretty easy to stream video with boundless scale using a DHT.
Could even give high-bandwidth users a credit based upon how much bandwidth they share.
With a network like what Netflix already has, the seed-boxes would guarantee stability. There would be very little delay for realtime streams; I'd imagine 5 seconds tops. This sort of architecture would handle planet-scale streams for breakfast, on top of the already existing mechanism.
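To make the hand-waving slightly more concrete, here is a toy sketch of that block-plus-hash scheme. Everything in it is hypothetical: a real system would need an actual DHT (e.g. Kademlia), peer discovery, and a transport, all of which are stubbed out here as a shared in-memory dict:

    import hashlib

    class ToyDHT:
        """Stand-in for a real DHT: maps block hash -> block bytes."""
        def __init__(self):
            self.store = {}

        def put(self, block: bytes) -> str:
            h = hashlib.sha256(block).hexdigest()
            self.store[h] = block
            return h

        def get(self, h: str) -> bytes:
            return self.store[h]

    class Broadcaster:
        """Origin (think: a seed-box). Chunks the live feed into blocks,
        publishes each block to the DHT, and appends its hash to a manifest.
        The ordered hash list is the only thing that must be broadcast."""
        def __init__(self, dht: ToyDHT):
            self.dht = dht
            self.manifest = []

        def push_block(self, block: bytes):
            self.manifest.append(self.dht.put(block))

    class Viewer:
        """Peer: walks the manifest in order, fetching each block from
        whichever peer has it (here, the shared DHT plays that role)."""
        def __init__(self, dht: ToyDHT):
            self.dht = dht
            self.next_index = 0

        def pull(self, manifest):
            while self.next_index < len(manifest):
                yield self.dht.get(manifest[self.next_index])
                self.next_index += 1

    dht = ToyDHT()
    origin = Broadcaster(dht)
    for i in range(3):
        origin.push_block(f"video-block-{i}".encode())

    viewer = Viewer(dht)
    for block in viewer.pull(origin.manifest):
        print(block.decode())  # play blocks in manifest order

The replies below point at the part this sketch hides: in a live stream, everyone wants the newest block at the same instant, which is the worst case for a swarm.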
But then again, I don't get paid $500k+ at a large corp to serve planet scale content, so what do I know.
The protocol for a torrent is that random parts of a file get seeded to random people requesting a file, and that the clients which act as seeds are able to store arbitrary amounts of data to then forward to other clients in the swarm. Do the properties about scaling still hold when it's a bunch of people all requesting real time data which has to be in-order? Do the distributed Rokus, Apple TVs, Fire TVs and other smart TVs all have the headroom in compute and storage to be able to simultaneously decode video and keep old video data in RAM and manage network connections with upload to other TVs in their swarm - and will uploading data to other TVs in the swarm not negatively impact their own download speeds?
1. Everyone only cares about the most recent "block". By the time a "user" has fully downloaded a block from Netflix's seedbox, the block is stale, so why would any other user choose to download it from a peer rather than from Netflix directly?
2. If all the users would prefer to download from Netflix directly rather than from a p2p user, then you already have a somewhat centralized solution, and you gain nothing from torrents.
Yes, and then some idiot with an axe to grind against Logan Paul starts DDoSing people in the Netflix swarm, kicking them out of the livestream. This is always a problem because torrents, by design, are privacy-hostile. That's how the MAFIAA[1] figured out you were torrenting movies in 2004 and how they sent your ISP a takedown notice.
Hell, in the US, this setup might actually be illegal because of the VPPA[0]. The only reason why it's not illegal for the MAFIAA to catch you torrenting is because of a fun legal principle where criminals are not allowed to avail themselves of the law to protect their crimes. (i.e. you can't sue over a drug deal gone wrong)
[0] Video Privacy Protection Act, a privacy law passed which makes it illegal to ask video providers for a list of who watched what, specifically because a reporter went on a fishing expedition with video data.
[1] Music and Film Industry Association of America, a hypothetical merger of the MPAA and RIAA from a 2000s era satire article
Then, instead of people complaining about buffering issues, you'd get people complaining about how the greedy capitalists at Netflix made poor Joe Shmoe use all of his data cap, because they made him upload lots of data to other users and couldn't be bothered to do it themselves.
The way to deal with this is to constantly do live events and actually build organizational muscle, not these massive one-off events in an area the tech team has no experience in.
We should always be doing (the thing we want to do)
Some examples that always get me in trouble (or at least into big heated conversations):
1. Always be building: It does not matter if code was not changed, or there have been no PRs, or whatever: build it. Something in your org or infra has likely changed. My argument is "I would rather have a build failure on software that is already released, than software I need to release". (See the sketch after this list.)
2. Always be releasing: As before, it does not matter if nothing changed: push out a release. Stress the system and make it go through the motions. I can't tell you how many times I have seen things fail to deploy simply because they have not attempted to do so in some long period of time.
There are more, but I don't have time to go into them. The point is: if you did it, and ever need to do it again in the future, then you need to continuously do it.
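For what it's worth, rule 1 is cheap to adopt in most CI systems: add a scheduled trigger to the same pipeline. A minimal sketch, assuming GitHub Actions (the workflow name and build script are made up):

    # Hypothetical workflow: build on every push, but ALSO on a schedule,
    # so "nothing changed" still gets built and bitrot surfaces early.
    name: always-be-building
    on:
      push:
        branches: [main]
      schedule:
        - cron: "0 6 * * *"  # daily, even with zero new commits
    jobs:
      build:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          - run: ./build.sh  # assumed build entry point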
Doing dry runs regularly makes sense, but whether actually shipping it makes sense seems context-dependent. It depends on how much you can minimize the side effects of shipping a release.
Consider publishing a new version of a library: you'd be bumping the version number all the time and invalidating caches, causing downstream rebuilds, for little reason. Or if clients are lazy about updating, any two clients would be unlikely to have the same version.
Or consider the case when shipping results in a software update: millions of customer client boxes wasting bandwidth downloading new releases and restarting for no reason.
Even for a web app, you are probably invalidating caches, resulting in slow page loads.
With enough work, you could probably minimize these side effects, so that releasing a new version that doesn't actually change anything is a non-event. But if you don't invalidate the caches, you're not really doing a full rebuild.
So it seems like there's a tension between doing more end-to-end testing and performance? Implementing a bunch of cache levels and then not using it seems counterproductive.
"Test what you fly, and fly what you test" (Supposedly from aviation)
"There should be one joint, and it should be greased regularly" (Referring to cryptosystems I think, but it's the same principle. Things like TLS will ossify if they aren't exercised. QUIC has provisions to prevent this.)
> 1. Always be building: It does not matter if code was not changed...
> 2. Always be releasing...
A good argument for this is security. Whatever libraries/dependencies you have, unpin the versions and have good unit tests. Security fixes that land upstream must make it into your releases, and you cannot get those vulnerabilities out of production unless you are doing regular releases. This in turn implies having good unit tests, so you can do these builds and releases with a lower probability of shipping something broken. It also implies strong monitoring and metrics, so you can be the first to know when something breaks.
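As a tiny illustration of "unpin the versions" in a Python project (package names and ranges here are made up as examples), this can be as simple as specifying floors instead of exact pins in requirements.txt, so routine rebuilds pick up upstream security fixes:

    # requirements.txt -- hypothetical: floors, not exact pins
    requests>=2.31
    urllib3>=2.0
    # vs. the fully pinned style that freezes known-vulnerable versions:
    # requests==2.19.0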
There should be a caveat that this kind of decision should be based on experience and not treated as a rule that juniors might blindly follow. We all know how "fail fast and early" turned out (or whatever the exact phrase was).
They've been doing live events since 2023. But it's hard to be prepared for something that's never been done by anyone before: a Super Bowl-scale event, viewed entirely over the internet. The Super Bowl gets to offload to cable and over-the-air. Interestingly, I didn't have any problems with my stream. So it sounds like the bandwidth problems might have been localized, perhaps by data center or ISP.
I suspect a lot of it could be related to ISP bandwidth. I streamed it on my phone without issue. Another friend put their TV on their phone’s WiFi which also worked. Could be partly that phone hotspots lower video bandwidth by default.
I suspect it’s a bit of both Netflix issues and ISPs over subscribing bandwidth.
I would guess the majority of the streamed bandwidth was sourced from boxes like these in ISP's points of presences around the globe: https://openconnect.netflix.com/en/
So I agree the problems could have been localized to unique (region, ISP) combinations.
My suspicion is the same as yours, that this may have been caused by local ISPs being overwhelmed, but it could be a million other things too. I had network issues. I live in a heavily populated suburban area. I have family who live 1000+ miles away in a slightly less populated suburban area, they had no issues at all.
The ISP hypothesis doesn't make sense to me. I could not stream the live event on Netflix, but I could watch any other show on Netflix, YouTube, or Hulu at the same time.
Yeah, I think people are incorrectly assuming that everyone had the same experience with the stream. I watched the whole thing and only had a few instances of buffering and quality degradation. Not more than 30 seconds total during the stream.
I had issues here and there, but there were workarounds. Then, towards the end, the quality either auto-negotiated or was forced down to accommodate the massive pull.
Unless Netflix eng decides to release a public postmortem, we can only speculate. In my time organizing small-time live streams, we always had up to 3 parallel "backup" streams (Vimeo, Cloudflare, Livestream). At Netflix's scale, I doubt they could simply summon any of those providers in, but I guess Akamai / Cloudflare would have been up for it.
Sometimes this just isn't feasible for cost reasons.
A company I used to work for ran a few Super Bowl ads. The level of traffic you get during a Super Bowl ad is immense, and it all comes at you in 30 seconds, before going back to a steady-state value just as quickly. The scale pattern is like nothing else I've ever seen.
Super Bowl ads famously cost seven million dollars. These are things we simply can't repeat year over year, even if we believed they'd generate the same bump in recognition each time.
I think Netflix have a fair bit of organisational muscle; perhaps the fight was considered not as large an event as the NFL streams will be in the future.
Also, "No experience in" really? You have no idea if that's really the case
Everyone here is talking like this is something unique Netflix had to deal with. Hotstar live-streamed the India vs Pakistan cricket match with zero issues, with the highest live viewership in the history of live telecasts. Why would viewers paying $20 a month want to think about their technical issues? They dropped the ball, pure and simple. The tech already exists for this; it's been done before, even by ESPN. Nothing new here.
But that's exactly the point: Netflix didn't do this in a vacuum, they did it within Netflix.
It might just have been easier to start from scratch, maybe using an external partner experienced in live streaming, but the chances of that decision happening in a tech-heavy company such as Netflix that seems to pride itself on being an industry leader are close to zero.
Depending on whom you ask, the bitrate used for the stream was significantly lower than what is considered acceptable from free livestreaming services, which admittedly stream to much, much smaller audiences.
Without splitting hairs, livestreaming was never their forte, and going live with degradation elsewhere is not a great look for our distributed computing champ.
Netflix is good only at streaming ready-made content, not live streaming, but:
1. Netflix is a $300B company; this isn't a resources issue.
2. This isn't the first time they have done live streaming at this scale either. They already have prior failure experience; you'd expect the 2nd time to be better, if not perfect.
3. There was plenty of time between the first massive live stream and the second. Meaning plenty of time to learn and iterate.
The problem is that provisioning vast capacity for peak viewership is expensive and requires long-term commitment. Some providers won't give you more connectivity to their network unless you sign a 12 month deal where you prepay that.
Peak traffic is very expensive to run, because you're building capacity that will be empty/unused when the event ends. Who'd pay for that? That's why it's tricky, and that's why Akamai charges these insane prices for live streaming.
An open secret: the network layer is usually not redundant in your datacenter, even if it's promised. To have a redundant network you'd need to double your investment, and it'll sit idle at 50% max capacity. For the 2 hours of downtime per year when you restart the high-capacity routers, that's not cost-efficient for most clients.
Then sign a contract with Akamai, who has been in business for 25 years? You outsource if you aren’t planning to do something very often.
There is no middle ground where you commit a mediocre amount of resources, end up with downtime and a mediocre experience, and then go “but we saved money.”
What's your point? If they couldn't manage to secure the resources necessary, they shouldn't have agreed to livestream it. As a customer, I don't care AT ALL if it's difficult.
They have the NFL next month on Christmas day. So that'll be a big streaming session but I think it'll be nothing compared to this. Even Twitter was having problems handling the live pirate streams there.
Apple was clearly larger than Google when it came out with Apple Maps, and Maps was issue-laden for a long time. It is not a resource issue, but a tech-development maturity issue.
You can't solve your way out of a complex problem that you created and that wasn't needed in the first place. The entire microservices thing was overly complex, with zero benefits.
I spoke to multiple Netflix senior technicians about this.
That's a ridiculous statement. PrimeVideo is the leader in terms of sports events streaming over internet and it is composed of hundreds of microservices.
Live streaming is just much harder than streaming, and it takes years of work and a huge headcount to get something good.
People just do not appreciate how many gotchas can pop up doing anything live. Sure, Netflix might have a great CDN that works great for their canned content and I could see how they might have assumed that's the hardest part.
Live has changed over the years: from large satellite dishes beaming to a geosat and back down to the broadcast center ($$$$$), to microwave links to a more local broadcast center ($$$$), to dedicated long-haul fiber back to a broadcast center ($$$), to a kit with multiple cell providers pushing a signal back to a broadcast center ($$), to a direct internet connection to a server accepting a live HTTP stream ($).
I'd be curious to know what their live plan was and what their redundant plan was.
Sorry for the off topic, but what's this thing I only come across on Hacker News of referring to a company by its ticker symbol (AAPL, MSFT, etc.) outside of a stock context? It seems really weird to me.
I was pointing out how dumb a multibillion-dollar company is for getting this so wrong. Broadcasting live events is underestimated by everyone who has never done it, yet the hubris of a major tech company thinking it knows better is biting it in the ass.
As many other people have commented, so many other very large events dwarfing this one have been pulled off with no hiccups visible to the viewers. I have amazing stories of major hiccups during the MLB World Series that viewers had no idea were happening, but "inside baseball" people knew. To the point that the head of the network caught something during the broadcast and called the director in the truck, saying someone is either going to be fired or get a raise, yet the audience would never have noticed if the person had ended up getting fired. They didn't, btw.
This is the whole point of chaos engineering, which was invented at Netflix and tests the resiliency of these systems.
I guess we now know the limits of what "at scale" is for Netflix's live-streaming solution. They shouldn't be failing at scale on a huge stage like this.
I look forward to reading the post mortem about this.
Everyone keeps mentioning "at scale". I seriously doubt this was an "at scale" problem. I have a strong suspicion this was a failure at the origination point in pushing out a stable signal. That is not an "at scale" issue, but hubris: "we can do better/cheaper than standard broadcasting practices."
If commercial = public, then no, you cannot use multicast for this. It is heavily used within some enterprise networks, though; if you go to a gym with lots of TVs, they are all likely on multicast.
Do you live stream the Super Bowl? Everyone I know watches it over antenna broadcast TV, as do I. I think it is easier to have millions of TVs catch airwaves than millions of point-to-point HTTPS video streams.
When Netflix started, it was the first in the space and breaking new ground, which is how it became a "tech" company that happens to stream media. However, it has been 15 years, and since then the cloud providers have basically built "Netflix as a service". I suspect most of the big streamers are using that instead of building their own in-house thing and going through all the growing pains Netflix did.
What are you talking about? The signal coming from a live event is the full package. The output of “the truck” has multiple outs, including the full mix with all graphics, some with the mix minus any branding, etc. While the isos get recorded in the truck, they are not pushed out to the broadcast center.
All of the “mixing”, as you call it, is done in the truck. If you've never seen it, it is quite impressive.
In one part of the truck are the director and the technical director. The director is the one calling things like “ready camera 1”, “take 1”, etc.; the TD is the one on the switcher pushing the actual buttons on the console to make it happen. Next to them is the graphics team, prepping all of the stats and making them available for the TD to key in. In another area is the slomo/replay team, taking the feeds from all of the cameras into recorders that let the operators pull out selects and make them available for the director/TD to cut to. Typically in the back of the truck is the audio mixer, who mixes all of the mics around the event in real time.
All of that creates the signal you see on your screen. It leaves the back of the truck and heads out to wherever the broadcaster has better control.
> People just do not appreciate how many gotchas can pop up doing anything live.
Sure thing, but also: how many resources do you think Netflix threw at this event? If organizations like FOSDEM and CCC can do live events (although with way smaller viewership) across the globe without major hiccups, on (relatively) tiny budgets and smaller infrastructure overall, how could Netflix not?
The CCC video crew has its fair share of geeks from broadcasting corporations and studio houses. Their combined institutional knowledge about live events and streaming distribution is probably in the same ballpark as that of giant global TV networks.
They also have the benefit of having practiced their craft at the CCC events for more than a decade. Twice a year. (Their summer event is smaller but still fairly well known. Links to talks show up on HN every now and then.)
Funky anecdote: the video crew at Assembly have more broadcasting and live AV gear for their annual event than most medium-sized studios.
> how many resources do you think Netflix threw at this event?
Based on the results, I hope it was a small team working 20% time on the idea. If you tell me they threw everything they had at it to this result, then that's even more embarrassing for them.
Cable TV (or even OTA antenna in the right service area) is simply a superior live product compared to anything streaming.
The Masters app is the only thing that comes close imo.
Cable TV + DVR + high speed internet for torrenting is still an unmatched entertainment setup. Streaming landscape is a mess.
It's too bad the cable companies abused their position and lost any market goodwill. Copper connection direct to every home in America is a huge advantage to have fumbled.
The interesting thing is that a lot of TV infrastructure is now running over IP networks. If I were to order a TV connection for my home I'd get an IPTV box to connect to my broadband router via Ethernet, and it'd simply tell the upstream router to send a copy of a multicast stream my way.
Reliable and redundant multicast streaming is pretty much a solved problem, but it does require everyone along the way to participate. Not a problem if you're an ISP offering TV, definitely a problem if you're Netflix trying to convince every single provider to set it up for some one-off boxing match.
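For a sense of how little the client has to do when the network cooperates, here is a sketch of an IPTV-style receiver joining a multicast group. The group address and port are made up, and this only works where every router on the path forwards multicast:

    import socket
    import struct

    GROUP, PORT = "239.1.1.1", 5004  # hypothetical group/port for one "channel"

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", PORT))

    # The IGMP join: tells the upstream router "send me a copy of this stream".
    mreq = struct.pack("4s4s", socket.inet_aton(GROUP), socket.inet_aton("0.0.0.0"))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)

    while True:
        packet, _ = sock.recvfrom(2048)  # e.g. an RTP packet carrying video
        # hand `packet` off to a decoder here

All the fan-out happens in the routers; the sender transmits one copy no matter how many receivers join. That's the property Netflix can't get over the public internet.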
This. I'm honestly going to cancel my streaming shit. They remove and mess with things so much. Like right now, HBO Max or whatever removes my recent watches after 90 days. Why?
It wasn't even just buffering issues, the feed would just stop and never start again until I paused it and then clicked "watch live" with the remote.
It was really bad. My Dad has always been a fan of boxing so I came over to watch the whole thing with him.
He has his giant inflatable screen and a projector that we hooked up in the front lawn to watch it, but everything kept buffering. We figured it was the Wi-Fi, so he packed everything up and went inside, only to find the same thing happening on ethernet.
He was really looking forward to watching it on the projector and Netflix disappointed him.
Commercial boxing has always been like WWE or MMA with a thin veneer of actual sport to it, i.e. it is just entertainment[1].
To rephrase your question, then: what does someone think of the entertainment on display?
I don't think it was good entertainment.
None of the hallmarks of a good show were present. It wasn't close, nor was it bloody, and nothing unexpected happened, like, say, a KO; everything went pretty much as expected. It wasn't a nice watch at all; no skill or talent was on display. All Paul had to do was use his speed to backpedal from the slow, weak punches of a visibly older Tyson with a bum knee, and land some points occasionally to win.
--
[1] There is a deeper argument here: is any spectator sport just entertainment, or is it truly about skill, talent, and competition? Boxing, however, including the fights promoted by the traditional four major associations, falls, to me, clearly more on the entertainment side than, say, another sport like the NFL.
Was this necessary? The comment was on a tech forum about the tech issues; do we really need to re-prosecute the argument that it wasn't real boxing here too? There are plenty of other places for those so painfully inclined to do so.
On a few forum sites I'm on, people are just giving up. Looking forward to the post-mortem on how they weren't ready for this (with just a tiny bit of schadenfreude because they've interviewed and rejected me twice).
AB84 streamed it live from a box at the arena to ~5M viewers on Twitter. I was watching it on Netflix, I didn't have any problems, but I also put his live stream up for the hell of it. He didn't have any issues that I saw.
It's not everyone. Works fine for me, though I did have to reload the page when I skipped past the women's match to the Barrios Ramos fight and it was stuck buffering at 99%.
> Whenever an HN thread covers subjects where I have direct professional experience I have to bite my tongue while people who have no clue can be as assertive and confidently incorrect as their ego allows them to be.
https://xkcd.com/386/
> This comes from first-hand experience: I've talked with several of their directors when consulted on how to make certain of their systems better. ...
This is where you get up and leave.
> That means the active posters exclude that group, by definition.
That's a bold claim, given that people with inside knowledge could post here without disclosing they are insiders. Is that some kind of No True Scotsman?
GP clearly meant some people, not everybody. You are the one making bold claims.
Now take this realization and apply it to any news article or forum post you read, and think about how uninformed they actually are.
I'm also not going to criticise my peers because they could recognise me and I might want to work with them one day.
> Stuff goes wrong, random internet people jump on the opportunity to speculate and say wildly off-the-mark comments, and the engineers trying to keep the ship from sinking have to sit quietly for fear of making the PR backlash worse.
And looking through the comments, this is just wrong.
> Hotstar live-streamed the India vs Pakistan cricket match with zero issues, with the highest live viewership in the history of live telecasts.
Rolling Stone reported 120m for Tyson and Paul on Netflix [1]. These are very different numbers from the cricket match's [0]: 120m is Super Bowl territory. Could Hotstar handle 3-4 of those cricket matches at the same time without issue?
[0] https://www.the-independent.com/sport/cricket/india-pakistan...
[1] https://www.rollingstone.com/culture/culture-news/jake-paul-...
https://x.com/netflix/status/1857906492235723244?s=46
https://www.icc-cricket.com/news/biggest-cricket-world-cup-e...
I watched it for the game trailers; I'm actually shocked that it's also Super Bowl viewership territory.
https://variety.com/2023/digital/news/game-awards-2023-break...
Is that a surprise? They're not who I would think of first as a gold standard for high viewership live streams.
> This isn't the first time they have done live streaming at this scale either.
What was the previous fail?
> I spoke to multiple Netflix senior technicians about this.
They said that's the whole shtick.
This isn’t NFLX’s first rodeo in live streaming. Have seen a handful of events pop up in their apps.
There is no excuse. All of the resources and talent at their disposal, and they looked absolutely amateurish. Poor optics.
I would be amazed if they are able to secure another exclusive contract like this in the future.
Just what the fuck are these people doing?
If I were a major investor in them I'd be pissed.
Every major network can broadcast the Super Bowl without issue.
And while Netflix claims it streamed to 280 million, that’s if every single subscriber viewed it.
Actual numbers put it in the 120 million range. Which is in line with the Super Bowl.
Maybe Netflix needs to ask CBS or ABC how to broadcast.
That's a very different area from the transmission of live video to end users.
> If organizations like FOSDEM and CCC can do live events (although with way smaller viewership) across the globe without major hiccups, on (relatively) tiny budgets, how could Netflix not?
Or, for that matter, Youtube (Live) and Twitch.
> Reliable and redundant multicast streaming is pretty much a solved problem, but it does require everyone along the way to participate.
So far, no one seems particularly motivated.
What did your Dad think about the 'boxing'?
He's definitely got issues...