paxys · 3 years ago
I love technical content like this. Not only is it incredibly interesting and informative, it also serves as a perfect counterpoint to the popular "why does Netflix need X thousand engineers, I could build it in a weekend" sentiment that is frequently brought up on forums like this one.

Building software and productionalizing/scaling it are two very different problems, and the latter is far more difficult. Running a successful company always requires a large number of very smart people who are willing to get their hands dirty optimizing every aspect of the product and business. Too many people today think that programming starts and ends at pulling a dozen popular libraries and making some API calls.

tetha · 3 years ago
The way I've been putting it to people lately is: never underestimate how much harder a problem can get when you make it big. And at times it's hard to appreciate how difficult something becomes if you haven't walked at least part of the path yourself.

Take hosting postgres at work. At this point I very much understand why a consultant once said: "You cannot make mistakes with a postgres that's 10GB or 100GB in size doing a dozen transactions per second." And he's right: give it some hardware, don't touch any knobs except for one or two, and that's it. The average application accessing our postgres clusters is just too small to cause problems.

And then we have 2 postgres clusters with a dataset size of 1TB or 2TB, peaking at around 300 - 400 transactions per second. That's not necessarily big or busy for what postgres can do, but at this point it becomes noticeable that you have to do some things right, and some patterns just stop working.

And then there are people dealing with postgres instances 100 - 1000x bigger than this. That's becoming tangibly awesome and frightening, using awesome in the more old-school sense.

mlrtime · 3 years ago
Not only do you have to make it big, you have to engineer it in a way that makes it profitable for the business.

I'm sure there are many teams that could design such a network with nearly unlimited resources, but it is entirely different when you have profit margins.

jiggawatts · 3 years ago
I know how to operate a hose! How hard can it be to manage a large stream? It's "just" more water: https://www.youtube.com/watch?v=jxNM4DGBRMU
victor106 · 3 years ago
As someone once said “Big is different”
le-mark · 3 years ago
> Too many people today think that programming starts and ends at pulling a dozen popular libraries and making some API calls.

The needle keeps moving, doesn't it? A tremendous breadth of difficult problems can be effectively addressed by pulling together libraries and calling APIs today that weren't possible before. Today's hard problems are yesterday's impossibilities. The challenge for those seeking to make an impact is to dream big enough.

ezconnect · 3 years ago
The basic problem is the same, pushing the hardware to its limits.

ternaryoperator · 3 years ago
This! I am frustrated at how often devs will not accept that simple things become incredibly complicated at scale. That favorite coding technique? That container you wrote? Those tests you added? All good, but until you've tested them at scale, don't assert that everyone should use them. This dynamic is true in the other direction too: techniques often taken for granted simply are not feasible in highly resource-constrained environments. With rare exceptions, the best we can say with accuracy is "I find X works well enough in the typical situations I code for."
throwaway012377 · 3 years ago
> simple things become incredibly complicated at scale

In a way it's the opposite. Things scale up by removing complex interactions. Software gets faster by solving the same problem with fewer steps.

Any beginner can write too much code and pile up too many software and hardware components.

It takes knowledge to simplify things.

closeparen · 3 years ago
People have career and ego incentives to overestimate the constraints of the environment they're working in, and will use lazy heuristics like rounding some number to "a lot" and assuming that it justifies pulling out all the stops. Computers are fast. One or a few of them can accomplish quite a lot. Some people really do exceed those limits, and godspeed to them, but even in the companies they work at, the vast majority of applications do not.
trasz · 3 years ago
This goes both ways - things written “to scale” often, or usually, do worse in more typical, smaller scenarios, because of all the bloat that’s irrelevant to 98% of use cases.

That’s one of the fundamental problems in today’s Open Source world: everything’s being optimized for a very rare case.

Aaronmacaron · 3 years ago
I think the problem is that the "easy" parts of Netflix, such as the UI or the recommendation engine, seem like they were hacked together over a weekend. Of course deploying and maintaining something at the scale of Netflix is incredibly hard. But if they can afford thousands of engineers who optimize performance, why can't they hire a few UI/UX engineers to fix the godawful interface, which is slightly different on every device? I think this is where the sentiment stems from.
paxys · 3 years ago
Technically speaking I think Netflix's UX blows every other streaming app out of the water. It loads instantly, scrolling is smooth, search is instant. Buttons are where you'd expect and do what you expect. They have well-performing and up-to-date apps for every conceivable device and appliance. They support all the latest audio and video codecs.

This is all in stark contrast to services like HBO Max and Disney+, which still stutter and crash multiple times a day. Amazon for some reason treats every season of a TV show and the HD/SD versions of movies as independent items in their library. I still haven't been able to download an HBO Max video for offline viewing on iOS without the app crashing on me at 99%.

The problems you mention with Netflix are real, but they have more to do with the business side of things. Netflix recommendations seem crap because they don't have a lot of third party content to recommend in the first place. Their front page layout is optimized to maximize repetition and make their library seem larger. They steer viewers to their own shows because that's what the business team wants. None of these are problems you can fix by reassigning engineers.

Shaanie · 3 years ago
I'm surprised you think Netflix's UI and UX is that poor. Which streaming service do you think does a better job?
bradstewart · 3 years ago
I honestly find Netflix's the easiest to navigate, by far.

Hulu did that big redesign, and it's extremely pretty to look at, but even after a few years of trying to use it, I still struggle to do anything other than "resume episode". Finding the previous episode, listing episodes, etc. is always an exercise in randomly clicking, swiping, long-pressing, and waiting for loading bars.

One thing Netflix really got right as well: the "Watch It Again" section. So many times I want to rewatch the episode I just "finished" (because my wife finished a show after I left the room, the kids fell off the table, I fell asleep or wasn't paying attention, etc.), and every other platform makes this extremely difficult to find.

Back to Hulu--the only way I know how is the search feature, which is a PITA with a remote.

NavinF · 3 years ago
> godawful interface which is slightly different on every device

Which devices are you referring to? I’ve only used the PC and mobile interfaces both of which are quite pleasant.

stackbutterflow · 3 years ago
That's what puzzles me about Uber. I believe that behind the scenes it does pretty complex things as explained many times on HN, but it's the worst app I've ever used. UI and UX wise it's so bad that if you told me it was a bootcamp graduation project I'd have no problem believing you.
jdyyc · 3 years ago
I work on a very technically trivial service at a large company.

It's the kind of thing that people run at home on a raspberry pi, docker container or linux server and it consumes almost no resources.

But at our organization this needs to scale up to millions of users in an extremely reliable way. It turns out this is incredibly hard and expensive and takes a team of people and a bucket of money to pull it off correctly.

When I tell people what I work on they only think about their tiny implementation of it, not the difficulty of doing it at an extreme scale.

Sytten · 3 years ago
I think a fair criticism would be how many engineers they have compared to their competitors. Disney+ is at a similar scale; can they do the same or a similar job with fewer people? And considering Netflix pays top of market, how much does Disney spend on its engineering effort to get its result? Would Netflix benefit from just throwing more hardware at the problem vs. paying more engineers 400-500k/y to optimize?
iamricks · 3 years ago
Standing on the shoulders of giants: Netflix engineers didn't have blog posts from other companies on how to handle the scale they started facing. Facebook didn't have blog posts to reference when they scaled to 1B users. They pay for talent that has built systems that had not been built before, and they have seen a return on it, so they continue to do it.
toast0 · 3 years ago
> Would netflix benefit from just throwing more hardware at the problem vs paying more engineers 400-500k/y to optimize?

Where the CDN boxes go, you can't always just throw more hardware. There's a limited amount of space, it's not controlled by Netflix, and other people want to throw hardware into that same space. Pushing 800gbps in the same amount of space that others do 80gbps (or less) is a big deal.

pclmulqdq · 3 years ago
The engineers are definitely cost-effective at this scale. They may be the highest-leverage engineers at the company in terms of $ earned from their efforts compared to $ spent. The improvements that come from performance engineers at large companies are frequently worth $10M/year/person or more.

Most companies maintain internal calculations of these sorts of things, and make rational decisions.

paxys · 3 years ago
Disney (the company) has 20x the number of employees as Netflix, and just 2x the market cap (in fact they were briefly worth the same last year), ~2x the revenue and 2/5 the net income. So Netflix is clearly doing something right.
jwmoz · 3 years ago
I watch Disney content sometimes and it constantly drops or freezes; you can see the difference in quality compared to Netflix.
entropie · 3 years ago
I wasn't able to watch Disney+ in 4K via Chromecast for like a year. Stuttering every 10 seconds or so. I never had problems like this with Netflix.
slillibri · 3 years ago
Disney bought a majority ownership in BAMTECH to build Disney+.
rybosworld · 3 years ago
That seems like a fair point if you just consider the video streaming. I know that Netflix wants to break into gaming; I'd imagine the bandwidth required for that is higher than for streaming video.
seydor · 3 years ago
I don't see the point. A centralized data hose is replacing what the internet was designed to be: a decentralized, multi-routed network. The problem may be useful to them, but it's unlikely to be useful to anyone who doesn't already work there. I dunno; if it were possible to monetize decentralized or BitTorrent video hosting, I think it would solve the problem in a more interesting and resilient way. With fewer engineers.

But it's like, every discussion today must end with something about the pay and head count of engineers.

jedberg · 3 years ago
It's funny you mention this. When I worked at Netflix, we looked at making streaming peer to peer. There were a lot of problems with it, though: privacy issues, most people have terrible upload bandwidth from home, people didn't like the idea of their hardware serving other customers, home hardware is flaky so you'd constantly be redoing client selection, and other problems.

So it turns out decentralized multi routed is not a good solution for video streaming.

oleganza · 3 years ago
I understand and even share a little bit of your sentiment, but I'm tired of the stretched "X is not what X was supposed to be" argument.

Strictly speaking, the Internet was supposed to help some servers survive and continue working together despite others being destroyed by a nuke. That is more or less the case today: we see how people use VPNs to route around censorship. Whether you were supposed to stream TikTok videos directly from the phones of their authors or through a centralized data hose - I'm not sure that was ever the grand idea.

Also "decentralized" and "monetize" don't go well together because innovation is stimulated by profit margins and rent-free decentralized solutions by definition have those margins equal to zero (otherwise the solution is not decentralized enough).

paxys · 3 years ago
While we are at it let's just put video streaming on the blockchain! Who needs all these engineers and servers.
rvnx · 3 years ago
I think nobody said Netflix's infrastructure can be built in a weekend. However, the scale doesn't matter that much after a certain point, once the scaling "wall" has been pierced. If you are a biscuit factory that produces 100'000'000 or 500'000'000 biscuits per year, then the gap between 100M and 500M isn't that impressive anymore, as it's mostly about scaling existing processes. However, if you turn a 1'000-biscuit shop into a 1'000'000-biscuit company, that's very impressive.
bmurphy1976 · 3 years ago
Nonsense.

It's still impressive. A 5x increase at that scale can be a phenomenal challenge. Where do you source the ingredients? Where do you build the factories (plural, because at that scale you almost certainly have multiple locations in different geographic locales subject to different regulatory structures)? Where do you hire the people? How do you manage it? What about the storage and shipping and maintenance of all the equipment, and on and on? How much do you do in house, and how much do you outsource to partners? What happens when a partner goes belly up or can't meet your ever-increasing needs?

Your comment is a great example of what the OP pointed out.

paxys · 3 years ago
It's the exact opposite.

Taking the software example, you can easily scale from 1 to 100 users on your own machine. You can handle thousands by moving to a shared host. Using off-the-shelf web servers and load balancers will help you serve a million+. From there on you'll have to spend a lot more effort optimizing and fixing bottlenecks to get to tens, maybe hundreds of millions. What if you want to handle a billion users? Five billion? Ten billion? It always gets harder, not easier.

Pushing the established limits of a problem takes exponentially more effort than reusing existing solutions, even though the marginal improvement may be a lot smaller. Getting from 99.9% to 99.99% efficiency takes more effort than getting from 90% to 99%, which takes more effort than getting from 50% to 90%.

You never pierce the scaling wall. It only keeps getting higher.

loopercal · 3 years ago
If you told McDonald's to double the number of McRibs produced next year, that would be an incredible challenge to meet. They already sell enough that it affects the global pork market; it'd be insane for them to double their demand for pork. What about other supplies? Would this result in reduced burger demand? How can they ensure they can respond appropriately either way? They probably run near fridge/storage capacity; does increasing this mean they need to also increase storage at restaurants?

That's a 2x increase. Now scale it up by another two and a half times to get to 5x. It's crazy to say there's a "scaling wall" that, once you "pierce" it, makes it easy to scale up. It's the opposite: McDonald's already knows how to supply and sell X McRibs a year, but no company has ever sold 5X that many McRibs, so they'd have to figure it out themselves.

zeroxfe · 3 years ago
> the gap between 100M and 500M isn't that impressive

This is absolutely not true. The closer you are to peak performance, the harder it is to scale, and the returns diminish heavily. At many major tech companies, a huge amount of effort goes into just 1% - 5% optimizations -- these efforts really require creative thinking and complex engineering (not just "scaling existing processes"). At the volumes these companies operate at, even a 1% optimization is quite significant.

bombcar · 3 years ago
Part of it depends on whether "build it five more times, again" is a viable strategy.

Building five "Netflixes" with identical content is possible; the amount of content wouldn't change (it would decrease, the cynic says); you just need parallel copies of everything (servers, bandwidth, etc).

The fun would come in syncing usernames, etc through the system.

It's an entirely different class of problem compared to "acquire resource, convert it, sell it".

belinder · 3 years ago
It's a good point, and I think it's an interesting comparison. Obviously improving by a factor of 1000 is better than improving by a factor of 5. But the absolute improvement is still roughly 400 times larger: 400'000'000 extra biscuits are going to bring a lot more revenue than 999'000 extra biscuits.
summerlight · 3 years ago
> However, the scale doesn't matter that much after a certain point once the scaling "wall" has been pierced.

Sorry, you've got to overhaul the majority of your architecture and its components for every 10x you scale. It's not a single "scaling wall" to break through; it's more of a relentless stream of uphill battles. And this gets even more interesting when you reach the point where there's no prior art for your problem, usually at hundreds of millions of users.

mannyv · 3 years ago
Actually, you're incorrect. Scale problems seem to come in quanta. In your example you will have physical issues with ingredients, etc. at some point. It might be storage, it might be because you've run out of water. It might be because you've run out of electricity.

Making 5 things and making 5,000 things is as different as making 50,000 things and making 1M things. There are always cost constraints at each level, and each design can only go so far.

nwallin · 3 years ago
> the popular "why does Netflix need X thousand engineers, I could build it in a weekend" sentiment that is frequently brought up on forums like this one.

I don't think that's a popular sentiment about Netflix. Twitter, Reddit, Facebook, yes, but Netflix, YouTube, Zoom, not so much.

mihaic · 3 years ago
I don't think this actually answers why Netflix needs so many engineers. This seems like the sort of thing that one or two experienced engineers would spend a year refining, and it would turn out like this.

This is the sort of impressive work that I've never seen scale.

drewg123 · 3 years ago
Author here... Yes, most of this work was done by me, with help from a handful of us on the OCA kernel team at Netflix (and external FreeBSD developers), and our vendor partners (Mellanox/NVIDIA).

With that said, we are standing on the shoulders of giants. There are tons of other optimizations not mentioned in this talk where removing any one of them could tank performance. I'm giving a talk about that at EuroBSDCon next month.

throwaway012377 · 3 years ago
> "why does Netflix need X thousand engineers, I could build it in a weekend"

Believe it or not, I was at a company doing web file streaming in 2009 using Nginx, sendfile, and SSL offloading on the NIC.

It was installed by one dude. A standard Linux distro, standard kernel and no custom software. Just compile the SSL offloading kernel module once.

yibg · 3 years ago
And how many concurrent users did you have?
summerlight · 3 years ago
Yeah, that's perhaps nice (and hopefully moderately interesting) enough for hobbyist work. Good luck multiplying the scale by 1,000,000 across many dimensions.
Quarrelsome · 3 years ago
> why does Netflix need X thousand engineers, I could build it in a weekend

I would like to hope nobody asks that. Video is one of the, if not the, hardest data-plumbing use cases on the internet.

dragontamer · 3 years ago
I'd say realtime communications is harder.

A lot of these tricks being discussed here cannot be applied to Skype calls.

onlyrealcuzzo · 3 years ago
To be pedantic, scaling by itself isn't that difficult.

Scaling cost-effectively is.

eru · 3 years ago
Yes, and No. At some point, even scaling at all would be hard.

(Just like sending a human to Alpha Centauri is hard, even if you had unlimited funds.)

kaba0 · 3 years ago
It depends entirely on the problem domain. Sure, it is more of a devops problem when the problem is trivially parallelizable, but often you have a bottleneck service (e.g. the database) that has to run on a single machine. No matter how many instances serve the frontend*, it doesn't help if every call has to pass through that single machine.

* after a certain scale

sllabres · 3 years ago
Tell that to e.g. Tesla.

From what I've read they burned a lot of money and had large problems scaling nevertheless. Which I don't find too surprising, not because they are incapable, but because it isn't easy to scale.

From my experience and from what I've read, scaling headcount by roughly a power of ten is a large change in an organisation and therefore likely a challenge. For _any_ technical process the boundary might not be strictly a power of ten, but I would say that scaling by a factor of a hundred is a challenge if that value hasn't already been reached by some process in your organisation.

xnx · 3 years ago
> Building software and productionalizing/scaling it are two very different problems, and the latter is far more difficult.

Is this claim based on some example I should know? Countless companies never achieve product/market fit, but very few I can think of fail because they weren't able to handle all their customers.

maerF0x0 · 3 years ago
and maybe it can help stem asinine "System Design" interviews like "Design netflix"

I call them asinine because these kinds of architectures aren't written out in one day (let alone 60 minutes), and it's silly not to build an evolving setup (unless you are replacing a legacy system in a company with scale, but then again you have more than 60 minutes...). I wish system design interviews would test not just whether you know how to produce high-scale designs, but whether you know how to build tiny designs that minimize footprint before scale and give maximum flexibility for scaling when it's indicated...

fennecfoxy · 3 years ago
Lmao. _But it does_. At least for the most part.

It's such a previous generation thing to be angry at the way that modern development is done. Of _course_ dev is now web heavy and people are pulling in all sorts of libraries to make their lives easier.

But pretending that those same devs wouldn't also be capable of development on lower/less abstracted levels (well sure, maybe not _all_ of them) is insulting.

zppln · 3 years ago
I'm not sure this particular presentation helps your point, though? I sifted through it, and if anything I was struck by how simple it seemed. I'm sure there's more to running Netflix, though, and in my mind they're allowed to have as many engineers as they see fit.
rakoo · 3 years ago
Sure, if you set yourself an arbitrarily hard problem, it takes a lot to solve it. "How we dug a 100m pit without using machines in 2 days" is an incredible feat, but the constraints only serve those who set them.

Serving large content has been solved for decades already. It's much easier and more reliable to serve from multiple sources, each at its maximum speed. Want more speed? Add another source. Any client can be a source.

Netflix artificially constrains itself by serving only from its own machines. It is a very nice engineering feat, but it is completely artificial. As a user it feels weird to think highly of them when they could just have gone the easier road.

dmikalova · 3 years ago
This just isn't true though. I worked at a relatively minor video streaming company and we overloaded and took down AWS CloudFront for an entire region. They refused to work with us or increase capacity because the datacenter (one of the ones in APac) was already full. This was on top of already spreading the load across 3 regions. We only had a few million viewers.

We ended up switching to Fastly for our CDN. There's something hidden here, though, that becomes a problem at Netflix's size. We were willing to pay the cloud provider tax, and we didn't dig down into kernel-level or storage optimizations because off the shelf was good enough. At Netflix's scale, that adds up to millions of extra server hours you have to pay for if you don't do the 5% optimizations outlined in the article.

jedberg · 3 years ago
The constraint is profit. Sure, with unlimited money you can just keep getting more and more servers. But that costs money. It would end up swamping any profit to be made.

By creating this optimized system, it makes serving that much video profitable.

zinclozenge · 3 years ago
How would you do it if you had much more modest scale requirements? Say a few thousand simultaneous viewers. I'm kicking around an idea for a niche content video streaming service, but I don't know much about the tech stacks for it.
geodel · 3 years ago
Seems to be mixing too many things here. Many scaling/hardware challenges need a lot of people, but it can still be true that Netflix is chock-full of engineers making half-assed turd Java frameworks day in and day out. I know this because we are forced to use these crappy tools since they are made by Netflix and are therefore supposed to be the best.

It's just that they succeeded in a streaming market with low competition, and great success brings in a lot of post-facto justification of how outrageously great Netflix's tech infra is.

I mean, it may be excellent for their purpose, but the idea that their solution can be replicated industry-wide seems untrue to me.

tankenmate · 3 years ago
Que? You don't seem to have much justification for your points; it seems more like a rant as you have had a bad experience using software provided by Netflix. It would be great if you could provide more details about what was wrong with it rather than just "we are forced to use these crappy tools". I'm genuinely interested.

In my personal experience lots of companies (admittedly all large companies, but many of which sell their services / software / hardware to smaller companies) have a use for serving hundreds of Gbps of static file traffic as cheaply as possible. And the slides for this talk seem exactly on the money (again from my experience slinging lots of static data to lots of users).

paxys · 3 years ago
So Netflix published a framework which seemingly isn't suitable for your use case, your managers forced you to use it, and your response is to blame...Netflix?
Yottr · 3 years ago
That's hilarious. To me it reinforces how few resources you need to run an operation like Netflix. The hardware is brilliantly fast. The more engineering you do, the slower it gets. Engineering is an aristocratic tradition. What you see today is an imitation of that, done badly, which Netflix tries to circumvent.
AtNightWeCode · 3 years ago
Scaling streaming for a company the size of Netflix is very easy. You can use any edge cache solution, even a homemade one. The complexity at N seems to stem from other things.
n0tth3dro1ds · 3 years ago
>You can use any edge cache solution

Umm, those solutions exist (from places like AWS and Azure) because Netflix was able to do it without them. The cloud platforms recognized that others would want to build their own streaming services, so they built video streaming offerings.

You have the cart in front of the horse. The out-of-the-box solutions of today wouldn't exist without Netflix (and YouTube) building a planet-scale video solution first.

0x457 · 3 years ago
> any edge cache solution

Someone still has to do the R&D for the edge cache. These slides are about Open Connect, their own edge cache solution that gets installed in partners' racks (i.e. ISPs and exchanges). Before the things Netflix and Nginx implemented in FreeBSD, hardware compute power was wasted on the various overheads they discuss in the slides.

Yes, you can throw money at the problem and buy more hardware.

yibg · 3 years ago
This is exactly the type of comment OP is referring to. Have you built a streaming service at this scale? Do you actually know what's involved? Or are you just looking at the surface level, making a bunch of assumptions and reaching a gut-feel conclusion?
daper · 3 years ago
I have some experience serving static content and working with CDNs. Here is what I find interesting / unique:

- They are not using the OS page cache or any memory caching for that; every request is served directly from disk. This seems possible only when requests are spread across many NVMe disks, since a single high-end NVMe like the Micron 9300 PRO has a max read speed of 3.5GB/s (or 28Gbps) - far less than 800Gbps (rough arithmetic sketched after this list). It looks like that works OK for long-tail content, but what about new hot content everybody wants to watch on the day of release? Do they spread the same content over multiple disks for this purpose?

- Async I/O resolves the issue of the nginx process stalling on a disk read operation, but only after you've already opened the file. Depending on the FS, the number of files, other FS activity, and the directory structure, opening the file can block for a significant time, and there is no async open() AFAIK. How do they resolve that? Are we assuming the inode cache contains all inodes and open() time is insignificant? Or are they configuring nginx with a large open-file cache?

- TLS for streamed media became necessary because browsers started to complain about non-TLS content. But that makes things sooo complicated, as we see in the presentation (kTLS is 50% of CPU usage before moving to encryption offloaded to the NIC). One has to remember that the content is most probably already encrypted (DRM); we just add another layer of encryption/authentication. TLS for media segments makes so little sense IMO.

- When you rely on encryption or TCP offloading by the NIC, you are stuck with what is possible on your NIC. I guess no HTTP/3 over UDP or fancy congestion control optimizations in TCP until the vendor somehow implements them in the hardware.
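
As a rough illustration of the arithmetic in the first bullet (the 3.5GB/s figure is the Micron 9300 PRO spec quoted above; everything else is a straight unit conversion, in C just for concreteness):

    /* Back-of-envelope math: how many NVMe drives, reading flat out,
     * would it take to source 800Gb/s if nothing came from RAM? */
    #include <stdio.h>

    int main(void)
    {
        const double target_gbit_s     = 800.0;  /* line rate to serve */
        const double per_drive_gbyte_s = 3.5;    /* one Micron 9300 PRO, sequential read */

        double target_gbyte_s = target_gbit_s / 8.0;               /* = 100 GB/s */
        double drives_needed  = target_gbyte_s / per_drive_gbyte_s;

        printf("aggregate read needed: %.0f GB/s\n", target_gbyte_s);
        printf("drives at full speed:  %.1f\n", drives_needed);    /* ~28.6 */
        return 0;
    }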

drewg123 · 3 years ago
Responding to a few points. We do indeed use the OS page cache. The hottest files remain in cache and are not served from disk. We manage what is cached in the page cache and what is directly released using the SF_NOCACHE flag.

I believe our TLS initiative was started before browsers started to complain, and was done to protect our customers' privacy.

We have lots of fancy congestion optimizations in TCP. We offload TLS to the NIC, *NOT* TCP.
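
For anyone unfamiliar with the knob being referenced: FreeBSD's sendfile(2) takes a flags argument, and SF_NOCACHE asks the kernel not to keep the sent pages cached. A minimal sketch of sending one byte range that way (illustrative only, not Netflix's actual code):

    #include <sys/types.h>
    #include <sys/socket.h>
    #include <sys/uio.h>
    #include <err.h>

    /* Send one segment's byte range from an open file to a TCP socket,
     * hinting that the kernel should not retain the pages afterwards
     * (useful for long-tail content that won't be requested again soon). */
    static void
    send_segment(int file_fd, int sock_fd, off_t offset, size_t length)
    {
        off_t sent = 0;

        /* NULL hdtr: no prepended headers or appended trailers. */
        if (sendfile(file_fd, sock_fd, offset, length, NULL, &sent,
            SF_NOCACHE) == -1)
            err(1, "sendfile");
    }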

daper · 3 years ago
Can I ask whether your whole catalogue can be stored on a single server, so content is simply replicated everywhere, or whether there is some layer above that directs requests to the specific group of servers storing the requested content? I assume the described machine is not just part of a tiered cache setup, since I don't think nginx is capable of complex caching scenarios.
arkj · 3 years ago
> We offload TLS to the NIC, NOT TCP.

How is this possible? If TCP is done on the host and TLS on the NIC, the data will still need to pass through the CPU, right? But the slides show the CPU fully bypassed for data.

mgerdts · 3 years ago
A Micron 9300 Pro is getting rather long in the tooth. They are using PCIe gen 4 drives that are twice as fast as the Micron 9300.

My own testing on single-socket systems that look rather similar to the ones they are using suggests it is much easier to push many 100Gbit interfaces to their maximum throughput without caching. If your working set fits in cache, that may be different. If you have a legit need for sixteen 14 TiB (15.36 TB) drives, you won't be able to fit that amount of RAM into the system. (Edit: I saw a response saying they do use the cache for the most popular content. They seem to explicitly choose what goes into cache, not allowing a bunch of random stuff to keep knocking the most important content out of cache. That makes perfect sense and is not inconsistent with my assertion that hoping a half-TiB cache will automatically do the right thing with 224 TiB of content is unrealistic.)

TLS is probably also to keep the cable company from snooping on the Netflix traffic, which would allow the cable company to more effectively market rival products and services. If there's a vulnerability in the decoders of encrypted media formats, putting the content in TLS prevents a MITM from exploiting that.

From the slides, you will see that they started working with Mellanox on this in 2016 and got the first capable hardware in 2020, with iterations since then. Maybe they see value in the engineering relationship to get the HW acceleration that they value into the hardware components they buy.

Disclaimer: I work for NVIDIA who bought Mellanox a while back. I have no inside knowledge of the NVIDIA/Netflix relationship.

ShroudedNight · 3 years ago
Just from reading the specs (i.e. real-world details might derail all of this):

https://www.freebsd.org/cgi/man.cgi?query=sendfile&sektion=2

Given one can specify arbitrary offsets for sendfile(), it's not clear to me that there must be any kind of O(k > 1) relationship between open() and sendfile() calls: As long as you can map requested content to a sub-interval of a file, you can co-mingle the catalogue into an arbitrarily small number of files, or potentially even stream directly off raw block devices.

eru · 3 years ago
Does the encryption in DRM protect the metadata?
daper · 3 years ago
AFAIK no. The point of DRM is to prevent recording or playing the media on a device without a decryption key (authorization). So the goal is different from TLS, which is used by the client to ensure the content is authentic, unaltered in transit, and not readable by a man-in-the-middle.

But do we really need such protection for a TV show?

"Metadata" in HLS / DASH is a separate HTTP request which can be served over HTTPS if you wish. Then it can refer to media segments served over HTTP (unless your browser / client doesn't like "mixed content").

nextgens · 3 years ago
No, and it doesn't protect the privacy of the viewer either!
Moral_ · 3 years ago
A lot of the reason they've had to build most of this stuff themselves is that they decided, for some reason, to use FreeBSD.

On the NUMA work they did: I remember being in a meeting with them as a Linux developer at Intel at the time. They bought NVMe drives, or were saying they were going to buy NVMe drives, from Intel, which got them access to "on the ground" kernel developers and CPU people from Intel. Instead of talking about NVMe they spent the entire meeting asking us about how the Linux kernel handles NUMA and corner cases around memory and scheduling. If I recall correctly, I think they asked if we could help them upstream BSD code for NVMe and NUMA. I think in that meeting there was even some L9 or super-high-up NUMA CPU guy from Hillsboro they somehow convinced to join.

The conversation and technical discussion were quite fun, but it was sort of funny to us at the time that they were having to do all this work on the BSD kernel for problems that had been solved years ago in Linux.

Technical debt I guess.

cperciva · 3 years ago
Netflix tried Linux. FreeBSD worked better.
Thaxll · 3 years ago
It's hard to believe in 2022. Google, Amazon, FB, etc. all use Linux, all CDNs use Linux as well, and some services serve even more traffic than Netflix (YouTube). "BSD faster than Linux" is a myth; the fact that 99% of those run on Linux means more people have worked on those problems, which means it's most likely faster.

The funny thing is that the rest of Netflix runs on Ubuntu; only the edge CDN boxes run on BSD.

throw0101c · 3 years ago
*At the time when they created the OCA project.

If someone were to do a similar comparison now, the results could be different.

dboreham · 3 years ago
By some definition of better.
jeffbee · 3 years ago
I still don't get the NUMA obsession here. It seems like they could have saved a lot of effort and a huge number of powerpoint slides by building a box with half of these resources and no NUMA: one CPU socket with all the memory and one PCIe root complex and all the disks and NICs attached thereto. It would be half the size, draw half the power, and be way easier to program.
drewg123 · 3 years ago
This is a testbed to see what breaks at higher speed. Our normal production platforms are indeed single socket and run at 1/2 this speed. I've identified all kinds of unexpected bottlenecks on this testbed, so it has been worth it.

We invested in NUMA back when Intel was the only game in town, and they refused to give enough IO and memory bandwidth per-socket to scale to 200Gb/s. Then AMD EPYC came along. And even though Naples was single-socket, you had to treat it as NUMA to get performance out of it. With Rome and Milan, you can run them in 1NPS mode and still get good performance, so NUMA is used mainly for forward looking performance testbeds.

jiggawatts · 3 years ago
Modern CPUs like the AMD EPYC server processor are "always NUMA", even in single-socket configurations!

They have 9 chips on what is essentially a tiny, high-density motherboard. Effectively they are 8-socket server boards that fit in the palm of your hand.

The dual-socket version is effectively a 16-socket motherboard with a complex topology configured in a hierarchy.

Take a look at some "core-to-core" latency diagrams. They're quite complex because of the various paths possible: https://www.anandtech.com/show/16214/amd-zen-3-ryzen-deep-di...

Intel is not immune from this either. Their higher core-count server processors have two internal ring-bus networks, with some cores "closer" to PCIe devices or certain memory buses: https://semiaccurate.com/2017/06/15/intel-talks-skylakes-mes...
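
One quick way to see this hierarchy on a Linux box is to print the NUMA node distance matrix the firmware reports. A minimal libnuma sketch (build with -lnuma; illustrative only):

    #include <stdio.h>
    #include <numa.h>

    int main(void)
    {
        if (numa_available() == -1) {
            fprintf(stderr, "no NUMA support on this system\n");
            return 1;
        }

        int nodes = numa_num_configured_nodes();
        printf("configured NUMA nodes: %d\n", nodes);

        /* 10 means "local"; larger values mean more hops / higher latency. */
        for (int a = 0; a < nodes; a++) {
            for (int b = 0; b < nodes; b++)
                printf("%4d", numa_distance(a, b));
            printf("\n");
        }
        return 0;
    }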

Bluecobra · 3 years ago
If you are buying servers at scale the costs will certainly add up vs. buying two processors. If you buy single proc servers, that is double the amount of chassis, rail kits, power supplies, power cables, drives, iLO/iDRAC licenses, etc.
muststopmyths · 3 years ago
Can you buy non NUMA mainstream CPUs though? Honest question because I’d love to be rid of that BS too
ksec · 3 years ago
Is NUMA a solved issue on Linux? Correct me if I am wrong, but I was under the impression it may be better handled under certain conditions; NUMA, the problem in itself, is hardly solved.
alberth · 3 years ago
Maybe Brendan Gregg can further enlighten his new coworkers at Intel as to why Netflix chose both AMD and FreeBSD.
trunnell · 3 years ago
The OpenConnect team at Netflix is truly amazing and lots of fun to work with. My team at Netflix partnered closely with them for many years.

Incidentally, I saw some of their job posts yesterday. If you think this presentation was cool, and you want to work with some competent yet humble colleagues, check these out:

CDN Site Reliability Engineer https://jobs.netflix.com/jobs/223403454

Senior Software Engineer - Low Latency Transport Design https://jobs.netflix.com/jobs/196504134

The client side team is hiring, too! (This is my old team.) Again, it's full of amazing people, fascinating problems, and huge impact:

Senior Software Engineer, Streaming Algorithms https://jobs.netflix.com/jobs/224538050

That last job post has a link to another very deep-dive tech talk showing the client side perspective.

ksec · 3 years ago
>Senior Software Engineer - Low Latency Transport Design

I am not a Netflix subscriber, but I don't think Netflix does live streaming for much of anything, if at all.

The low-latency transport role seems to suggest Netflix is exploring this idea, which could be live sports or some other shows.

NikhilVerma · 3 years ago
I am curious why they manually split the video to compress individual clips with different bit rates. Encoders usually have a variable-bit-rate option that does the same?
trunnell · 3 years ago
That's the essence of adaptive streaming - the player continuously selects a video bitrate that can be downloaded quickly enough given the network conditions. That requires multiple video bitrates being available, and each one must be encoded such that the player can switch between them at known cut points.
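
A toy version of that selection logic, just to make the idea concrete (the bitrate ladder values are made up for the example; real players also weigh buffer level, screen size, codec, and more):

    #include <stddef.h>

    /* Hypothetical bitrate ladder, lowest to highest, in kbps. */
    static const int ladder_kbps[] = { 235, 750, 1750, 3000, 5800 };

    /* Pick the highest rung that fits under the measured throughput,
     * with some headroom so a small dip doesn't cause a rebuffer. */
    static int pick_rung(double measured_kbps)
    {
        const double headroom = 0.8;
        int choice = ladder_kbps[0];            /* lowest rung as fallback */

        for (size_t i = 0; i < sizeof(ladder_kbps) / sizeof(ladder_kbps[0]); i++)
            if (ladder_kbps[i] <= measured_kbps * headroom)
                choice = ladder_kbps[i];

        return choice;   /* bitrate to request for the next segment */
    }
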
totony · 3 years ago
I see netflix has an office in Toronto, but the jobs are "remote, US". Any idea if remote is also an option for canadians?
frankjr · 3 years ago
antonio-ramadas · 3 years ago
I found the same video on the website of the summit: https://nabstreamingsummit.com/videos/2022vegas/

I’m on mobile and there does not seem to exist a direct link. Search for: “Case Study: Serving Netflix Video Traffic at 400Gb/s and Beyond”

ndom91 · 3 years ago
Video of this presentation available here: https://cdnapisec.kaltura.com/index.php/extwidget/preview/pa...
forgot_old_user · 3 years ago
thank you!
ksec · 3 years ago
And this is still on ConnectX-6 Dx; with PCIe Gen 5 and ConnectX-7, Netflix should be able to push toward 1.6Tbps per box. This will hopefully keep drewg123 and his team busy for another year :P
dragontamer · 3 years ago
At that point, RAM itself would likely be the bottleneck.

But maybe DDR5 will come out by then and get this team busy again lol.

wmf · 3 years ago
Genoa does indeed have roughly double the memory bandwidth.
pclmulqdq · 3 years ago
This is amazing work from the Netflix team. I'm looking forward to 1.6 Tb/s in 4 years.

It is interesting that this work is happening on FreeBSD, potentially with implementations diverging from Linux's. Linux programs seem to be moving towards userspace getting more power, with things like io_uring and increasing use of frameworks like DPDK/SPDK. This work is all about getting userspace out of the way, with things like async sendfile and kernel TLS. That's pretty neat!
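
For contrast with the in-kernel path, here is roughly what the io_uring style mentioned above looks like: a minimal liburing read sketch on Linux (the file path is a placeholder; build with -luring):

    #include <fcntl.h>
    #include <stdio.h>
    #include <liburing.h>

    int main(void)
    {
        char buf[4096];
        struct io_uring ring;
        struct io_uring_sqe *sqe;
        struct io_uring_cqe *cqe;

        int fd = open("/etc/hostname", O_RDONLY);    /* placeholder file */
        if (fd < 0 || io_uring_queue_init(8, &ring, 0) < 0)
            return 1;

        /* Queue one read; in a real server you'd batch many of these. */
        sqe = io_uring_get_sqe(&ring);
        io_uring_prep_read(sqe, fd, buf, sizeof(buf), 0);
        io_uring_submit(&ring);

        /* Reap the completion. */
        if (io_uring_wait_cqe(&ring, &cqe) == 0) {
            printf("read %d bytes\n", cqe->res);
            io_uring_cqe_seen(&ring, cqe);
        }
        io_uring_queue_exit(&ring);
        return 0;
    }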

the8472 · 3 years ago
kTLS has been added to Linux too, including offload. It also has P2P DMA, so in principle you can shovel the file directly from NVMe to the NIC and have the NIC encrypt it, so it never touches the CPU or main memory. But that only works on specific hardware.
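
On Linux that looks roughly like the sketch below: hand a TLS 1.2 AES-128-GCM session that was already negotiated in userspace (e.g. by OpenSSL) off to the kernel, after which plain write()/sendfile() on the socket is encrypted in the kernel or by a capable NIC. Illustrative only; error handling and the handshake itself are omitted:

    #include <string.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <netinet/tcp.h>
    #include <linux/tls.h>

    #ifndef SOL_TLS
    #define SOL_TLS 282     /* kTLS socket level, if the headers lack it */
    #endif
    #ifndef TCP_ULP
    #define TCP_ULP 31      /* attach an upper-layer protocol, if missing */
    #endif

    /* key/iv/salt/rec_seq come from the finished userspace TLS handshake. */
    static int enable_ktls_tx(int sock,
        const unsigned char key[TLS_CIPHER_AES_GCM_128_KEY_SIZE],
        const unsigned char iv[TLS_CIPHER_AES_GCM_128_IV_SIZE],
        const unsigned char salt[TLS_CIPHER_AES_GCM_128_SALT_SIZE],
        const unsigned char rec_seq[TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE])
    {
        struct tls12_crypto_info_aes_gcm_128 ci;

        /* Attach the "tls" upper-layer protocol to the TCP socket. */
        if (setsockopt(sock, IPPROTO_TCP, TCP_ULP, "tls", sizeof("tls")) == -1)
            return -1;

        memset(&ci, 0, sizeof(ci));
        ci.info.version = TLS_1_2_VERSION;
        ci.info.cipher_type = TLS_CIPHER_AES_GCM_128;
        memcpy(ci.iv, iv, TLS_CIPHER_AES_GCM_128_IV_SIZE);
        memcpy(ci.key, key, TLS_CIPHER_AES_GCM_128_KEY_SIZE);
        memcpy(ci.salt, salt, TLS_CIPHER_AES_GCM_128_SALT_SIZE);
        memcpy(ci.rec_seq, rec_seq, TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE);

        /* From here on, data written to the socket is encrypted below us. */
        return setsockopt(sock, SOL_TLS, TLS_TX, &ci, sizeof(ci));
    }
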
robocat · 3 years ago
Memory is the cache for popular content. You couldn’t serve fast enough directly from NVMe.

“~200GB/sec of memory bandwidth is needed to serve 800Gb/s” and “16x Intel Gen4 x4 14TB NVME”. So each NVMe drive would need to serve 12.5GB/s which is more than the 8GB/s limit for PCIe 4.0 x4. Also popular content would need to be on every drive, drastically lowering the total content stored.

Also see drewg’s comment on this for a different reason: https://news.ycombinator.com/item?id=32523509

mgerdts · 3 years ago
PCIe Gen 5 drives look poised for wide availability next year and NVIDIA has been demoing CX7 [1] which is also PCIe Gen 5. Intel already has some Gen 5 chips and AMD looks like they will follow soon [2]. Surely there will be other bumps, but I bet they pull it off in way less than 4 years.

1. https://www.servethehome.com/nvidia-connectx-7-shown-at-isc-...

2. https://wccftech.com/amd-epyc-7004-genoa-32-zen-4-core-cpu-s...