I worked at Twitch from 2014 to 2018. I was never on the video team, but here are some details that have changed.
Video:
- Everything has been migrated from RTMP to HLS; RTMP couldn't scale across several consumer platforms
- This added massive delay (~30+s) early on; the video team has been crushing it, getting this back down to the current sub-2s
- Flash is dead (now HTML5)
- F5 storms ("flash crowds") are still a concern teams design around; e.g. 900k people hitting F5 after a stream blips offline due to the venue's connection
- afaik Usher is still alive and well, in much better health today
- Most teams are on AWS now; video was the holdout for a while because they needed specialized GPUs.
EDIT: "This isn't quite right; it has more to do with the tight coupling of the video system with the network (eg, all the peering stuff described in the article)" -spenczar5
- Realtime transcoding is a really interesting architecture nowadays (I am not qualified to explain it)
Web:
- No more Ruby on Rails because no good way was found to scale it organizationally; almost everything is now Go microservices back + React front
- No more Twice
- The data layer was split up per team; some teams use PostgreSQL, some DynamoDB, etc.
- Of course many more than 2 software teams now :P
- Chat went through a major scaling overhaul during/after Twitch Plays Pokemon. John Rizzo has a great talk about it here: https://www.twitch.tv/videos/92636123?t=03h13m46s
Twitch was a great place to spend 5 years at. Would do again.
Hi glacials :) Small correction from someone at Twitch today:
> video was the holdout for a while because they needed specialized GPUs
This isn't quite right; it has more to do with the tight coupling of the video system with the network (eg, all the peering stuff described in the article).
Yeah, and the problem is that it often does work to fix issues. It's the web equivalent of "have you tried turning it off and turning it back on again?"...
> - No more Ruby on Rails because no good way was found to scale it organizationally; almost everything is now Go microservices back + React front
Ugh, I just... I keep trying to pretend I don't need to learn Go, but every highly scalable system I read about that's recently been written about seems to be using it. Maybe I just need to stay away from systems that need to scale? Heh...
Technically speaking you can build scalable systems using anything you want. But if you need to hire a couple hundred developers, you're better off going with Java 7 or Go than Ruby, Lisp, or Perl. The dumber and more uniform the better.
Personally, I think it’s hugely worth learning. Aside from some eschewed de facto behaviors, Go is very easy to pick up and learn the entirety of in a week or two, because the language itself is really not that large. So I’d argue the time investment is a good one for what you get.
Still, you definitely do not need Go to scale systems. People scale Everything, perhaps most impressively PHP applications.
Go isn't the only language that scales, it just happens to be popular amongst the scripting language crowd as a next step. You're by no means limited in your choice. You could do Java, C#, Rust...
Before golang was a thing, there were highly scalable systems that handled way more traffic than anything written in golang today. Those systems were (and are) written in languages like C++ and Java and C#.
You're just seeing golang in articles because of hype.
I'm curious: do services like Twitch specify a specific desired codec/bitrate that doesn't get transcoded? Transcoding seems like a lot of effort for a lower-quality end result.
If I were streaming, I would want to avoid transcoding as much as possible. Since we're talking about live broadcasting, there is a unique ability for the streamer to choose the format they upload.
In the RTMP days, the highest quality setting in the viewer was always a straight pass-through from the broadcaster, and the reduced versions were transcoded in the data center to fit down lower-bandwidth last-mile pipes.
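To make that pattern concrete: pass the source rendition through untouched, and spawn one transcode job per reduced rendition. A rough sketch of the idea in Go, driving ffmpeg with placeholder URLs and a made-up bitrate ladder (not Twitch's actual pipeline; real ingest systems supervise and restart these jobs):

    // Rough sketch: pass the "source" rendition through untouched and spawn
    // one transcode job per reduced rendition. URLs and ladder are placeholders.
    package main

    import (
        "log"
        "os/exec"
        "sync"
    )

    type rendition struct {
        name  string // output stream name
        scale string // target resolution, e.g. "1280:720"
        rate  string // target video bitrate
    }

    func main() {
        src := "rtmp://ingest.example.com/live/source" // placeholder ingest URL
        out := "rtmp://edge.example.com/live/"         // placeholder output base

        var wg sync.WaitGroup

        // "Source" quality: remux without re-encoding (straight pass-through).
        start(&wg, "ffmpeg", "-i", src, "-c", "copy", "-f", "flv", out+"source")

        // Reduced renditions: one transcode job each.
        for _, r := range []rendition{
            {"720p", "1280:720", "2500k"},
            {"480p", "854:480", "1200k"},
            {"360p", "640:360", "700k"},
        } {
            start(&wg, "ffmpeg", "-i", src,
                "-vf", "scale="+r.scale,
                "-c:v", "libx264", "-b:v", r.rate, "-preset", "veryfast",
                "-c:a", "aac", "-b:a", "128k",
                "-f", "flv", out+r.name)
        }
        wg.Wait()
    }

    // start runs one external job in its own goroutine and logs when it exits.
    func start(wg *sync.WaitGroup, name string, args ...string) {
        wg.Add(1)
        go func() {
            defer wg.Done()
            if err := exec.Command(name, args...).Run(); err != nil {
                log.Printf("%s exited: %v", name, err)
            }
        }()
    }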
Excuse the simple question: When I hear "microservices", I think serverless backend. Is that right, or are they different? If they're the same, how do you stream video with serverless? (Seems like streaming, websockets, etc... shouldn't be possible in a serverless environment...)
"Microservice" describes the size and scope of each deployment artifact. It answers the question "is the whole system just one big ball, is it broken up, how broken up is it?" It doesn't describe how it is deployed.
"Serverless" describes how a deployment artifact is deployed and runs. Generally it refers to a class of technologies in multiple domains whereby intricate knowledge of the underlying host is abstracted behind a cleaner API, with things like scaling, security, patching, etc handled by an infrastructure provider. While the term rose in prominence alongside "functions as a service", which is certainly a technology that generally qualifies as serverless, there are many serverless products out there: AWS Fargate for running containers, DynamoDB for a database, S3 for object storage, all of these are "serverless". A good signal is: if I can SSH into it, its not serverless.
A microservice can certainly be deployed serverless (ECS/Fargate or Google Cloud Run comes to mind). A microservice can even refer to one or more logically related functions-as-a-service; the term more-so speaks to how the engineering teams organize their business domain into the code and how the APIs speak to each other, rather than the exact underlying technologies.
Microservices are about splitting code into different servers instead of a monolithic codebase. You end up with different servers (probably virtualized) for each domain of the application.
Like, instead of having the video decoding and the analytics code in the same monolith attached to the same DB, you deploy a different server for each one, generally with a new DB for each. When the services need to talk to each other, they do it via the network (REST, gRPC, etc.).
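For a concrete picture, each such service is often just a small HTTP (or gRPC) server that owns its own storage and calls its neighbors over the network. A minimal sketch in Go, with made-up service names and endpoints:

    // Minimal sketch of one microservice calling another over plain HTTP.
    // Service names, ports, and endpoints are invented for illustration.
    package main

    import (
        "encoding/json"
        "fmt"
        "log"
        "net/http"
    )

    func main() {
        // This "analytics" service owns its own storage and exposes a narrow API.
        http.HandleFunc("/views/", func(w http.ResponseWriter, r *http.Request) {
            channel := r.URL.Path[len("/views/"):]

            // Call a separate "video" service over the network instead of
            // sharing a database or a codebase with it.
            resp, err := http.Get("http://video.internal:8080/streams/" + channel)
            if err != nil {
                http.Error(w, "video service unavailable", http.StatusBadGateway)
                return
            }
            defer resp.Body.Close()

            var stream struct {
                Live    bool `json:"live"`
                Viewers int  `json:"viewers"`
            }
            if err := json.NewDecoder(resp.Body).Decode(&stream); err != nil {
                http.Error(w, "bad upstream response", http.StatusBadGateway)
                return
            }
            fmt.Fprintf(w, `{"channel":%q,"live":%t,"viewers":%d}`, channel, stream.Live, stream.Viewers)
        })
        log.Fatal(http.ListenAndServe(":9090", nil))
    }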
They're different. Microservices are still stateful applications that run 24/7. They are just really small in scope.
e.g. the Friends feature on Twitch is one microservice, running in its own autoscaling group, with internal APIs used by other microservices like Whispers.
My team follows microservice patterns, and has deployed services that utilise websockets on both serverless (Azure Functions and Lambda) and regular hosted services (k8s, EC2, Azure App Service, etc.). Nothing stopping you there. On the streaming video side we did an app that used Azure Media Services + Azure Functions... works well enough.
Not necessarily a good idea, but one 'feature' of microservices is the ability to pick different stacks, languages and delivery methods on an individual service level.
I work at Twitch. Let me put it this way: the team I'm on (VOD) has ~8 backend engineers, and we're in charge of something like two dozen services.
We literally have services that run entirely on AWS Lambda functions and nothing else (see the sketch below).
This is a pretty big difference from teams I've worked on in the past that had 8 engineers all working on a single service.
"Microservices" is more of a philosophy than anything.
No. Elemental is more of a high end encoding system for quality. Twitch is more about bulk cheap transcodes of good quality. Think about it. MLB has maybe 18 concurrent events. Twitch is running minimum in the 10k range.
No, we never had Elementals. In the early days there was no way we could afford them. In the later days I don't think we would have wanted them, as we needed to scale so many transcode jobs that it was easier to have a large farm of dumb machines to organise jobs across.
There may have been an Elemental machine at one point that was used for testing/playing, but I really don't think so, and I know there wasn't one between 2010 and 2017.
"F5 storms" are easy to handle. Intercept all keypress combinations for refresh and do what you want with it client side. (spread it out over time, use a high-performance endpoint to check if live or a combination)
Most people doesn't use the refresh button in the browser, so only a small amount of traffic will be uncontrolled.
Do you have any data to support that? I personally don't have an F5 key on my keyboard (it requires pressing a modifier), so I pretty much always click the reload button to fix a stream blip. The impression I get from reading Twitch chat is that most people are using mobile. I doubt they have a keyboard plugged in and press F5 to refresh.
That said, you certainly don't need your video streaming servers to handle those hundred-thousand refresh requests.
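Put differently, the refresh traffic can be pointed at something tiny and cacheable rather than at the video edges. A hand-wavy sketch of the "high-performance endpoint to check if live" idea from above, with a short in-memory cache (all names and numbers invented):

    // Sketch of a cheap "is the channel live?" endpoint so an F5 storm hits a
    // tiny cached lookup instead of the video servers. Purely illustrative.
    package main

    import (
        "fmt"
        "log"
        "net/http"
        "sync"
        "time"
    )

    type cacheEntry struct {
        live    bool
        fetched time.Time
    }

    type liveCache struct {
        mu      sync.Mutex
        entries map[string]cacheEntry
    }

    func (c *liveCache) isLive(channel string) bool {
        c.mu.Lock()
        defer c.mu.Unlock()
        e, ok := c.entries[channel]
        if ok && time.Since(e.fetched) < 2*time.Second {
            return e.live // serve the cached answer to the other 899,999 refreshers
        }
        live := lookupUpstream(channel) // one upstream hit per channel per ~2s
        c.entries[channel] = cacheEntry{live: live, fetched: time.Now()}
        return live
    }

    // lookupUpstream stands in for whatever actually knows stream state.
    func lookupUpstream(channel string) bool { return false }

    func main() {
        cache := &liveCache{entries: map[string]cacheEntry{}}
        http.HandleFunc("/live/", func(w http.ResponseWriter, r *http.Request) {
            channel := r.URL.Path[len("/live/"):]
            w.Header().Set("Cache-Control", "max-age=2") // let CDNs absorb most of it
            fmt.Fprintf(w, `{"channel":%q,"live":%t}`, channel, cache.isLive(channel))
        })
        log.Fatal(http.ListenAndServe(":8081", nil))
    }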
I can barely follow along with this, it's very technical. I can't imagine how Kyle Vogt acquired the necessary knowledge to make this work. Example:
> The point of having multiple datacenters is not for redundancy, it's to be as close as possible to all the major peering exchanges. They picked the best locations in the country so they would have access to the largest number of peers.
This is the kind of thing where I would have to hire some kind of network engineering expert, and he just figured this stuff out and made it work? I can't fathom other people's intelligence sometimes.
He leveraged the YCombinator network to absorb a lot of information quickly. For example, I taught him basic networking (routers, switches, multicast/anycast, AS numbers, etc). I shared my 10 years of knowledge with him in a single two-hour session because he's a genius, and then he ran from there, vastly exceeding my knowledge. I was there because Emmett asked Steve and Steve asked me to go over, and I was happy to help. I'm sure I wasn't the only one.
Like the other sibling comment, I would be very much interested in a talk like this: teaching networking with real-world examples and explaining 2-3 large-scale architectures w.r.t. networking. Maybe a long video (or small series) and a follow-up on Twitch for Q&A. Would even pay for this.
> I can’t imagine how Kyle Vogt acquired the necessary knowledge to make this work.
I've worked on projects with Kyle, and he often goes into bulldozer mode. It is no surprise to me that Kyle could "learn" all he needed in order to get something like this set up (or at least learn enough to orchestrate a small group in constructing it). Kyle is, by all means, a "force of nature" as YC tends to define it.
The downside to Kyle's optimism is that he often has very little concern for the humanity of others. He can set up decent optics around his actions and decisions in the wake of what many might consider failures, but he has consistently abused those who try to give him good-faith constructive feedback and often brought co-workers to tears. This is all well-documented at least through the past 4-5 years. (Kyle does actually explicitly ask for "direct" feedback, btw. He's just only capable of handling the feedback on a periodic, weekly or monthly basis.)
A key lesson of this article (and of glacials' post above) is what can be achieved very quickly if technical debt is of minor concern. Kyle's key strength is in building a proof of concept that supports rapid iteration. This point appears to be something the Justin.tv / Twitch teams did very well.
A second lesson is in getting alignment among diverse engineers. Think about how the team might have debated the architecture presented. Think about how some of the choices might rub people the wrong way.
Finally, Kyle is a unique character in several ways but is not alone in possessing a transient "bulldozer" mentality. If you see yourself having the same pattern of behavior, get help before others get hurt. There are a variety of mitigations that can help, but they need explicit participation.
> I can’t imagine how Kyle Vogt acquired the necessary knowledge to make this work.
By this point in history, it wasn’t just him anymore and we’d done a few rounds of improvements already out of necessity. As I recall, he got us up and running at PAIX based mostly on research, but most of the other data centers were built out by a network engineer(1) we hired away from YouTube.
While he was working on the network engineering and keeping the original system afloat, I did a lot of the software work for the system described here.
(1) Name withheld out of courtesy
Don't be so hard on yourself; it's pretty common to read blog posts like this and come away with the idea that a super smart person took one look at the lay of the land and leapt directly from problem -> solution in one neat step. What you don't see is the people they talked to about the problem, their back-and-forth spitballing ideas, the various googling to see if there's a standard approach ... and most importantly you don't see any failed attempts.
Bear in mind that I don't think this is some deliberate attempt to appear superhuman; I think it's just accidental.
I'm sure if you tried building your own livestreaming or VOD service, you'd come up with similar solutions and insights. Peering problems are fairly obvious - put up a gigabit server in Germany and try livestreaming high bitrate video to a highspeed connection in San Francisco (or vice-versa), and watch as you run into problems despite having more than enough theoretical bandwidth.
When your users start to complain, you tend to develop the domain knowledge necessary to solve their problems pretty quickly.
Pretty exceptional indeed. Also impressive that he was able to grow from founder-stage tech to that scale, since they're largely different problems.
Especially back in 2010. I feel like I'd have a much better shot of being able to figure out that scale these days than a decade ago. (If I spent my free time studying and not watching Age of Empires 2 on Justin.tv/Twitch).
If that was actually the case, why the f did they have a boatload of gear in 200 Paul? There's almost no peering exchange there whatsoever (until SFMIX about 3 years ago). Can think of a lot better connected places in the Bay.
It’s one of the reasons we moved out of there. Moving day was an ... interesting experience: lots of planning to minimize downtime, and everything that was actually planned went relatively well. Unfortunately, what we thought was a 90% plan turned out to be more like 50%. Several people pulled all-nighters on that one.
At the time PAIX had a reverse-billing setup: the more data you transferred, the cheaper your connection charge was; we managed to get all the way into the cheapest billing tier within the first billing cycle which was basically unheard-of at the time.
Building this was really fun, and I’m very proud of what kd5bjo, Emmett, and many others did to help turn Justin.tv/Twitch into what it is today. We found a way to make what was fundamentally an unprofitable business (if you relied on CDNs) work by relentlessly focusing on reducing cost to the absolute bare minimum through good technology choices and innovating when necessary. Justin.tv would have died otherwise.
I was the primary architect for Usher and the server-side of the video system described here. I’m happy to answer any questions, assuming I still remember the answers.
I don’t get your reference, but we chose the name Usher because it’s the software equivalent of the person at the theater who looks at your ticket and shows you where your seat is— it doesn’t actually handle any of the video data, it just knows where you should go to find it.
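A toy version of that idea, just to illustrate the shape of it (the real Usher's selection logic is obviously far more sophisticated, and all hostnames here are invented):

    // Toy illustration of the "usher" idea: look at the request, pick a video
    // edge, and send the viewer there. The usher never touches video bytes;
    // it only answers "where do I go?". Selection logic and hosts are made up.
    package main

    import (
        "hash/fnv"
        "log"
        "net/http"
    )

    // In reality this would come from live cluster state (load, peering,
    // geography); here it's a static list for the sketch.
    var edges = []string{
        "video-edge-1.example.net",
        "video-edge-2.example.net",
        "video-edge-3.example.net",
    }

    func pickEdge(channel string) string {
        h := fnv.New32a()
        h.Write([]byte(channel))
        return edges[int(h.Sum32())%len(edges)]
    }

    func main() {
        http.HandleFunc("/watch/", func(w http.ResponseWriter, r *http.Request) {
            channel := r.URL.Path[len("/watch/"):]
            edge := pickEdge(channel)
            http.Redirect(w, r, "https://"+edge+"/hls/"+channel+"/index.m3u8", http.StatusFound)
        })
        log.Fatal(http.ListenAndServe(":8082", nil))
    }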
Slightly off topic but this is my favorite justin.tv video, and also shows something of the website and how early in the days of live streaming we were back then: https://www.youtube.com/watch?v=BqgEm8XWXu8
"Live video can't be made by pushing video faster, it takes a completely differently architecture."
Funnily enough, that's pretty much how HLS, the modern live video standard, works - it's essentially a series of tiny video clips loaded & played one right after the other, distributed through the same CDN as normal video files.
Thanks to HLS, live video is actually much worse than it was 10 years ago with RTMP in terms of latency. There have been some recent efforts to get it back down, although they're generally not standardised, hard to scale (e.g. WebRTC), and/or a bit awkward.
I don't think anyone really follows Apple's spec for various technical reasons, though. Most do some sort of chunked-transfer encoding, along with pre-signaling segments in playlists, as outlined by the Periscope folks here: https://medium.com/@periscopecode/introducing-lhls-media-str...
None follow Apple's spec because it's a month old. It's not for technical reasons. Like it or not (TBH I'm on the fence about it), it WILL be the standard.
I would have assumed it's because Apple only announced this 4 weeks ago, and the only clients that support it are beta software.
kind of? i think this is more about live replication infrastructure than the video carrier itself. with youtube, you can set up some CDNs, but with livestreaming you need to be continually ingesting and spitting out content at the same time to lots of places in the world at once
The video carrier can have a lot of effect on the replication properties, though. HLS is essentially a playlist of video URLs that the client fetches and stitches together, as well as refreshing the playlist to get the names of new chunks. Without an extremely specialized web server, each chunk needs to be complete and published before adding it to the playlist, which puts a lower bound of roughly one chunk duration on the overall latency.
RTMP, on the other hand, maintains a live socket between the server and client, and the server can forward each packet as it becomes available.
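For anyone who hasn't poked at HLS: the "stitching" really is just re-fetching a text playlist and downloading whichever segment URLs are new. A bare-bones polling loop to illustrate (placeholder URL, no real playback, and real players also handle target durations, discontinuities, etc.):

    // Bare-bones illustration of how an HLS client follows a live stream:
    // refetch the media playlist, fetch any segments we haven't seen, repeat.
    package main

    import (
        "fmt"
        "io"
        "net/http"
        "strings"
        "time"
    )

    func main() {
        playlist := "https://example.com/live/channel/index.m3u8" // placeholder URL
        seen := map[string]bool{}

        for {
            resp, err := http.Get(playlist)
            if err != nil {
                time.Sleep(time.Second)
                continue
            }
            body, _ := io.ReadAll(resp.Body)
            resp.Body.Close()

            // Lines not starting with '#' are segment URIs; #EXTINF tags carry durations.
            for _, line := range strings.Split(string(body), "\n") {
                line = strings.TrimSpace(line)
                if line == "" || strings.HasPrefix(line, "#") || seen[line] {
                    continue
                }
                seen[line] = true
                fmt.Println("would fetch and play segment:", line)
            }
            time.Sleep(2 * time.Second) // roughly one segment duration between refreshes
        }
    }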
> They also don't have chats with the 100,000 people watching a channel. What they do is assign people into rooms of 200 people each so you can have a meaningful interaction in a smaller group. This also helps with scaling. I thought this was a pretty clever strategy.
Just curious if this is still a thing. I've watched an unhealthy amount of Twitch (not all with chat open) and never noticed this.
I'm sure it's not. Often, the streamer is reading the chat onscreen while streaming, and it's always identical to the one I'm seeing, except perhaps delayed by some seconds.
Maybe Twitch creates an illusion by showing the streamer only a subset of people that is shared among all other groups. This allows everyone to see what the streamer sees and to communicate within their group.