jloveless · 7 years ago
This is a nicely written article, however it's worth noting that the performance/reliability/availability differences across CDNs at a particular moment in time are pretty much nonexistent. These providers share the same backbone networks, the same IX PoPs, etc., and thereby have little diversification benefit. See https://blog.edgemesh.com/understanding-diversification-netw...

Where multi-CDN really shines is helping with region-specific solutions (e.g. China, India, Brazil, Argentina). It's probably worth noting that the team at Streamroot helps do this client side, and their p2p-style option helps localize traffic as well. The former is certainly the way to go and the latter really helps add network-level diversification. Of course - I'm biased, as we offer similar lower-level solutions.

Jormundir · 7 years ago
This statement is not even close to true. At a particular point in time, different CDNs can have very different performance even for the same ISP in the same region.
joshenders · 7 years ago
Absolutely. If you were a CDN's only customer this might be true, but the reality is that you're not, and they are always going to be oversubscribed. Having worked at a CDN provider (Cloudflare), I can tell you that they are constantly battling resource contention from DDoS and other reliability issues.

Multi-CDN is the way to go for performance and availability, though as a customer it can be challenging because you’re forced to limit your configuration to the lowest common denominator of features and there’s not a great way to test consistency of your configurations across all vendors.

This article is essentially a high-level sales pitch though; I didn't find it all that useful. I implemented multi-CDN at Pinterest using Cedexis (DNS-based), though with modern DNS providers like NSOne, Cloudflare, or Dynect, a modern Spark-based ETL pipeline, and the browser Navigation Timing API (RUM), it wouldn't be too challenging to build something resembling Cedexis yourself.

eldod · 7 years ago
Have to disagree here: you can have very different performance from different CDNs for the same user at the same time. Some CDNs have their own backbone (at least partially), a lot of them use different routes, and a lot of the time the issue is not with the backbone but with peering interconnections, which can differ between each ISP & CDN. And a CDN's capacity is shared between all its customers, so if one of them gets a huge peak, it can impact the others too.

Old but good example: before Apple started building out its own CDN, it was using the leading commercial one, and when Apple pushed its iOS/macOS updates, other broadcasters had big trouble delivering their streams at the same time because the CDN was overloaded - but that doesn't mean all the other CDNs were also down. That's also why most video broadcasters now do multi-CDN for their biggest live events like the Super Bowl or the World Cup: to be able to distribute the load across several networks.
jloveless · 7 years ago
In this case - given this and the other comments - I stand corrected! I would love to see some data on examples of this occurring in the wild - and it 100% makes sense that a congested CDN provider would impact neighbors. It would be great if someone could do a writeup on examples and the detection/mitigation strategies. Perhaps issues like these (alongside cost) are driving the DIY CDN adoptions (Apple being the exemplar, but also Tesla etc.)? Also the Pinterest example is a great real-world one - and they do an awesome job, especially given the size of that cache - so there must be some real value from a performance standpoint! Out of curiosity, do these dynamic switching decisions seem to work better at the server level or the client level?

SilasX · 7 years ago
Shower thought: what if html/http/browsers supported, as a primitive, the concept of "fetch this asset from url A, or if that doesn't work, B, or if that doesn't work, C ..."?
douglasfshearer · 7 years ago
If video is being served via HLS (which it probably is in 2018), then the manifests support redundant streams, where multiple hosts can be specified for each stream. [0]

hls.js supports this, as do many other clients. IME it works nicely for providing some client-side switching in case one of your hosts/CDNs goes down.

[0] https://developer.apple.com/library/archive/documentation/Ne...
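
Concretely, an HLS master playlist with redundant streams just repeats each EXT-X-STREAM-INF entry with identical attributes but a different host; if the first variant playlist fails to load, a compliant player falls back to the backup entry. A minimal sketch (hosts are hypothetical):

```
#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=2560000,RESOLUTION=1280x720
https://cdn-a.example.com/hls/720p.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2560000,RESOLUTION=1280x720
https://cdn-b.example.com/hls/720p.m3u8
```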

eldod · 7 years ago
Both HLS and DASH support redundant streams (by adding a redundant variant URL in the HLS playlist and multiple BaseURLs in the DASH manifest). It's indeed the simplest way to get a client-side fallback. If you use it, make sure that the player supports it and that the retry mechanisms are configured correctly (for instance all the MaxRetry config params in hls.js: https://github.com/video-dev/hls.js/blob/master/docs/API.md#... )
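For instance, a minimal hls.js configuration tightening those retry knobs might look like this (parameter names come from the hls.js API docs; the values here are purely illustrative):

```javascript
// Illustrative hls.js retry tuning; values are arbitrary, parameter
// names are from the hls.js API documentation.
const retryConfig = {
  manifestLoadingMaxRetry: 2,  // retries for the master playlist
  levelLoadingMaxRetry: 4,     // retries for variant playlists
  fragLoadingMaxRetry: 6,      // retries for media segments
  fragLoadingRetryDelay: 500,  // initial retry delay in ms
};
// In the player: const hls = new Hls(retryConfig);
```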
esotericn · 7 years ago
"If that doesn't work" isn't the problem.

As a silly limiting example, imagine that you host Netflix on your dial-up connection as url A.

It works.

Oh, okay, right, let's set a timeout then, if it takes more than 1 second to load, we try url B.

That works, but now we've got a 1 second delay on everything. Okay, we'll update the default to be url B.

Conditions are changing all the time as a result of bottlenecks in the infrastructure moving about.

What I think you'd actually need to do is something like this - initially, fetch from multiple endpoints simultaneously with an early-cancel (so you don't waste bandwidth on the slower ones).

For N seconds you just use the fastest one (perhaps with an 'if it doesn't work' mechanism, sure).

Every N seconds you re-evaluate the fastest endpoint using the multi-fetch.

And so on and so forth.

There are better algorithms, this is back of the envelope stuff.
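
The back-of-the-envelope scheme above could be sketched like this (a hypothetical helper, not any real API; in a browser the fetchers would wrap `fetch(url, { signal })`):

```javascript
// Race several endpoints, keep the fastest result, and early-cancel the
// losers so we don't waste bandwidth on the slower ones.
// `fetchers` are functions taking an AbortSignal and returning a Promise.
async function fastestOf(fetchers) {
  const controllers = fetchers.map(() => new AbortController());
  const result = await Promise.any(
    fetchers.map((f, i) => f(controllers[i].signal).then((value) => ({ value, i })))
  );
  controllers.forEach((c, i) => { if (i !== result.i) c.abort(); });
  return result.value;
}
```

Re-running this every N seconds (as suggested above) would give the periodic re-evaluation of the fastest endpoint.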

howdjisjje · 7 years ago
Your solution to bandwidth congestion is for everyone to use 3x+ more bandwidth than they need?

adrianmonk · 7 years ago
I think that could be very beneficial. If it were a built-in feature of HTTP (or more broadly, at the TCP/IP level), it would not only save people the hassle of reinventing the wheel, it would also be easier to ensure it's on by default for all static resources and thus get benefits across the board.

Perhaps it could be done in a flexible, extensible way as well. Create a limited language (no loops or dangerous stuff) to express policy, search order, etc. And design it so the client side doesn't necessarily have carte blanche and the server side can maintain some control if necessary.

tyingq · 7 years ago
Internal browser support for local caching based on a hash versus "where it came from" would be helpful as well.
degenerate · 7 years ago
Yes that would be great. Imagine a git-like web, where browsers could fetch the difference in chunks of a cached file when there's changes on the server side.
dougb · 7 years ago
Doesn't work for streaming Live Events (pay per view), which is the main use case for multi-CDN
robocat · 7 years ago
The problem is that that leaks information about your viewing habits on one site, to another.
gstaro · 7 years ago
Can't you do that with DNS records where there are multiple IPs on a A record?

Basically, we're already doing this for fault tolerance and load balancing within a single CDN. Except that currently we randomize the IPs. To enforce priorities, you'd want the IPs in the A record at least partially ordered by provider.

toast0 · 7 years ago
Multiple IPs on an A record work to some extent; most (many?) browsers will silently retry another IP from the list if some of the IPs don't accept a connection. I don't know if they'll try another IP on timeout though.

But you can't actually expect any ordering to make it through to the client. Your authoritative server may reorder the records, the recursive server may reorder them, and the client resolution library may also reorder them. There's actually an RFC advocating reordering records in client libraries; it's fairly misguided, but it exists in the wild. Reordering is also likely to happen in OS DNS caches where those are used.

nostrebored · 7 years ago
That's not quite what an anycast to those addresses does. It's more like an approximation of the nearest server.
masklinn · 7 years ago
They do for some limited items e.g. <object> does nested fallback.

That's not sufficient for something like CDN selection though, you want a fallback in case of failure but you first want to select based on various criteria.
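
For reference, the nested `<object>` fallback mentioned above looks like this (URLs are hypothetical); the inner object is only tried if the outer one fails to load:

```html
<object data="https://cdn-a.example.com/asset.svg" type="image/svg+xml">
  <object data="https://cdn-b.example.com/asset.svg" type="image/svg+xml">
    <p>Could not load the asset from either host.</p>
  </object>
</object>
```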

Heag3aec · 7 years ago
Combine with SRI and some convention to ask one of several hosts for it based on the hash(es), and we have content-addressable loading.
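The SRI half of that idea already exists today: the browser verifies the fetched bytes against the declared hash, whichever host served them. The missing piece is a convention for letting the hash select among several hosts. A sketch (the digest below is a placeholder, not a real hash):

```html
<script src="https://cdn-a.example.com/lib.js"
        integrity="sha384-PLACEHOLDER_BASE64_DIGEST"
        crossorigin="anonymous"></script>
```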
toomuchtodo · 7 years ago
So IPFS?
thinelvis · 7 years ago
I'm still waiting for a browser to figure out I mean "com" when I typed "cim," and you're thinking CDN retries would work?
moron4hire · 7 years ago
Are you thinking of something like BitTorrent? Why ask for the whole file from a list of hosts when you could ask for any bit of the file they might have?
rahimnathwani · 7 years ago
Then people would create browser plugins or greasemonkey scripts to always optimise for things the viewer cares about (time to start, likelihood of switching providers mid-stream, likelihood of getting full resolution for the longest subset of the video, ...) and disregard the prioritisation set by the provider (which might care about costs, which depend on contracted minimums, overage tariffs etc.).

Then providers would need to combat this by dropping the most expensive CDNs, causing a race to the bottom in which everyone loses: users have worse streaming experience, providers lose customers, good CDNs make less money, margins for bad CDNs are squeezed.

littlestymaar · 7 years ago
The number of people who would install that kind of add-on is so small it would have essentially zero effect on the provider's costs …
dougb · 7 years ago
This sounds similar to https://www.conviva.com/precision/

Unfortunately you need to know a lot more and the devil is in the details. Supporting the various streaming devices/browsers is a huge pain in the ass.

Full Disclosure: I worked for both Conviva, and Akamai.

eldod · 7 years ago
Nikolay from Streamroot.io here, co-author of the article.

Yes, Conviva provides a service that can give you information about CDN QoS by aggregating data from their customers (they provide a video analytics solution), but it doesn't do the switching itself (neither on the server side nor on the client side), so the video player would need to implement its own logic.

The solution from Streamroot can use these kinds of APIs, like Conviva Precision or the ones from its competitors like Youbora and Cedexis, and the real value it adds is the client-side switching capability in the players, so it's quite complementary to those solutions.

And indeed the devil is in the details, that's why we built this client-side SDK so the customers don't have to implement all the logic themselves on each platform and device. It was easier for us as we already have SDKs and plugins for most players for our P2P hybrid delivery solution.

fasteddy760 · 7 years ago
There are several factors to consider in a multi-CDN delivery solution.

First, is it VoD or Live? HLS (and DASH) manifests offer a second-URL option (BaseURL in DASH), and the client determines when to choose that fallback URL. If playback falls back to the second URL, the viewer may already have seen some buffering or bitrate downshifts that triggered the player's decision.

Although stream playback recovers/continues, the user experience could have been, and likely was, impacted. Here a second CDN in the multi-CDN deployment was accessed by the client, but there is no intelligence in the provider selection. Typically the (perceived) most reliable CDN gets the first spot, and the backup CDN gets the fallback position (second URL) in the manifest/MPD.

In Live, you have the opportunity to provide intelligent CDN selection on every manifest/MPD refresh. If your multi-CDN selection layer has intelligence and access to performance metrics in real time, the manifest can now point (be directed) to the alternate CDN. This requires a level of manifest management at the session level, so that the m3u8 retains the proper historical CDN selection and doesn't break playback for that session (in most if not all cases).
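
As an illustration of that per-refresh selection: because HLS segment URIs can be absolute, a live media playlist can start pointing new segments at a different CDN while keeping the already-served history intact (hosts are hypothetical):

```
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:6
#EXT-X-MEDIA-SEQUENCE:120
#EXTINF:6.0,
https://cdn-a.example.com/live/seg120.ts
#EXTINF:6.0,
https://cdn-a.example.com/live/seg121.ts
#EXTINF:6.0,
https://cdn-b.example.com/live/seg122.ts
```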

There are client solutions, DNS solutions, and cloud solutions that are neither client-based (SDKs) nor DNS-based. You get to decide how you want integration to be managed and how much ongoing effort your team can or can't invest in the solution.

What is most important to consider is the viewer experience, and how playback can best be delivered to avoid buffering and downshifts - the things that cause a viewer to abandon your content and possibly not come back.

If a CDN is performant, and N+1 users are now beginning to watch a stream on that provider's network, capacity could be (and often is) an issue. Continuing to send users to that CDN may produce a sub-optimal experience. Metrics measuring playback determine that bitrates are dropping and buffering is increasing, so new requests get served by an alternate CDN providing a better playback experience.

Video is a tightly controlled series of events. We work with chunks of 10s, 6s, or 2s, continually trying to balance the benefits of large buffers and fast start times.

With an SDK/client-based solution, you have the engineering effort of keeping up with OS/hardware updates, testing new code in the SDKs, and then pushing it out across several platforms, players, etc. It can be daunting.

With DNS, you have TTLs to manage (lower is better, faster for the next user), and there is no intelligent mid-stream switching once the client is pulling manifests from a specific provider.

With a cloud-based solution, each individual stream/user/device is measured, and CDN selection can be performed in real time for Live, and for _each_ request on VoD.

Disclaimer, I work at DLVR, and formerly Cedexis. = ]

Jakob · 7 years ago
For VoD I like the approach where you use a fast and reliable CDN for the first seconds and in the background buffer the rest of the video from a cheap location/CDN.

This works if you can download video faster than real time, which is almost always the case. That way you get the best of both worlds.
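
A toy sketch of that split (hosts and the 10-second threshold are made up for illustration): use the premium CDN until a safety buffer is built, then switch to the cheaper one.

```javascript
// Pick the premium CDN until we have a safety buffer of video ahead,
// then fetch the remaining segments from the cheaper CDN.
// Hosts and threshold are hypothetical.
function pickHost(bufferedSeconds, segmentPath) {
  const base = bufferedSeconds < 10
    ? 'https://premium-cdn.example.com'  // fast start: low-latency CDN
    : 'https://cheap-cdn.example.com';   // steady state: cheaper CDN
  return base + segmentPath;
}
```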

fasteddy760 · 7 years ago
That's clever!

What do metrics show for UX for that workflow? (Bonus: What tool for capturing play data?)

mahesh_rm · 7 years ago
Using this comment thread to plug a question: is there a way to use Cloudflare (or other DNS providers) to dynamically fall back to CDN A (e.g. CloudFront) if CDN B (e.g. Netlify's CDN) is down?
edaemon · 7 years ago
I don't think Cloudflare Load Balancing can do this (yet), but Dyn can: https://dyn.com/active-failover/

The crucial part is CNAME compatibility. Most DNS services I've had experience with can only do failover between IPs.

mahesh_rm · 7 years ago
Thank you, will try to implement this with Dyn
jniedrauer · 7 years ago
You can control DNS failover with custom health checks in AWS Route 53. You can also do latency based routing. I can't speak for Cloudflare, but I imagine they have similar capability.
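In Route 53 terms, a failover pair is roughly two record sets sharing a name: a PRIMARY tied to a health check and a SECONDARY used when that check fails. A hedged sketch of the ChangeResourceRecordSets payload (names, IDs, and targets are made up):

```json
{
  "Changes": [
    {
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "cdn.example.com",
        "Type": "CNAME",
        "SetIdentifier": "primary",
        "Failover": "PRIMARY",
        "TTL": 60,
        "HealthCheckId": "11111111-2222-3333-4444-555555555555",
        "ResourceRecords": [{ "Value": "d111111abcdef8.cloudfront.net" }]
      }
    },
    {
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "cdn.example.com",
        "Type": "CNAME",
        "SetIdentifier": "secondary",
        "Failover": "SECONDARY",
        "TTL": 60,
        "ResourceRecords": [{ "Value": "example-site.netlify.app" }]
      }
    }
  ]
}
```

Note this only works at the zone apex via alias records, hence the CNAME-compatibility caveat raised above.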
mahesh_rm · 7 years ago
Thank you. I overall prefer CF over R53, will try to check out if this can be achieved with either workers or page rules.

zackb · 7 years ago
I work for a startup called DLVR (Deliver) and we do this for video streams: HLS, DASH, MSS. http://www.dlvr.com
fasteddy760 · 7 years ago
Pretty sure Netlify's CDN is Akamai. So top tier for sure.
nik736 · 7 years ago
off topic: the cookie banner is completely hiding the navbar/logo/navigation.
eldod · 7 years ago
Thanks for noticing! We'll make sure to improve this
yeukhon · 7 years ago
well the most annoying part is the navbar stays... worst UX.

zzzcpan · 7 years ago
I guess they only care about video. But for websites, multi-CDN essentially means building your own CDN, where using other CDNs isn't even a good idea, since they don't provide enough granularity of control to monitor and choose nodes and therefore limit what you can achieve in terms of latency and availability. DNS is also your biggest and often your only friend here; learning it and deploying it yourself is critical - don't rely on any vendor to do it.