> There are always practical limitations to site-wide technical changes, and HTTPS Everywhere is no different. Sites and content we consider ‘archival’ that involve no signing in or personalisation, such as the News Online archive on news.bbc.co.uk, will remain HTTP-only. This is due to the cost we’d incur processing tens of millions of old files to rewrite internal links to HTTPS when balanced against the benefit.
Not to be snarky, but haven't people written tools to help with this? This seems like a common issue. I mean, there's `sed` and similar tools, obviously, but something that could go, validate that the link works over https://, and update it. I don't see why that would need to be some monumental amount of work.
Let's say you have a web page with a JavaScript slippy map that imports OpenLayers from a CDN, and OpenLayers then retrieves map tiles from OpenStreetMap.
If you serve that page over HTTPS but the JavaScript CDN URL is HTTP, the library won't load. And even if the CDN supports HTTPS and you switch to it, the library might still compose an HTTP URL to retrieve the map tiles, causing some browsers to block the tiles as mixed content; other browsers are willing to load HTTP images on HTTPS pages and will appear to work. Unless the tool understands how the map library composes its URLs, someone will have to fix this manually.
To detect bugs like that automatically, after changing to https you'd have to spider every page in your site with several different browsers / browser configurations looking for errors and bad links. And if your archived site had a bunch of errors and bad links to start with, you'll need some way to compare the before-and-after error reports too.
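The first pass of that audit can be roughed out in a few lines. The sketch below is hypothetical (the function and regex names are mine, not from any real tool), and it illustrates exactly the limitation described above: it only catches URLs literally present in the markup, so anything composed at runtime by JavaScript still needs a real browser to detect.

```python
import re

# Hypothetical mixed-content check: scan fetched HTML for subresource
# references that are still plain http://. A real audit would also render
# each page in several browsers, since scripts can compose URLs at runtime.
INSECURE_SUBRESOURCE = re.compile(
    r'(?:src|href)\s*=\s*["\'](http://[^"\']+)["\']', re.IGNORECASE)

def find_insecure_urls(html: str) -> list[str]:
    """Return the http:// URLs referenced by src/href attributes."""
    return INSECURE_SUBRESOURCE.findall(html)
```

A spider would run this over every archived page and diff the report against a pre-migration baseline, per the parent's point about pre-existing bad links.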
They could put in place redirects, and then use HSTS to tell browsers to only visit the HTTPS links.
They could leave the old HTML unprocessed and pointing at HTTP, and HSTS will fix it for modern browsers.
Only the first request would be via HTTP, and Chrome and other browsers can be told to use HTTPS when they see the links even then: https://hstspreload.appspot.com/
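A minimal sketch of that redirect-plus-HSTS approach (the host name and max-age below are illustrative assumptions, not the BBC's actual configuration): the plain-HTTP listener only ever redirects, and HTTPS responses carry the HSTS header so returning browsers skip HTTP entirely.

```python
# Hypothetical header-building helpers for the two listeners.

def http_redirect_headers(host: str, path: str) -> dict:
    """Headers for the plain-HTTP listener: 301 everything to HTTPS."""
    return {"Location": f"https://{host}{path}"}

def https_response_headers() -> dict:
    """Extra headers for HTTPS responses: pin browsers to HTTPS for a year."""
    return {"Strict-Transport-Security": "max-age=31536000; includeSubDomains"}
```

With the domain on the HSTS preload list, even that first plain-HTTP request disappears for shipping browsers.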
> Not to be snarky, but haven't people written tools to help with this? This seems like a common issue. I mean, there's `sed` and similar tools, obviously, but something that could go, validate that the link works over https://, and update it. I don't see why that would need to be some monumental amount of work.
Not as trivial as you'd think: if there's an HTTP URL on the page when it should be HTTPS, how did the URL end up there? Dynamically from PHP code? Dynamically from JavaScript code? Did the URL come from a database? Did it come from an environment variable? It can be a lot of work to track all of these down, and many of them you won't be able to find using grep/sed; e.g. URLs might appear as relative URLs in code, with the "http" part being added dynamically.
You'll get insecure content warnings as well if you try to load HTTP images, CSS, iframes, or JavaScript on an HTTPS page. Likewise, the URLs for these can come from lots of places.
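One partial mitigation, sketched below under the assumption that the markup is static: rewrite absolute http:// subresource URLs to protocol-relative // URLs, so they inherit the page's scheme. As described above, this only catches URLs literally present in the HTML; anything assembled at runtime slips through.

```python
import re

# Hypothetical rewriter: turn src="http://..." / href="http://..." into
# protocol-relative src="//..." so the resource loads over whatever scheme
# the page itself used. Dynamically-composed URLs are not touched.
ABSOLUTE_HTTP = re.compile(r'(src|href)=(["\'])http://', re.IGNORECASE)

def make_protocol_relative(html: str) -> str:
    return ABSOLUTE_HTTP.sub(r'\1=\2//', html)
```
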
I think this trivializes the scope of what the BBC developed. Even with well-automated processes, you'd still want a human doing light QA given the wide diversity of content. The BBC has been at it for over twenty years building ad hoc minisites[1], sites so far down the long tail that, if forced to choose, they may be more prone to pull the plug than to maintain them.
> Sites and content we consider ‘archival’ that involve no signing in or personalisation,
AUGH! Seeing this "SSL is just for private things" mindset in 2016 is really disheartening. It's to keep people from screwing with your connection, not just snooping on it.
I really hope the browser vendors start treating HTTP the same way they treat broken certs sometime soon. This will change once users start asking, en masse, "Why am I getting all these warnings", not before.
I don't think you can just run 'sed' on any random iOS app, any random symbian app, any random smart-TV app, some other guy's service that hits your APIs and feeds, and so on... :)
Now, that could be a valid issue, though I'm not sure how long I'd care about those devices continuing to work without any valid upgrade path... Using things like HSTS and CSP's `upgrade-insecure-requests` would help here for clients that do support them.
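A minimal sketch of those two response headers (the values are illustrative assumptions, not any site's real policy): `upgrade-insecure-requests` tells supporting browsers to fetch http:// subresources over https:// automatically, while HSTS pins them to HTTPS on future visits. Clients that understand neither simply keep today's behaviour.

```python
# Illustrative headers only -- max-age and directives are assumptions.
def legacy_friendly_headers() -> dict:
    return {
        # Supporting browsers rewrite http:// subresource fetches to https://.
        "Content-Security-Policy": "upgrade-insecure-requests",
        # Supporting browsers stop issuing plain-HTTP requests for a year.
        "Strict-Transport-Security": "max-age=31536000",
    }
```
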
> Earlier in 2016, the Chromium development team decided to implement a change to Google Chrome, preventing access to certain in-browser features on ‘insecure’ (non-HTTPS) web pages. In practice, this meant that key features of certain products, such as the location-finding feature within the Homepage, Travel News and Weather sites, would stop working if we didn’t enable HTTPS for those services.
I think this shows how valuable it is to use incentives to get people to Do The Right Thing(tm). Perhaps more things should be changed to require HTTPS.
It was gutsy (and insightful) of them to publish to the world their upgrade experience. I wish people would be a little more positive about that instead of pointing out how much they suck.
> The CPU overhead of TLS encryption has historically been significant. We’ve done a lot of work behind the scenes to improve both the software and hardware layers to minimise the load impact of TLS whilst also improving security.
I thought that it hasn't been significant overhead for a while now?
Even if it took an ancient machine 10x longer than a 2012 MacBook air, that 61 milliseconds more is really not all that much time in the grand scheme of things.
I'm sure the people using these machines that are "much older and much less powerful" than a 2012 MacBook Air are not expecting sites to load as fast as they would on a newer machine, and probably don't care about losing less than 0.1 seconds of load time. If you're running a 6+ year-old machine and expecting high performance, you'd have to be insane.
Even if BBC Online cared this intensely about performance, there are more than a few other things they could do to speed everything up. The switch from Apache to NGINX, for one. I know that this takes many more developer/sysadmin hours, but if they really cared about tens of milliseconds then it is definitely something they'd invest in. NGINX has quite a lot of support and is very stable, and is generally known to be much faster than Apache in most cases [1]. It's also not like NGINX is a hipster/unused server; it has quite a respectable share of the 'market' [2].
I also noticed on this page that they document.write() a script (probably to force it to load async?). This type of 'hack' is terrible for performance [3]. You could just add the 'async' attribute to the script tag and actually put it in the HTML, reducing the cycles wasted on a hacky solution.
I can see they've only enabled Elliptic Curve Diffie-Hellman Ephemeral (ECDHE) and RSA key-exchange cipher suites. That means clients will either use ECDHE and get forward secrecy, or old clients will fall back to plain "RSA" key exchange (the client sends a pre-master secret back to the server, encrypted with the server's public key), which works but doesn't give you forward secrecy.
What they HAVEN'T enabled is Diffie-Hellman Ephemeral suites, which give older clients forward secrecy at a big CPU hit.
So this is an example of performance-tuning your TLS settings. There's also work to do with session tickets and session resumption, and eventually they'd also serve ECDSA certs, once all clients support them or there's at least a good way to show the older RSA cert only to old clients.
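The cipher policy described above can be sketched with Python's `ssl` module standing in for server configuration. The exact cipher string is an illustrative assumption, not the BBC's real setting: prefer ECDHE suites (forward secrecy), allow plain-RSA key exchange as a fallback for old clients, and leave DHE suites out to avoid their CPU cost.

```python
import ssl

def configure_ciphers(ctx: ssl.SSLContext) -> ssl.SSLContext:
    # ECDHE first for forward secrecy; RSA key exchange as legacy fallback;
    # no DHE suites; anonymous and MD5 suites explicitly excluded.
    ctx.set_ciphers("ECDHE+AESGCM:ECDHE+AES:RSA+AESGCM:RSA+AES:!aNULL:!MD5")
    return ctx
```

A real server would then call `ctx.load_cert_chain(...)` before wrapping sockets.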
And just yesterday I told someone to visit BBC when trying to connect to public wifi that requires a redirect to a login page first. Guess I'm going to have to find a new go-to http site now
On a more serious note, I always use http://example.com. Being reserved and maintained by the IANA for documentation and testing, it's the most stable site I can think of.
Be aware that plenty of ISPs sadly MITM example.com. I ran into this when our test suite that curl'ed example.com and checked its output failed when we ran our binary on a new provider.
There is! RFC 7710[1] specifies a DHCP option/RA extension that encodes the URL of the captive portal, such that when, e.g., DHCP completes, the connecting machine immediately knows what the captive portal URL is, and doesn't have to get MitM'd to find out.
Although this is good news, it will stop me from injecting a rule to hide the breaking news banner so it stops popping up. I should still be able to block the domain, but that won't cache for as long when off WiFi. [^1]
At least this will stop ISPs like BT from doing deep packet inspection and serving stale pages from their cache. Once it's been rolled out to the news site over the next year, of course.
If they use ChaCha20-Poly1305 then the load on low-power devices shouldn't be much. I did a lot of reading on this for my recent book and it's pretty good for devices lacking hardware AES acceleration.
HTTPS is more than just privacy. See https://certsimple.com/blog/ssl-why-do-i-need-it and https://www.troyhunt.com/ssl-is-not-about-encryption/
TLDR: It can be more complicated than you think.
[1] http://news.bbc.co.uk/nol/ukfs_news/hi/uk_politics/vote_2005...
Source: http://peter.sh/experiments/chromium-command-line-switches/
related: https://www.maxcdn.com/blog/ssl-performance-myth/ and https://istlsfastyet.com/
The BBC has to deal with machines much older and much less powerful than that.
The servers shouldn't be running on old MacBook airs.
[1]: https://www.rootusers.com/web-server-performance-benchmark/
[2]: http://news.netcraft.com/archives/2016/03/18/march-2016-web-...
[3]: https://www.stevesouders.com/blog/2012/04/10/dont-docwrite-s...
or use what Google does when Chrome notifies you of a login gateway to public wifi: http://www.gstatic.com/generate_204
[1]: https://tools.ietf.org/html/rfc7710
[^1]: https://unop.uk/block-bbc-breaking-news-on-all-devices