Using HTTP meta-headers is actually something we seem to have forgotten how to do.
The one that annoys me most is the Accept-Language header, which is almost entirely ignored in favour of GeoIP lookups to figure out regionality... which I find super odd, as if people are walking around using a browser in a language they don't speak (or an operating system configured for a language they don't speak).
ETags, though, are a bit fraught: if you're a company, a security scan will fire if an ETag is detected, because you might be able to figure out the inode on the filesystem from it... which, I don't know why that's a security problem either way[0], and it's common for there to be false positives[1]... which makes people not respect the header.
Last-Modified should work, though; I love the idea of checking headers and not content.
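For what it's worth, honouring these validators on the client side is only a few lines. A minimal sketch of a conditional fetch in Python; the feed URL is a placeholder, and a real reader would persist the validators between runs instead of keeping them in memory:

    import urllib.request
    import urllib.error

    FEED_URL = "https://example.com/atom.xml"  # placeholder

    def fetch(etag=None, last_modified=None):
        req = urllib.request.Request(FEED_URL)
        if etag:
            req.add_header("If-None-Match", etag)
        if last_modified:
            req.add_header("If-Modified-Since", last_modified)
        try:
            resp = urllib.request.urlopen(req)
        except urllib.error.HTTPError as e:
            if e.code == 304:  # Not Modified: nothing to download
                return None, etag, last_modified
            raise
        # Remember the validators the server sent for the next poll.
        return (resp.read(),
                resp.headers.get("ETag"),
                resp.headers.get("Last-Modified"))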
I think people don't care to imagine the computer doing as little as possible to get the job done, and instead use the near-unlimited computing power to just avoid thinking about consequences.
[0]: https://www.pentestpartners.com/security-blog/vulnerabilitie...
[1]: https://github.com/sullo/nikto/issues/469
To add to the language problem: when I travelled to Europe, some websites (like YouTube) switched to the regional language of wherever I was, despite me being logged in and Google knowing full well which languages I speak. Even the ads changed language, as if advertising in a language I don't speak will help anyone.
Almost all of my spam is in French, which is an assumption on the part of the spammers based on the email username. Almost all my Gmail is spam, because I have directed most real email elsewhere. Therefore, almost all the mail I receive at Gmail is in French. This has led to Google blocking things (like voter registration confirmation!) that are in English because they're "not in your normal language."
IIUC, accept-language is mostly ignored because the tooling to configure it on the user agent is really poor for most user agents. So users log into a site, they get the site in the wrong language, and because only the site is visible they blame the site, not their UA.
It's the "Your site's broken if IE won't load it" problem.
Can someone attest that this is actually the issue?
FWIW, Outlook does respect the "Accept-Language" header, and I don't think anyone is saying that Outlook is wrong for doing that or claiming it to be broken?
Are you totally sure that this isn't a backwards myth?
I think the most likely situation is that locale information for English-speaking countries would be incorrect if the default (en_US) was used to install the operating system, which happens on occasion.
Growing up in Belgium I feel your pain about GeoIPs and accept-language.
I lived in Flanders, with my accept-language set to en-US, en.
Ads would pop up in Dutch, Flemish, French and sometimes German. When you think about it, from a brick-and-mortar point of view, it makes sense. I'm more likely to buy <physical product advertised> at the <local chain grocery store> vs buying it anywhere in the USA, based on my IP.
Next to that, imagine you're browsing Reuters.com with a Berlin IP and accept-language set to en-US, en.
What SHOULD they show you? Local German news, auto-translated into English? Local news in German? Or redirect you to the US page?
Locality is different from language. In your example, it would have to show you the local German news, as that's local to you, and it would have to show it to you in the first supported language in your accept-language header.
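Picking "the first supported language in your accept-language header" is not much code, either. A rough sketch in Python, with a made-up list of supported languages (a real parser should follow the HTTP quality-value rules more carefully):

    # Sketch: pick the best supported language from an Accept-Language
    # header. SUPPORTED is a made-up example list.
    SUPPORTED = ["de", "en", "fr"]

    def negotiate(accept_language, default="en"):
        # Header looks like: "en-US,en;q=0.9,de;q=0.5"
        prefs = []
        for part in accept_language.split(","):
            lang, _, qpart = part.strip().partition(";")
            q = 1.0
            if qpart.strip().startswith("q="):
                try:
                    q = float(qpart.strip()[2:])
                except ValueError:
                    pass
            prefs.append((q, lang.strip().lower()))
        # Highest q first; match on the primary subtag ("en-US" -> "en").
        for _, lang in sorted(prefs, reverse=True):
            primary = lang.split("-")[0]
            if primary in SUPPORTED:
                return primary
        return default

    print(negotiate("en-US,en;q=0.9"))  # -> "en", even from a Berlin IP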
Personally I would prefer, for example, Reuters.com to be a "hub", and all the regional variants on de.reuters.com. Then just let the user choose what they want.
Even when ETags have nothing to do with the filesystem, they can still be a security vector. Some APIs use ETags to identify what has changed since the last time you called a particular API. This means the ETag values are probably stored in a database, which means the API server needs to protect against SQL injection in the request headers.
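The defence is the usual one: treat the header as untrusted input and keep it out of the SQL text. A sketch with sqlite3; the table and column names here are invented for illustration:

    import sqlite3

    def changed_since(conn, client_etag):
        # Parameterized query: the If-None-Match value never becomes
        # part of the SQL string, so it can't inject anything.
        row = conn.execute(
            "SELECT 1 FROM resources WHERE etag = ?", (client_etag,)
        ).fetchone()
        # Unknown ETag -> treat as changed and send the full response.
        return row is None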
> as if people are walking around using a browser in a language they don't speak (or an operating system configured for a language they don't speak)
Well, yes, they are! Computers translated into my native language sound dumb. That's how a whole generation where I come from learned better English than native speakers, ffs!
Half the time it's just translated wrong. You think anyone has any incentive to translate any technology into a language with a couple million speakers, all of whom are obligate pirates?
And it seems like you might be surprised to hear that people speak more than one language. Then where's my global setting to tell the browser what languages I speak, so it'd know what header to send? Same place that lets me configure what ads I'm actually interested in. Nowhere.
>I think people don't care to imagine the computer doing as little as possible to get the job done, and instead use the near-unlimited computing power to just avoid thinking about consequences.
This, friend, is what computers are for in the 21st century. "Bicycle for the mind", ha...
Seems like a reasonable case for disregarding the client preference. If you're able to speak TLS then you're able to load up a public domain (de)compression library.
I always appreciate Rachel's writings. I don't know much about her, but my takeaway is that she has worked some of the hardest sysadmin jobs of the past few decades and writes from that experience super well.
Especially as the cost to serve this content approaches zero.
I find the take in the blog to be relatively hostile. It's a "technically correct" rant. Not wrong, but mostly missing the point, and being a bit of a dick in the process.
Sure - block the readers that make a request every 10 seconds. It's perfectly reasonable to block clients if they hit a limit like 20 to 50 requests in a day.
It's damn hostile to block for 24 hours after a single request. If the 10MB of traffic for 20 requests is going to break the bank... maybe don't host an Atom or RSS feed at all?
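For scale: a daily cap like the 20-to-50 figure above is a few lines of state, nothing like a 24-hour ban after one request. A hedged sketch, in-memory only; a real server would persist this or lean on its cache layer:

    import time

    DAILY_LIMIT = 50
    _counts = {}  # ip -> (day_number, request_count)

    def allowed(ip):
        day = int(time.time() // 86400)
        last_day, count = _counts.get(ip, (day, 0))
        if last_day != day:
            count = 0  # new day, reset the counter
        if count >= DAILY_LIMIT:
            return False  # over the cap: refuse (or serve a 429)
        _counts[ip] = (day, count + 1)
        return True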
---
That said - weirdos can weird on their own sites as they like. It's not a public service.
But I bucket this into the same category of weird as posting a whole bunch of threatening "no trespassing", "beware of dog", "homeowner is armed", "Solicitors not welcome", etc style signs all over their property.
Like - point out on the doll where the RSS client hurt you. Because something's up.
Rachel makes an excellent point here about feed change frequency.
Seems like it'd be straightforward to implement, in most readers, a backoff strategy based on how frequently the feed content changes. For a regular, periodic fetch, if the content has proven it doesn't update frequently, just back off the polling period for that endpoint.
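Something like this, say (a sketch only; the bounds and growth factors are made up, and a real reader would persist the per-feed interval):

    MIN_INTERVAL = 15 * 60       # 15 minutes
    MAX_INTERVAL = 24 * 60 * 60  # 1 day

    def next_interval(current, feed_changed):
        if feed_changed:
            # Content moved: poll a bit more eagerly next time.
            return max(MIN_INTERVAL, current / 2)
        # Nothing new: stretch the period, up to once a day.
        return min(MAX_INTERVAL, current * 1.5)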
If-Modified-Since and ETag are nice and everyone should implement them, but IME the implementation status is much better on the reader side than on the feed side. Trim your (main) feed to only recent posts and use Atom's pagination to link to the rest for new subscribers, and the difference in data transferred becomes much smaller.
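The feed-side half isn't much code either. A sketch, assuming the feed's validators are available from wherever it's generated (feed_etag and feed_mtime are placeholders, with feed_mtime a timezone-aware datetime):

    from email.utils import parsedate_to_datetime

    def conditional_status(headers, feed_etag, feed_mtime):
        # If-None-Match wins when the client sent an ETag.
        if headers.get("If-None-Match") == feed_etag:
            return 304
        ims = headers.get("If-Modified-Since")
        if ims:
            try:
                if parsedate_to_datetime(ims) >= feed_mtime:
                    return 304
            except (TypeError, ValueError):
                pass  # unparseable date: fall through, send the body
        return 200  # send the full feed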
> Besides that, a well-behaved feed will have the same content as what you will get on the actual web site. The HTML might be slightly different to account for any number of failings in stupid feed readers in order to save the people using those programs from themselves, but the actual content should be the same. Given that, there's an important thing to take away from this: there is no reason to request every single $(&^$(&^@#* post that's mentioned in the feed.
> If you pull the feed, don't pull the posts. If you pull the posts, don't pull the feed. If you pull both, you're missing the whole point of having an aggregated feed!
Unfortunately there are too many feeds that don't include the full content for this to work. And a reader won't know whether the feed has the full content before fetching the HTML page. This can also change from post to post, so it can't just be determined once when subscribing.
> Then there are the user-agents who lie about who they are or where they are coming from because they think it's going to get them special treatment somehow.
These exist because of misbehaved web servers that block based on user agent or send different content. And since you are complaining about faked user agents, that probably includes you.
> Sending referrers which make no sense is just bad manners.
HTTP Referer should not exist. It has been abused by spammers for ages.
> These exist because of misbehaved web servers that block based on user agent or send different content. And since you are complaining about faked user agents, that probably includes you.
That's a niche. It's about 1 million percent more likely a fake request is coming from an overzealous AI scraper nowadays. I have blocked hundreds of them and I'm on the verge of giving up and handing over money to Cloudflare just for their AI scraping protection.
> If you pull the feed, don't pull the posts. If you pull the posts, don't pull the feed. If you pull both, you're missing the whole point of having an aggregated feed!
People probably do this because some sites only give you a preview in the feed, to force you to go to the site and view the ads.
So if you want the full post in the feed reader, you need to pull the post as well.
This. My feed reader pulls a "reader" view so I don't have to leave the app. I normally wouldn't mind going to the website, except that doing so would mean waiting for it to fully load, dealing with JavaScript popups, and often bad scrolljacking.
https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Ac...
In Chrome: chrome://settings/languages
In Firefox: https://support.mozilla.org/en-US/kb/choose-display-language...
Chrome insists that the first language be the UI language, and Safari insists that the first language be the _system_ language.
Look again. Or switch browser. It is a basic feature and the issue is indeed websites ignoring it.
https://rachelbythebay.com/w/atom.xml
This person isn't thinking as a user.