Have you managed to verify this, or are you just assuming it? I'm just assuming it; that's why I'm asking. Like, have you made a request for your data from Facebook / data brokers and it looks clean? I trust Mozilla to the fullest and have made no effort to investigate.
Facebook managed to get some off-Facebook activity from me even using this. The site in question was also loaded in a Private Browsing window and Facebook claimed it was from pixel tracking. I'm guessing they've inferred it based on IP, especially as I live by myself.
How this is legal under GDPR, given I'm a UK citizen, I'm really not sure.
I do this, but to every webpage - so there should be much less cross-site talk. Not that the advertising machine doesn't have a million other ways of getting through.
Use both methods instead of just one. They differ in nature and can be implemented at different perimeters of your network. Maybe there exist certain chokepoints in the network where multiple devices can be protected in one go?
Personally, I would have pure IP blackhole routing performed in the router providing WAN access to internal networks. A blanket protection for all desktops and 802.11 devices inside.
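A sketch of what that blackhole routing could look like on a Linux-based router. The two prefixes below are examples announced by AS32934; a real setup would load the full list from something like bgp.he.net/AS32934:

```
# Run as root on the router; these prefixes are examples only.
ip route add blackhole 31.13.24.0/21
ip route add blackhole 157.240.0.0/16

# List the blackhole routes currently installed:
ip route show type blackhole
```

Anything on the LAN that tries to reach those ranges then gets its packets dropped at the router, regardless of per-device configuration.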
Many devices today are locked down, and editing hosts records can be non-trivial. Instead of relying on 0.0.0.0 routing through hosts, the same effect can be obtained by setting up a personal DNS server, e.g. bind9 with RPZs listing the targeted domains[1].
Why all that hassle? Because an unrooted smartphone with a WireGuard link to the DNS server (or a full-on VPN using that DNS server) can have its lookups made through the server you control. And that DNS service is available on any local network/Wi-Fi one has to use. IIRC 3G/4G/5G WAN routes were harder to get right, but I think it was possible. One could always route all traffic through a dedicated VPN.
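For the phone side, a minimal WireGuard client config sketch (keys, addresses, and the endpoint hostname are all placeholders):

```
# Client-side /etc/wireguard/wg0.conf sketch; keys and addresses are placeholders.
[Interface]
PrivateKey = <phone-private-key>
Address = 10.0.0.2/32
# Point DNS at the bind9+RPZ server, reached over the tunnel:
DNS = 10.0.0.1

[Peer]
PublicKey = <server-public-key>
Endpoint = vpn.example.org:51820
# Tunnel only the DNS server; use 0.0.0.0/0 to route all traffic instead.
AllowedIPs = 10.0.0.1/32
PersistentKeepalive = 25
```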
Defense in depth.
---
[1]: fb.rpz.zone:
;RPZ
$TTL 10
@ IN SOA rpz.zone. rpz.zone. (
        37     ; serial
        3600   ; refresh
        300    ; retry
        86400  ; expire
        60 )   ; minimum
  IN NS localhost.
facebook.com     IN A 0.0.0.0
*.facebook.com   IN A 0.0.0.0
facebook.net     IN A 0.0.0.0
*.facebook.net   IN A 0.0.0.0
fbcdn.com        IN A 0.0.0.0
*.fbcdn.com      IN A 0.0.0.0
fbsbx.com        IN A 0.0.0.0
*.fbsbx.com      IN A 0.0.0.0
fbcdn.net        IN A 0.0.0.0
*.fbcdn.net      IN A 0.0.0.0
edgesuite.net    IN A 0.0.0.0
*.edgesuite.net  IN A 0.0.0.0
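To actually wire such a zone into bind9, named.conf needs a response-policy statement. A sketch; the file path is an assumption, and the zone name matches the example above:

```
// named.conf sketch: enable the RPZ and declare the zone backing it.
options {
    response-policy { zone "fb.rpz.zone"; };
};

zone "fb.rpz.zone" {
    type master;
    file "/etc/bind/fb.rpz.zone";
    // The zone is consulted via the policy, not served to clients directly.
    allow-query { localhost; };
};
```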
AS32934 is the account maintaining Facebook's public presence(s)? How'd you figure that out?
Mapping routing-table maintainer data to DNS entries seems like a terrific way to create those ad-blocking lists. Is this how it's done? I always assumed those lists were manually collated and curated.
Version 4 of the Internet Protocol (the popular one that blew up in the 1990s) is made up of 32-bit numeric addresses attached to physical things to which data has to be routed.
In the 80s these would be bunched up by org in nice ways, just like how phone numbers that were all in the same place would share an area code. MIT would be 1.1.x.y and you’d route their data to Cambridge, MA. IBM would be 2.x.y.z and you’d route to them and let them deal with it internally. Some small outfit in France might’ve gotten 173.4.5.q: you’d send their data into the Atlantic fibre because “173.something” meant “Europe” and let the other end figure it out.
In the 90s it all got messy because 32 bits wasn’t enough to keep things in a clean hierarchy that reflected how data was routed around the net. Orgs ended up accumulating fragments of IP address space from all over the place for the hosts at their physical site. The hierarchy of the address couldn’t tell you how to route traffic, and the routing rules became extensive and highly dynamic.
Enter Autonomous System Numbers and BGP. It’s a layer on top of IP addressing that only matters to internet core routers with many choices as to how to route your traffic (“multi-homed” sites). It helps map IP addresses to actual places — internet peers, aka fellow ISPs — so they can agree with each other on how traffic should be routed. BGP lets peers keep these routes updated and lets you know who owns what.
None of this matters if you have a single internet connection. Routing is easy: it’s either “local” or you send it to your ISP. But if you’re an ISP in the centre how do you know who gets what? You use The [Internet] Routing Table as maintained by the BGP system.
Some companies have so much traffic they have their own ASN. Because the internet is open, you get to see all the IP addresses which are bundled up inside that ASN, which is what I was linking to. It only works because FB is its own self-serving ISP with its own ASN.
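As a sketch of how you'd use such a prefix list once you've pulled it: a check for whether an address falls inside an ASN's announced prefixes. The two prefixes below are examples announced by AS32934; a real list would be scraped from bgp.he.net/AS32934 or a BGP feed:

```python
import ipaddress

# Example prefixes announced by AS32934 (Facebook); not an exhaustive list.
AS32934_PREFIXES = [
    ipaddress.ip_network("31.13.24.0/21"),
    ipaddress.ip_network("157.240.0.0/16"),
]

def in_asn(addr: str) -> bool:
    """True if addr falls inside any of the ASN's announced prefixes."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in AS32934_PREFIXES)

print(in_asn("157.240.1.35"))   # inside 157.240.0.0/16 -> True
print(in_asn("8.8.8.8"))        # Google DNS, not in the list -> False
```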
(With IPv6 this should all have gone away not because of the number of addresses, but because the address space was 128-bits wide. You could hierarchically route 256 towns in 256 counties in 256 states in 256 countries and still only have used half the hierarchy. ISPs usually get a /32 of this but Facebook announce a bunch of /48s which I don’t understand.)
Unfortunately this won't do much anymore, as Facebook and others are transitioning to server-side data transmission. Businesses now log data onto their own servers, then transmit it directly to adtech companies so that your device never directly touches the adtech server.
This is also why companies like Tealium and Segment are now worth billions of dollars. They provide a single integration point that funnels events to dozens of marketing companies' server-side APIs.
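A sketch of what that server-side flow looks like from the business's side. The field names and schema below are my own illustration, modeled loosely on conversion APIs that accept a SHA-256 hash of a normalized email; no particular vendor's API is being reproduced here:

```python
import hashlib
import json

def hash_email(email: str) -> str:
    """Normalize (trim, lowercase) then SHA-256 hash an email address."""
    return hashlib.sha256(email.strip().lower().encode()).hexdigest()

def build_event(email: str, event_name: str, client_ip: str) -> str:
    """Build a server-side event payload (hypothetical schema)."""
    return json.dumps({
        "event_name": event_name,
        "user_data": {
            "em": hash_email(email),
            "client_ip_address": client_ip,
        },
    })

payload = build_event(" Jane@Example.com ", "Purchase", "203.0.113.7")
# The merchant's server would now POST `payload` to the adtech endpoint;
# the visitor's device never contacts that endpoint itself.
```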
This isn’t publishers displaying ads and reporting how many views they get. This is about associating visitors with ads seen on other surfaces (Facebook, Google), either for retargeting (show future ads to people who visited your landing page) or for measurement (of the people who saw ad A, how many eventually made a purchase?).
For logged in users, it's trivial to match users across sites with an email address or a phone number.
If you're clicking between sites, there may be a unique ID appended to the outbound URL (on Google there's a gclid URL parameter). This ID will be logged on the destination site and can be continuously passed around to identify the same user on multiple sites.
If they don't need perfect matching, they'll use IP addresses, user agents, and other fingerprinting techniques for fuzzy matches.
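The click-ID passthrough in particular is simple to picture. A sketch (the helper names are mine; "gclid" is Google's real parameter name):

```python
from typing import Optional
from urllib.parse import urlparse, parse_qs, urlencode, urlunparse

def get_click_id(url: str, param: str = "gclid") -> Optional[str]:
    """Extract a click ID from an inbound landing-page URL, if present."""
    qs = parse_qs(urlparse(url).query)
    return qs.get(param, [None])[0]

def append_click_id(url: str, click_id: str, param: str = "gclid") -> str:
    """Re-append the click ID to an outbound URL so the visitor stays identifiable."""
    parts = urlparse(url)
    qs = parse_qs(parts.query)
    qs[param] = [click_id]
    return urlunparse(parts._replace(query=urlencode(qs, doseq=True)))

inbound = "https://shop.example/landing?gclid=Cj0abc123"
cid = get_click_id(inbound)
outbound = append_click_id("https://shop.example/checkout", cid)
```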
There are different responsibilities for the "controller" and the "processor" under GDPR.
Facebook and Google are recognized as processors in this situation. The websites that send them the data are the controllers and are subject to the vast majority of the regulation, while the processors can assume that the controller has obtained user consent until informed otherwise.
It's legally important to recognize that Facebook and Google are not blindly sucking up data from around the internet. Websites/apps are actively transmitting this data to them and other adtech platforms for their own benefits.
AdGuard can run a local VPN that intercepts HTTPS traffic and blocks ads even within HTTPS traffic. It's a little sketchy since they man-in-the-middle your encrypted traffic in order to do this, but they exclude extended validation certs (the ones where the name shows up next to the lock) and over 1,300 other exceptions (https://kb.adguard.com/en/general/https-filtering). That should be able to block a lot more, including ads via apps.
This can do a lot more than a normal VPN or DNS blocker because it's actually intercepting and decrypting HTTPS traffic (rather than just passing it through).
However, Facebook has been very good at making ads that are hard to block, even if you have access to everything. They've been pretty aggressive about getting around things like uBlock Origin even on desktop browsers.
DNS-based blocking also likely wouldn't have much impact on a company that could serve ad content and regular content off the same domain names - or that could just rotate domain names too much.
Also, AdGuard's local-VPN/HTTPS-intercepting feature is a pay-for feature (I believe $5/year or a $10 one-time charge).
Instagram serves ads as part of the feed API response. If you use the app, there's no way to remove them without patching the app. I did do that for Android, but on iOS it's impossible without jailbreak.
Within Safari there are Content Blockers. This is a cool iOS concept because unlike a traditional browser extension, all it can do is tell Safari what to block, it doesn't get to see your activity itself.
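A minimal Content Blocker rule list is just JSON handed to Safari. A sketch; the regex and the third-party restriction are illustrative choices, not a complete Facebook blocklist:

```
[
  {
    "trigger": {
      "url-filter": "facebook\\.(com|net)",
      "load-type": ["third-party"]
    },
    "action": { "type": "block" }
  }
]
```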
I use Firefox Focus on iOS 11 for Facebook. It worked OK until a few days ago. Now I cannot see my messages unless I request the desktop version of the site. Even with that option turned on, I noticed the site kicks me out after ten minutes or so.
GitHub has their own ASN though (all DNS records I tried resolving pointed to this AS), and you could just not block api.github.com or raw.githubusercontent.com.
Facebook Container
https://addons.mozilla.org/en-US/firefox/addon/facebook-cont...
https://addons.mozilla.org/en-US/firefox/addon/temporary-con...
...which includes the downtown Palo Alto address, hah. It’s linked from facebook.com/peering/
Here’s a list of the IP prefixes:
https://bgp.he.net/AS32934#_prefixes
https://bgp.he.net/AS32934#_prefixes6
https://github.com/smigniot/smigniot.github.io/blob/master/i...
For the Google part I had to recurse through the AS list first and perform CIDR merging.
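That CIDR-merging step can be done with Python's stdlib. A sketch with made-up prefixes:

```python
import ipaddress

# Collapse adjacent/overlapping prefixes gathered from several ASNs
# into a minimal list. These prefixes are made up for illustration.
raw = ["192.0.2.0/25", "192.0.2.128/25", "198.51.100.0/24", "192.0.2.0/24"]
nets = [ipaddress.ip_network(p) for p in raw]
merged = list(ipaddress.collapse_addresses(nets))
# The two /25s and the duplicate /24 collapse into a single 192.0.2.0/24.
print(merged)
```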
https://whois.arin.net/rest/net/NET-63-150-141-224-1.html
is a Facebook range that's not included above.
Those RADb responses include this line:
I use this app which, among other things, lets you hook up remote sources as block lists: https://apps.apple.com/us/app/adblock/id691121579
This means you can, for example, hook it up directly to this repository (which I've done) and automatically get updated lists.
https://bgp.he.net/AS54113
- https://news.ycombinator.com/item?id=11791052 (2016)
- https://news.ycombinator.com/item?id=16632677 (2018)
Go to your rules window; in the bottom left there is a plus button, and clicking it reveals the option to add "rule group subscriptions": https://help.obdev.at/littlesnitch4/lsc-rule-group-subscript...