Hi all. We've been playing a silly little game with my wife lately: we send each other messages about some topic we never talk about and then wait for ads related to our conversation to start showing up in Instagram. As of the last month, they never fail to show up.
Please keep in mind that this is a conversation between two "personal" accounts, no business accounts involved. More so, we haven't accepted the new terms of use that "allowed" WhatsApp to access messages between personal accounts and business accounts.
Is WhatsApp scanning personal messages to target their ads as we are noticing? Weren't WhatsApp messages end to end encrypted? Is this a violation of their Terms of Use or am I missing something silly?
Here is how I think you could design a more robust (but less fun) experiment:
- Come up with a bunch of topics, write them down on slips of paper, put the paper into a hat
- Each Monday, draw three topics from the hat, send some WhatsApp messages about the first, Messenger messages about the second, and don’t discuss the third. Don’t put the topics back in the hat.
- If you see any ads relating to one of the topics, screenshot them and save screenshots to eg your computer with a bit of the topic
- Separately, record which topic went to which platform
- After doing this for a while, go through the screenshots and (each of you and your wife or ideally other people) give a rating for how well the ad matches the topic. To avoid bias, you shouldn’t know which app saw the topic.
- Now work out average ratings / the distribution across the three products (WhatsApp vs Messenger vs none) and compare
- pick said topic, something you never cared about before, talk about it but don't write any messages containing it; - for 1 month record every ad you see about it; - send a message about the topic; - for another month, record every ad you see about it
Comparing the number of occurrences will tell you what is happening.
This does not work. How did you come about the topic? Answer: it was in your brain, because advertising, trends among your peers and social connections, online trends real or astroturfed, etc.
That's why you end up with people thinking their phone is "listening" to them.
-How did you learn about the product?
-Have you ever searched for it?
-Did a friend of yours tell you about it? Do you think they searched for it?
-Are a lot of ads for it playing on TV channels you like? Could instagram know you like those TV channels?
-Is it something your neighbors got? Do you think there has been a spike in shipments of this product to your neighbors?
Eventually people start to “get” that scanning the text of messages is way more helpful for humans than it is for computers. They’ve got other data they can use.
I'm just not convinced of the always on microphones in phones listening for and processing every single thing considering how much battery drain that would cause, whether the processing is done on device or they're sending all that data to a server to be processed.
It's sort of like getting mugged once and then setting up a camera in a bunch of alleys to prove that muggers exist. You can even set up a camera of yourself running into dark alleys every night, but the odds of reproducing a mugging is still extremely low.
There's a certain kind of precision that convinces me it's real though. Precision is common. I look at a book on Amazon, and a FB ad for that book appears.
But I get rejected for a loan via WhatsApp and then used car ads appear for that model of car that I applied a loan for? That's a bit on the nose.
Aside from random coincidence, I could see this happening if you provided your personal information (especially email) for the loan application. It could have been shared to multiple underlying lenders alongside a data vendor who ultimately provided interest targeting (which can include car models) to an ad network.
Getting an ad for that specific model could also have been due to other online activity, such as checking the KBB.
I did that and I got a mugging on camera. The attacker was convicted.
If I had to guess, your whatsapp messages are e2e secured but keywords are sent to facebook when they match some condition. So if you message "happy birthday" to someone, they won't see that but the fact that the keyword "birthday" was found even if the word isn't included is sent to fb. That way they can say they're not snooping your messages.
You need to know if you got 3 topics of ads every day and 1/3 of them are related to that secret topic, OR if you get 300 topics every day and 1/300 are related to that secret topic. If it’s the former, it’s suspicious, if it’s the latter, it’s way less suspicious.
Also, what we’re interested is if the text changed what was shown. If I saw ads for X last week but didn’t notice them, then spoke to a friend and noticed them and took a screenshot, it would appear to confirm the theory. Even though I was always seeing ads for X.
Ultimately, I don’t think people who are convinced of this theory will change their minds so it’s a moot point.
Is it so hard to believe that Meta is snooping on WhatsApp conversations? Meta, a company of unprecedented size that was built over monetizing your private data? A company who's been caught in plenty of scandals (like Cambridge Analytic) about this exact sort of thing (violating their users' privacy)?
Someone from this community, which generally means educated, tech-literate and sensitive to these topics shares a perfectly plausible observation, of something that has been experienced as well by plenty of other folks, me included; and then some people come and try to make up the most convoluted explanations (candy boxes from Kazakhstan just happened to be trending that specific day, nothing to see here, move along!) to this phenomena and try to shift the blame away from Meta. Why do you do this? Are you Meta employees? A PR agency they hired?
It's just baffling. Apparently some people DO want to be abused.
Plot twist: we all get ads about candy boxes from KZ now.
Despite the shady image they have, these companies go to great lengths to avoid doing shady things (because ultimately it’s bad for business). Not to mention the hundreds of tech employees that would have to be involved and keep quiet in this type of “conspiracy”. It’s incredibly unlikely, I truly believe that.
The tech companies you work for do often engage in illegal activities, and some of your collegues are complicit. I'm sure it is an uncomfortable thought for some of you, but this is all part of the public record.
The sad reality is people are very predictable, even with basic data.
I can imagine the same thing done for text. The text might be encrypted, but interest keywords might be generated on-device and sent out-of-band.
I'm not claiming this is real, but I agree with GP.
For an example from non-FAANG companies, see illegal dumping of toxic waste by chemical companies, such as DuPont and PFOAs [1]. Despite knowing what they did was illegal, the math works out -- products with PFOAs were something like $1 billion in annual profit, and even when they got caught the fines and legals were a fraction of that, spread out over many years.
So I personally believe these companies 100% would do shady shit if it increases their profit margins. And why wouldn't they? There is no room for morals in capitalism, and the drawbacks are slim.
[1]https://www.nytimes.com/2016/01/10/magazine/the-lawyer-who-b...
As others above me have thoroughly explained, there are numerous ways Facebook could figure out what you’re reading about/listening to/viewing on the internet, which ultimately drives what you are chatting with your friends about. Reading your messages would actually be the most difficult and low fidelity way for them to try to mine this information. They can just see your entire browsing history and extract from there, since the majority of website have a tracking cookie that in some way phones home to Facebook.
If FB could do that, then FB would realize that these topics are not actually products they are interested in, so they wouldn't be showing ads.
I disagree with this to the extent that I would say the exact opposite is true.
Facebook (and others) have proven time and time again that they cannot correctly predict user behavior by locking out or banning users who actually did nothing wrong (because their algorithms predicted that the user was breaking terms of service or might be planning to). This happens over and over, even in cases not so complex as the "photos of my child to send to my doctor".
But on the flipside, Zuckerberg has been documented saying one thing to the public and exactly the opposite in private. Heck, Facebook has had memos and emails leaked where they talked about how they would say one thing in public (and to regulators) while doing the opposite secretly.
I believe that Facebook cheats and breaks agreements (and laws) in multiple directions all the time, often willfully. They've even been caught cheating their own ad customers by intentionally overstating the effectiveness and target accuracy of their ads.
Deleted Comment
I know that, unfortunately, this is what puts bread on your mouth.
But, really? Are you suggesting that Cambridge Analytica didn't happen? Did we all hallucinate that?
You guys jumped the shark already. These attempts at damage control are laughable.
We limit the information we share with Meta in important ways. For example, we will always protect your personal conversations with end-to-end encryption, so that neither WhatsApp nor Meta can see these private messages.
Facebook has a deep culture of pathological lying. They lied to the FTC [1]. They lied to WhatsApp and to the EU [2]. They created an Oversight Board and then lied to it [3].
Each of those lies are more substantial than lying in a privacy policy.
[1] https://www.ftc.gov/system/files/documents/cases/182_3109_fa...
[2] https://euobserver.com/digital/137953
[3] https://techcrunch.com/2021/09/21/the-oversight-board-wants-...
They can of course also have their app de-crypt and re-encrypt the messages to the key of a requesting third party like police or hired reviewers if certain keywords are used.
Authorities could also have Google or Apple ship a signed tampered Whatsapp binary to any user or group of users, like protestors, that uses a custom seeded random number generator so they can predict all encryption keys generated and no one else, including Meta, will know.
The variant of end to end encryption where third parties control the proprietary software on both ends, is called marketing.
As part of the Meta Companies, WhatsApp receives information from, and shares information (see here) with, the other Meta Companies. We may use the information we receive from them, and they may use the information we share with them, to help operate, provide, improve, understand, customize, support, and market our Services and their offerings, including the Meta Company Products. This includes:
- improving their services and your experiences using them, such as making suggestions for you (for example, of friends or group connections, or of interesting content), personalizing features and content, helping you complete purchases and transactions, and showing relevant offers and ads across the Meta Company Products
====
Popular theory is they can't see or store your messages, but can analyze them on the client and profile you (e.g. interested in brazil nuts)
I can perfectly mean just the audio exchanges when both parties talk.
Also: E2E does not imply necessarily that they do not know the key.
Where's the evidence? I don't know what ethos "Hacker News" is supposed to capture, but surely it's not superstition?
for a lot of people, no
> Meta, a company of unprecedented size that was built over monetizing your private data?
one of many companies, however "meta" does have the advantage that you can opt out of them, mostly.
> A company who's been caught in plenty of scandals (like Cambridge Analytic) about this exact sort of thing (violating their users' privacy)?
CA is interesting as it started out as an academic study, which was consented fully. CA then went on to scrape people's public profiles, which often included likes, friends, etc. This combined with other opensource information allowed them to claim to have good profiles of lots of people, the PR was strong. Should FB have had such an open graph? probably not. Should they have taken the rap for everything evil on the internet since 2016? no. There are other actors who are much more predatory who we should really be questioning.
> Are you Meta employees?
I think you place far to much faith into a company that is clearly floundering. Its not like it has a master plan to invade your entire life. Its reached it's peak and has not managed to find a new product, and is slowly fading.
However, as we all think we are engineers, we should really design a test! but first we need to be mindful of how people are tracked:
1) phone ID. If you are on android, your phone is riddled with markers. Apple, supposedly they are hidden, but I don't believe that they don't leak
2) account, and account is your UUID that tracks what you like.
3) your IP. if you have IPv6, perhaps you are quite easy to track. even on V4 your home IP changes irregularly and can be combined with any of the above to work out that you are the same household.
4) your browser fingerprint. (be that cookies, or some other method)
5) your social graph
method:
1) buy two new phones.
2) do not register them with wifi
3) create all new accounts for tiktock, gmail, instgram etc.
4) never log into anything you've created previously, or the fresh accounts on old devices.
5) message each other about something. However you need to source your ideas from something offline, like a book from a thrift store or the like. maybe an old magazine. open a page, pick the first thing your finger lands on. this will eliminate the "I heard about x" or "i'm in the mood for y"
report back.
Wait... If WhatsApp is really E2EE encrypted, why would any of the other steps be necessary? Dude and his wife can simply pick at page at random from a magazine in a store, never search anything online about it, start talking about it using WhatsApp as if it was something of great interest to them. If they start getting related ads, obviously something shady is going on. There's no need for new phones / new GMail accounts / etc.
I don't know what is happening in this specific case. Perhaps the ads came from some other similar search queries. Perhaps they came from the keyboard intercepting what was typed. Or perhaps something else that I can't think of. But I'm nearly certain it did not come from meta intercepting the contents of your messages.
It's hard to convince people at this point because many have lost trust in Meta as a company, and I understand that. But I still find it stunning that so many people are making so many false claims without any actual knowledge to back it up.
I didn't have in mind the scenario of a keyboard logging user inputs besides the normal functionality of WhatsApp. I find this theory to be very plausible. Not at all happy with Meta's privacy policy, but I agree that it is worth considering other threats.
From using a VPN that logs all incoming and outgoing traffic (NetGuard) on an Android One device, I've noticied that the default Google keyboard gets in touch way too many times with some distant servers. Whereas, an open source keyboard from F-Droid, FlorisBoard, does no snooping and gets updated solely through the app store.
Another consideration, there are companies that track and sell geolocation data. It's "anonymized" but so precise you know the street address a user resides at. It is not a stretch to consider "anonymized" retargeting from keyboard inputs.
I was dismissive of it in the past, as comments voted higher here are. However I've seen enough weird ads show up within minutes of making jokes about obscure topics that I suspect there is something going on.
The piece that might be missing here is third parties collecting signals, "anonymizing" them, and then ads get re-displayed through Facebook, Google, etc. It may not be the major ad platforms doing it directly. In theory this should be harder now with the iOS tracking restrictions.
For the skeptical, consider Avast's Jumpshot. Here millions of users thought they were protecting themselves when their raw browsing stream was being sold live to third parties. I They aren't the only company that has done that. https://www.theverge.com/2020/1/30/21115326/avast-jumpshot-s...
Proprietary encryption means users cannot verify or control the keys or the code that generates or uses the keys. The app can exfiltrate the keys or do any keyword processing on behalf of Meta as well which can include well intentioned features like forwarding paintext messages containing certain dangerous-seeming words to authorities or theoretically trusted third party review teams. Naturally they could also return -metrics- about frequency of word use back to Meta for ad targeting as well.
I too have been a champion of encryption and privacy at past companies only to have all my work undone and watch all the data become plaintext and abused for marketing by a new acquirer.
The only way end to end encryption solutions can avoid these types of abuses is when the client software is open source and can be audited, reproducibly built, and signed by any interested volunteers from the public for accountability.
Short of that it is really not that much different than TLS with promises Meta will not peek, at least not directly, today.
Given evidence at hand, it is hard to view Meta as anything but a bad actor.
Isn't this kind of splitting hairs? Does it matter if text information came from a "side channel"?
It seems like the promise Facebook makes is that 'your communication using whats app is secure,' that's certainly my interpretation of what "end to end encrypted" means. It is a promise of security. That means text is sacred and even text sent to giphy should be privileged from the ad machine.
The question being asked here is not "is it end to end encrypted?" It's "are my communications secure?" End to end encryption is just one element of that security.
WhatsApp architecture is designed with the assumption that the server could be compromised and yet such an event should not result in any message contents being revealed. Furthermore, the encryption function is designed to ratchet and rotate keys so that a leak of a key at a given point in time would not compromise past and future messages.
So yes, I have a strong sense of confidence that message contents are not exposed to Meta and, given the bar set by privacy reviews, I don't think Meta would do some backdoor workaround like scraping the contents off the device and sending an unencrypted copy. To be clear, my claims are specifically around message contents and when it comes to certain metadata (ex. the sender/receiver, the names of groups, etc) I don't recall the exact details of how they are treated.
Now, despite the fact that I've said all this and that my knowledge on the matter is fairly recent, I'm not sure I could ever say anything with absolute confidence. The code base is huge and not open source. I obviously have not seen every line of code and as you pointed out, there's always a chance some company policy changes happened without my awareness. So I would say "highly" confident but not "absolutely" confident.
Dead Comment
Meta has control over the app Sue uses. So they could send them to Meta unencrypted in addition to sending them to Joe in an encrypted fashion.
Or they just extract the relevant terms:
Sue->Joe: "Hello Joe, I'm so excited! We are going to have a baby! Let's call it Dingbert. You're not the father! Jim is. I hope you don't mind too much!".
Sue->Meta: "Sue will have a baby"
Insta->Sue: "Check out these cute baby clothes!"
Turns out they were also building a database of everyone's face so they could build shadow profiles...
> her Instagram was showing ads for a store that was selling the same type of puzzle
How did she take the pic ?
Or they can say, technically it wasn't a message before it was sent. The dictionary definition[1] even mentions "send".
[1] https://www.oxfordlearnersdictionaries.com/definition/englis...
No, that's precisely what End-to-End encryption means.
Facebook loves to use newspeak, wouldn't surprise me if they applied newspeak to what "end-to-end encryption" means.
The quandary of what one allows to run on those implants sounds like a chilling sci-fi novel (chilling not because "but FAANG could read your thoughts!" but because people would absolutely still get them installed).
https://en.wikipedia.org/wiki/End-to-end_encryption#Endpoint...
It's illustrated in their example below that they if you say you're having a baby, meta can send some type of distilled ad-keywords to its servers (eg `[mother, baby]` if it knows the user is a woman based on their name/profile, but probably more sophisticated than that). The message you sent is still technically end-to-end encrypted, though,
I think it does actually no one except them can read them. If someone else can, then by definition it's not end-to-end encryption.
From https://www.definitions.net/definition/End-To-End%20Encrypti...
> End-to-end encryption (E2EE) is a system of communication where only the communicating users can read the messages.
E2ee means only the messages themselves can't be intercepted and read. But if anyone can actually prove fb acting on message contents, I suspect the EU banhammer would be interested.
so what's the point? just inconvenience. better to use telegram at this point.
> We limit the information we share with Meta in important ways. For example, we will always protect your personal conversations with end-to-end encryption, so that neither WhatsApp nor Meta can see these private messages.
They are saying they dont store or forward your message text, not that your phone doesnt send them topics of interest
Assuming they're not blatantly violating the policy (which I think they've done before), it's pretty easy to weasel out of that statement by only sharing keywords from the conversation, or only sharing the info with advertisers (but not WhatsApp and Meta), or redefining what a "personal conversation" is, or carefully redefining what "end-to-end encryption" means, or ...
There's no transparency, a huge power imbalance, and terrific pressure on WhatsApp/Meta to monetize as much as possible.
I've always suspected them of recording conversations, also why I think Android has gradually tightened permissions and visibilty around speech to text/microphone/camera use.
Looking at this from a reality perspective is not very helpful.
* Have pairs of mobile devices set up from factory configuration with WhatsApp and Instagram installed.
* Simulate conversations between each pair from select topics.
* Collect all ads from Instagram after the WhatsApp conversations from each device.
* Categorize ads to broad topics.
* Search for significant bias.
There are probably a lot of factors I'm missing here, and it's probably easy to introduce bias when there is none there. For example it's probably a good idea that a different person categorizes the ads into topics than the person handling the specific phone, otherwise the person might bias the categorization of the ads based on the conversation they had on WhatsApp beforehand. The person categorizing the ads should have no knowledge of the WhatsApp conversation that happened on the phone. The devices should probably be on different networks. There is probably a lot that I am missing here.
Once or twice may be a coincidence. Maybe. But this happens regularly and with startling specificity.
What could be listening? I'm a technologist like the rest of you. I know apps need permissions to the mic, I know it's not easy for an app to stay in the foreground. Is it my Roku? My smart TV?
Makes one want to go full Richard Stallman.
p.s. my wife just said it would be really funny if Google News showed an article now on people worrying about their tech listening to their conversations. I'll post an update if that happens...
e.g. You and a bunch of friends go to dinner and have a conversation about <topic x>, at least one of those friends googles something about that topic. You later see an ad related to <topic x> because you were targeted based on the search your friend did while they were near you.
If your wife potentially did anything digitally, related to the diabetes topic, its likely that you were targeted based on that.
Again, no idea what happened resultant to the story you shared, but whenever this sort of thing happens to me I try to appeal to Occam's Razor based on how much I know about how this tech works under the hood.
Quoth Richard Feynman:
> “You know, the most amazing thing happened to me tonight... I saw a car with the license plate ARW 357. Can you imagine? Of all the millions of license plates in the state, what was the chance that I would see that particular one tonight? Amazing!”
You were gonna see ads, or news stories, or whatever. You happened to see this one. And it happened to connect to a conversation you were having. Amazing! What magic!?
Well how many other ads or other times didnt it connect to a conversation and you just don’t remember because it didn’t feel special? Probably many more times than it did work.
Lately I’ve been trying to find alternate explanations. Hm we were talking about going to a restaurant tomorrow. Why is my girlfriend getting all these restaurant ads on instagram all of a sudden? Oh right, because it’s Thursday, we live in a city, I searched Google for “Good Friday restaurants for a date” during our conversation, and we live on the same IP.
However, I also am reminded of Nassim Taleb's Fat Tony, who when asked 'what are the odds of flipping a coin heads 10 times in a row?' responds with 'fuggehdabowdit. it's a hustle.' There's a scientific response, which can be very naive in some ways, and there's the Fat Tony street analysis. As I've gotten older, I tend to value the latter.
So sweet boxes from Kazakhstan suddenly appearing in your ads is due to observation bias...
My mother recently remarked in a unique conversation that we have had a box of Golden Grahams cereal for a year and should find a recipe to use it up. She opened her phone to search, and lo and behold, the top recommendation after only two letters, R and e, was "Golden Grahams recipes". Not only had that never been a topic of conversation or search beforehand, nor did she have Google open on her phone, but you may have noticed that "Golden Grahams recipes" doesn't even start with R or e. This sparked a long conversation about how privacy really is something worth fighting for.
My only guess is that Google has the ability to listen in because we use Android phones.
It should be possible to run scientifically sound experiments to show if it's happening.
Deleted Comment
1. Nobody is reading your WA messages, the same topics can be learned from your browsing activity or other msgs, eg. by reading your sms texts.
2. Meta is reading your messages directly in-transit, server-side.
3. Meta is not reading your messages server-side, but the Meta apps extract keywords from your conversations and request relevant ads from the ad servers.
4. Another non-Meta app is doing the above.
5...
I think that 1 is the most plausible, however the original post is about "topics they never talk about", so assuming that WhatsApp is the only channel and they don't leak data in other ways (and there are many other ways to leak data), then 1 becomes unlikely.
3 is the most compatible. All the targeting can be done locally, so no end-to-end unencrypted message leaves the app. The app then sends your topics of interests to Meta.
4 again assuming WhatsApp is the only channel, then there is probably some malware somewhere, and it is unlikely that Meta accepts illegally collected data (they can do it legally, better, and with less potential trouble). There are however a few legitimate apps that can do the above. I am thinking about things like predictive keyboards, accessibility apps (screen readers, ...), backup apps (end-to-end encryption is about transmission, not storage), and the OS itself. I don't think Meta controls any of these, and I don't think they would buy data from them (Google and Apple are competitors after all).
So I would go for an accidental leak (case 1). For example, for the experiment to be meaningful, you shouldn't tell anyone about the test topic before you receive the ads. Or with the WhatsApp app hinting Meta about your topics of interest.
I agree, I'm pretty sure 2. is not the case; I just listed it as a theoretical possibility. Despite all the bad press and problems, FB has very (very) high integrity and standards, at least the parts I saw.
WhatsApp's end-to-end encryption is used when you chat with another person using WhatsApp Messenger. End-to-end encryption ensures only you and the person you're communicating with can read or listen to what is sent, and nobody in between, not even WhatsApp. This is because with end-to-end encryption, your messages are secured with a lock, and only the recipient and you have the special key needed to unlock and read them. All of this happens automatically: no need to turn on any special settings to secure your messages.
https://faq.whatsapp.com/791574747982248
This does not exclude an algorithm running running on the sender/recipient's App from scanning the content and sending suggestions to AD servers.
I guess the key lies in "what is sent" in the above statement. The casual reader might reasonably interpret as "no-one except the intended recipient can see _what I type_". But it doesn't say that. It only covers what gets _sent_. It doesn't say anything about what happens to the content outside specifically _sending_ it to the other party(ies).
The more obvious explanation is that before or after the e2ee (i.e within the app itself), an algorithm scans the content, categorizes it and sends this to Meta/Facebook.
In this scenario, *Nobody* has read the content other than the person you're communicating with.
Dead Comment
Basically, let the AI figure out what ads get clicked the most for a given string of encrypted 24h window of chat history. Eventually, the AI is going to hit on its “Rosetta Stone”, even without ever formally decrypting the text, much less any human reading it.
With millions of conversations happening on WhatsApp, why shouldn’t that be possible?
And it’s not even a breach, technically, because nothing ever got decrypted, and the similarity vector generated by the AI have, per se, nothing to do with the content of the conversation or the individual that sent them. Run the same training algorithm again and they’d look completely different! Hence they can’t possibly be “personal data” in the sense of the law.
If you can discern meaning from noise, then your theory would work. But discerning meaning from random noise is obviously impossible (i.e. what if there is no meaning?).
If you leak information than you say, then the encryption is worthless. Harmful, even, because you think you have protection when you do not.
Dead Comment