Posted by u/jgwil2 3 years ago
Ask HN: Google spam filter getting worse?
I have noticed an uptick in uncaught phishing messages in the past few months, and talked about it to a friend who observed the same. Anyone else?
tvanantwerp · 3 years ago
For months now, emails with subjects like "MCAfeeconfirmati0n--#21845315" and "confirmation#4073301981" have been hitting my inbox. These are such obvious spam that I don't understand how the filters miss them. Reporting them as spam hasn't helped the filter catch them.
lame-robot-hoax · 3 years ago
I have this same problem with Outlook. Starting about 2-3 months ago, I began receiving 5-10 spam emails a day with titles like these directly in my inbox. Reporting them as spam helped a little and brought it down to maybe 1-5 a day. But they’re obviously spam, with subjects like Norton Confirmation, OuOrtIBGGvGIO, Life Insurance Offer, etc., and with weird fonts and other stuff.

As a side note, a lot of these spam emails I get are from Gmail.

jeffbee · 3 years ago
Judging from my own spam label on Gmail, those messages are part of the torrent of junk pouring out of Microsoft's "hybrid on-premises Exchange" egress VIPs. Basically, some clown who pays Microsoft for quasi-hosted Exchange has a virus that sends spam, and Microsoft blesses it with the reputation of the customer egress addresses. Eventually this will stop working for Microsoft, but for now it's like waiting for Greenland to melt: inevitable, but it takes a long time.

Also worth noting, if you are trying to evaluate Gmail's classification performance: the vast majority of what they think is spam never reaches your spam label, because it was stopped with a 4xx error code at SMTP time. So you don't really have a way to know the denominator.
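A toy calculation makes the denominator problem concrete. The counts below are invented for illustration (they are not Gmail figures) and the function name is hypothetical: a user only sees spam that was accepted for delivery, while the provider also counts everything it refused at SMTP time.

```python
def apparent_and_true_catch_rate(smtp_rejected, spam_label, missed):
    """Compare the catch rate a user can observe with the provider's view.

    smtp_rejected: spam refused at SMTP time (invisible to the user)
    spam_label:    spam delivered to the spam label (visible to the user)
    missed:        spam that reached the inbox (visible to the user)
    All inputs are hypothetical counts.
    """
    # The user can only reason about spam that was accepted for delivery.
    visible = spam_label + missed
    apparent = spam_label / visible
    # The provider's denominator also includes SMTP-time rejections.
    total = smtp_rejected + spam_label + missed
    true_rate = (smtp_rejected + spam_label) / total
    return apparent, true_rate


apparent, true_rate = apparent_and_true_catch_rate(
    smtp_rejected=9_000, spam_label=900, missed=100
)
print(f"apparent: {apparent:.1%}, true: {true_rate:.1%}")  # apparent: 90.0%, true: 99.0%
```

The same 100 missed messages look like a 10% miss rate from the inbox, but a 1% miss rate once the invisible SMTP rejections are counted.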

jamespo · 3 years ago
Ironically Microsoft are the only major MX that won't accept email from my server.
eightysixfour · 3 years ago
Funny. I'm on Outlook and mine is (sort of) the opposite: most of the spam that comes through is @gmail.com these days. It seems like spammers are taking advantage of known trusted relationships between services to increase delivery rates to specific domains.
pwarner · 3 years ago
Seeing the same. Someone from Google please fix this. I've gone from one spam a month to several a day. I've been using Gmail since the beta.
r1ch · 3 years ago
They're multipart messages, which seem to trip up Gmail: it appears one part is scanned and another is displayed. Base64-decode the source parts and add a keyword filter for the "non-spam" text, as it's usually pretty static.
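A minimal sketch of the decoding step using Python's standard `email` package, assuming you have the raw message bytes (for example, saved from Gmail's "Show original" view). It walks every text part, undoes the base64 or quoted-printable transfer encoding, and checks for recurring phrases. The keyword list and the demo message are hypothetical; in practice you would copy the static text out of the decoded parts and use it in your filter.

```python
from email import policy
from email.message import EmailMessage
from email.parser import BytesParser

# Hypothetical phrases; replace with the static text found in the decoded source.
KEYWORDS = ("mcafee", "norton", "your subscription has been renewed")


def decoded_text_parts(raw: bytes):
    """Yield every text/* part of a message with its transfer encoding undone."""
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    for part in msg.walk():
        # multipart/* containers are skipped; only leaf text parts are decoded
        if part.get_content_maintype() == "text":
            # get_content() reverses base64 or quoted-printable encoding
            yield part.get_content()


def matches_keywords(raw: bytes, keywords=KEYWORDS) -> bool:
    """True if any decoded text part contains any keyword (case-insensitive)."""
    return any(
        kw in text.lower() for text in decoded_text_parts(raw) for kw in keywords
    )


# Usage: a two-part message whose parts are both base64-encoded.
demo = EmailMessage()
demo["Subject"] = "confirmation#4073301981"
demo.set_content("Hello, please see the attached receipt.", cte="base64")
demo.add_alternative(
    "<p>Your subscription has been renewed. Call McAfee support.</p>",
    subtype="html",
    cte="base64",
)
print(matches_keywords(demo.as_bytes()))  # True
```

Matching on the decoded text rather than the raw source matters because the base64 bodies hide the phrases from a plain substring search.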
xeromal · 3 years ago
Yeah, it's been happening to me for about a year now. I went as far as making another email address just to avoid it, which made me sad. I'd had that address since 2008 or so.
adamckay · 3 years ago
I had exactly this yesterday, except the sender address was my own Gmail address with a dot at the end, so when I opened the email the name was "McAfeeSecurity" with my own email address and profile picture.

I reported it as spam, and Gmail helpfully asked if I was sure, because I communicate with this person a lot; when I confirmed, it said it would block the sender. I'm unsure whether this will have any impact on the emails I send out myself now.

Rather worrying that Gmail addresses can be spoofed.

brianjking · 3 years ago
Same here, it's so bad.


cochne · 3 years ago
Google probably lets some amount of known spam through for data gathering. See this quote from Google's "Rules of Machine Learning" [1] (a great resource, by the way):

> Rule #34: In binary classification for filtering (such as spam detection or determining interesting emails), make small short-term sacrifices in performance for very clean data.

> In a filtering task, examples which are marked as negative are not shown to the user. Suppose you have a filter that blocks 75% of the negative examples at serving. You might be tempted to draw additional training data from the instances shown to users. For example, if a user marks an email as spam that your filter let through, you might want to learn from that.

> But this approach introduces sampling bias. You can gather cleaner data if instead during serving you label 1% of all traffic as "held out", and send all held out examples to the user. Now your filter is blocking at least 74% of the negative examples. These held out examples can become your training data.

> Note that if your filter is blocking 95% of the negative examples or more, this approach becomes less viable. Even so, if you wish to measure serving performance, you can make an even tinier sample (say 0.1% or 0.001%). Ten thousand examples is enough to estimate performance quite accurately.

[1] https://developers.google.com/machine-learning/guides/rules-...
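The holdout idea in Rule #34, along with the arithmetic behind "at least 74%" and "ten thousand examples", can be sketched as follows. This illustrates the technique the rule describes, not Google's actual implementation; `is_spam` stands in for a hypothetical classifier, and the 1% rate comes from the quote.

```python
import math
import random

HOLDOUT_RATE = 0.01  # fraction of traffic delivered regardless of the verdict


def route(message, is_spam, rng=random):
    """Return (deliver_to_user, held_out) for one message.

    is_spam is a hypothetical classifier callable. Held-out messages are
    always delivered, so user spam reports on them form an unbiased sample.
    """
    if rng.random() < HOLDOUT_RATE:
        return True, True
    return (not is_spam(message)), False


def effective_block_rate(filter_block_rate, holdout_rate=HOLDOUT_RATE):
    # Held-out spam is never blocked, so the served block rate drops slightly.
    return filter_block_rate * (1 - holdout_rate)


def std_error(p, n):
    # Standard error of a proportion p estimated from n labeled examples.
    return math.sqrt(p * (1 - p) / n)


print(f"{effective_block_rate(0.75):.4f}")  # 0.7425, i.e. "at least 74%"
print(f"{std_error(0.95, 10_000):.4f}")     # 0.0022
```

With 10,000 held-out examples and a true rate around 95%, the standard error is roughly 0.2 percentage points, which is what "quite accurately" amounts to.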

andrewflnr · 3 years ago
I don't think that explains the very obvious crap that gets through, for instance several near duplicate spams in a row, each of which I manually reported.
mtmail · 3 years ago
Gmail is the prime target for all spammers. I see regular reports about this, including for Google Search results. Nobody really has an answer.

3 days ago "Tell HN: Gmail's spam filters have gone bonkers" https://news.ycombinator.com/item?id=34411009

1 month ago "Ask HN: Do you all get spam in Gmail daily?" https://news.ycombinator.com/item?id=34093812

4 months ago "Ask HN: What's happening with Gmail spam filtering?" https://news.ycombinator.com/item?id=32923098

"Ask HN: Is Gmail spam out of control for everyone else too?" https://news.ycombinator.com/item?id=30315116

gnabgib · 3 years ago
These dupes are getting tedious; they're largely the same comments as the one 3 days ago (94 comments, 127 pts). People agree, other mail hosts say most of their spam comes from Gmail and Outlook, and various folks point out they've switched to competitors and it's much better (for now).
jgwil2 · 3 years ago
Thanks for posting. I didn't realize that this had been such a hot topic recently.
nanidin · 3 years ago
I run my own mail server + spam filter, so I'll chime in. I have seen a high uptick in spam making it to my inbox in the last two weeks. I primarily rely on Spamhaus blocklists + a Bayesian filter trained on old spam.

The uptick I have seen is going from 0-2 spams making it to my inbox to 10-20. When this has happened in the past, I have assumed it was spammers bypassing blocklists by finding new hosts, or finding a clever way to beat the filter. Usually after these big upticks the spam drops off again suddenly, which makes me believe it was a blocklist bypass and not a filter bypass (my filter is pretty weak and hasn't been retrained or updated in many years).
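For readers unfamiliar with how a DNS blocklist such as Spamhaus is consulted: the receiving server reverses the octets of the connecting IPv4 address, appends the list's zone, and does an ordinary DNS A lookup. An answer in 127.0.0.0/8 means the address is listed; NXDOMAIN means it isn't. A minimal sketch, assuming the `zen.spamhaus.org` zone and a resolver Spamhaus accepts queries from (as far as I know, queries through large public resolvers get 127.255.255.x error answers instead of real results). Real mail servers do this through their own configuration, not code like this.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str = "zen.spamhaus.org") -> str:
    """Build the DNSBL lookup name: reversed IPv4 octets plus the list's zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    return ".".join(reversed(str(addr).split("."))) + "." + zone


def is_listed(ip: str, zone: str = "zen.spamhaus.org") -> bool:
    """Network lookup; assumes a resolver the list operator accepts."""
    try:
        answer = socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        return False  # NXDOMAIN: not listed
    # Listings answer inside 127.0.0.0/8; 127.255.255.x signals a query error
    return answer.startswith("127.") and not answer.startswith("127.255.255.")


print(dnsbl_query_name("192.0.2.99"))  # 99.2.0.192.zen.spamhaus.org
```

This also explains the pattern described above: a fresh host that no list has seen yet resolves to NXDOMAIN and sails through until it gets listed.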

tracker1 · 3 years ago
Given all the news about hacks of self-hosted Exchange, it's more likely they're relaying through hosts with built-up trust... As good as Exchange + Outlook are from a user's perspective, it is pretty painful to see exploits like this in the wild.

The whole system just sucks and feels too entrenched for anyone to come up with something better. Even a notify+pull system wouldn't fix these kinds of exploits, even if it would correct end-user breaches.

muppetman · 3 years ago
I use rspamd for my self-hosted mail and I still don't really see any spam at all. I've spent quite a bit of time tuning it (ensuring that domains I expect mail from are trusted, mostly) but I can't believe how GOOD it is.
Y_Y · 3 years ago
Spam is not countable
andrewmcwatters · 3 years ago
Yes, it's been measurably worse for somewhere on the order of months to years now.

I'm not sure what they've changed internally. If they have talked publicly about their engineering strategy for spam detection (which I doubt, since they probably want to keep that information asymmetric), no one has shared any write-ups about it.

Nevertheless, I now get obvious spam in my inbox, and important email occasionally goes straight to my spam folder.

People here on HN have been speculating that they moved to some sort of machine learning model, probably because employees were incentivized to pervert the existing product for promotion purposes by gaming internal metrics to prove they've had an impact.

maicro · 3 years ago
Another anecdotal datapoint, but - I haven't noticed an uptick in actual spam making it to my primary inbox. I can't give solid numbers, but it's not been bad.

This includes a marked increase in crypto spam/phishing emails due to the CoinTracker email list breach; those have pretty much exclusively gone straight to Spam (including those using Google Sheets, so they come from an official Google sender address).

Again, just an anecdote, and I don't doubt that you and anyone else reporting an increase is experiencing it.


georgel · 3 years ago
I have a month-old business email for my new company, set up with GSuite, and Google's own onboarding emails went directly to spam in that inbox. I haven't marked any emails as spam with this new account yet.
DownGoat · 3 years ago
I have been getting tons of PDFs whose previews show pictures of women. The subject and body of the emails seem to be random words, like a seed phrase, with some random single-digit numbers. The emails are sent from Office, Hotmail, or Gmail accounts and pass verification. The To field is also filled with other addresses. I have been getting these for 3 or 4 months, and reporting them as spam does not work. In all the years I have had a Gmail account, this has never really been a problem.
bluedino · 3 years ago
Microsoft has the problem as well; it's not just Google. Do they not filter outgoing mail?

  Message ID <9UOejz_TlFksgoyXm9GI5Q@notifications.google.com>
  Created at: Fri, Jan 20, 2023 at 9:14 AM (Delivered after 0 seconds)
  From: "Girl Shows Girl cast a lookSTART JOIN Muriel (Classroom)" <no-reply@classroom.google.com>
  To: XXXXXXXXX
  Subject: Class invitation: "Check Join now View gambling Babe amidcustity"
  SPF: PASS with IP 209.85.220.69
  DKIM: 'PASS' with domain google.com
  DMARC: 'PASS'


  Message ID <DM6PR18MB3569050DD20FD0372DA98C9DCEC59@DM6PR18MB3569.namprd18.prod.outlook.com>
  Created at: Fri, Jan 20, 2023 at 4:50 AM (Delivered after 3 seconds)
  From: hoven patroo <hovenpatrool@hotmail.com>
  To: XXXXXXXXXX
  Subject: 名梦 t94396350
  SPF: PASS with IP 40.92.18.30
  DKIM: 'PASS' with domain hotmail.com
  DMARC: 'PASS'

You would think they'd do some basic Bayesian filtering. This was stuff we fought in 2002.
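For reference, the "basic Bayesian filtering" of the early 2000s is essentially a naive Bayes classifier over the words in a message. A minimal sketch with made-up training data and a deliberately crude tokenizer (both assumptions); it shows the technique, not anything Gmail or Outlook actually runs.

```python
import math
import re
from collections import Counter


def tokens(text):
    # Crude tokenizer: lowercase runs of letters, digits, '$' and apostrophes
    return re.findall(r"[a-z0-9$']+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()

    def train(self, text, label):
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokens(text))

    def spam_probability(self, text):
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        log_score = {}
        for label in ("spam", "ham"):
            counts = self.word_counts[label]
            # smoothed log prior plus the log likelihood of each token
            score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            denom = sum(counts.values()) + len(vocab)
            for tok in tokens(text):
                score += math.log((counts[tok] + 1) / denom)
            log_score[label] = score
        # P(spam | text) = 1 / (1 + exp(log P(ham, text) - log P(spam, text)))
        diff = max(min(log_score["ham"] - log_score["spam"], 700.0), -700.0)
        return 1.0 / (1.0 + math.exp(diff))


nb = NaiveBayes()
for text in ("join now view gambling babe", "girl shows girl join now",
             "mcafee confirmation renew now"):
    nb.train(text, "spam")
for text in ("meeting notes for tomorrow", "lunch tomorrow with the team",
             "notes from the class meeting"):
    nb.train(text, "ham")

print(nb.spam_probability("gambling babe join now") > 0.5)      # True
print(nb.spam_probability("meeting notes for tomorrow") > 0.5)  # False
```

Note that a classifier like this only looks at content; the headers above show every authentication check passing, so content scoring is the layer that would have to catch these.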

jeffbee · 3 years ago
The first one is generated by apparent user actions from paid organizations. Although it's clearly spam, you can see how this is difficult for a provider to tackle, because all of the superficial signals are good: authenticated user, paid account, using official APIs. Obviously they need to step up their defenses against abuses like sharing from docs, calendar, etc to stop bad actors from laundering their spam through Google's highest-reputation internal senders.

When I worked in this area of Gmail, we called this the "Russian urologist" problem. How do you correctly classify traffic like this when, hypothetically, some of your customers want to send and receive messages about Viagra in Russian? Casual observers will say that is spam, but not the Russian urologist.