Tell HN: YC companies scrape GitHub activity, send spam emails to users

Martin from GitHub here. This type of behaviour is explicitly against the GitHub terms of service. When we catch the accounts doing this we can (and do) take action against those accounts including banning the accounts. It's a game of whack-a-mole for sure, and it's not just start-ups that take part in this sketchy behaviour, to be honest. I've seen plenty of examples in my time across the board.

The fundamental nature of Git makes this pretty easy for folks to scrape data from open source repositories. It's against our terms of service and those folks might want to talk with some lawyers about doing it - but as every Git commit contains your name and email address in the commit data it's not technically difficult even if it is unethical.

From the early days we've added features to help users anonymise their email addresses for commits posted to GitHub. Basically, you configure your local Git client to use your 'no-reply' email address in commits and that still links back to your GitHub account when you push: https://docs.github.com/en/account-and-profile/reference/ema...
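A minimal sketch of that local setup in Python, assuming the ID-based address format `ID+USERNAME@users.noreply.github.com`. The ID `1234567` and username `octocat` are placeholder values, and both helper names are made up for illustration; your real address is shown in your GitHub email settings.

```python
from typing import List


def noreply_address(user_id: int, username: str) -> str:
    """Build an ID-based no-reply address (format assumed, values hypothetical)."""
    return f"{user_id}+{username}@users.noreply.github.com"


def git_config_command(email: str, global_scope: bool = True) -> List[str]:
    """Return the argv that sets user.email; nothing is executed here."""
    scope = ["--global"] if global_scope else []
    return ["git", "config", *scope, "user.email", email]


if __name__ == "__main__":
    email = noreply_address(1234567, "octocat")
    # Print the command rather than running it, so the sketch has no side effects.
    print(" ".join(git_config_command(email)))
```

Passing the returned list to `subprocess.run` would apply the setting; only commits made after that point pick up the new address.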

I think that's still probably the best route. We want to keep open source data as open as possible, so I don't think locking down APIs etc. is the right route. We do throttle API requests and scraping traffic, but then again there have been plenty of posts here over the years from people annoyed at hitting those limits, so it's definitely a balancing act. Love to know what folks here think though.

david_allison · 15 days ago

> when we catch the accounts doing this we can (and do) take action against those accounts including banning the accounts.

This isn't my experience. I asked you to look into a spammer in July 2025; you ignored my reply and the account is still active.

----

Thank you so much for the report. We're sorry to hear you're receiving unwanted emails, but it's always a possibility when your public contact information is listed on the web. You can keep your email address private if you wish by following the steps here:

Setting your commit email address

We do expect our users to comply with our Terms of Service, which prohibit using information from GitHub (whether scraped, collected through our API, or obtained otherwise) for spamming purposes. I'm happy to look into it further to see if we can contact the reported user and let them know that this type of activity is not allowed.

Please let us know if you have any other questions or concerns.

----

My reply which was ignored:

----

I understand it will happen from time to time. I'd rather be contactable (I've received legitimate emails today because my email is on my profile).

Please take further action. My email is public with the expectation that the ToS will be enforced. If GitHub isn't discouraging spammers then it makes it much harder to justify being contactable.

All the best, David

gettingoverit · 15 days ago

I reported spammers ~5 times to GH, and every time the account went down in a couple of hours. Obviously mileage may vary, but I don't want the whole HN to think this process is completely broken.

Please keep reporting spammers, usually it works.

tom_m · 15 days ago

It's impossible for them to stop it if you list your email on there. They could make it harder of course. But if you put your email out there for a human to find, then a script or bot can also find it.

And yes of course they can also stop a specific spammer. But that spammer may pick up another account and email.

Aachen · 15 days ago

>> it's always a possibility when your public contact information is listed on the web

Sounds correct to me

> Please take further action. My email is public with the expectation that the ToS will be enforced.

What magic wand are you expecting they wave that distinguishes people who need your email address for legitimate from those who need it for illicit purposes? Why wouldn't we apply the same to the entire population and lock up criminals before they've committed crimes?

What you're asking is entirely impossible short of mandatory mind reading

Rapzid · 15 days ago

Yeah they likely rarely if ever "look into" it and certainly nobody has ever needed a lawyer over this.

As recently as a year or so ago, at least, you could list repo stargazers through their GraphQL API and get a TON of emails off that, depending on the user settings.

retlehs · 16 days ago

I’ve made over five reports for this exact spam scenario, and never once have y’all acted on them. I have a hard time believing you ban spam accounts that clearly violate your ToS.

I even wrote about a specific example of a YC company spamming me from my GitHub email at https://benword.com/dont-tolerate-unsolicited-spam

eli · 16 days ago

How would you know whether the account that did the scraping was banned?

Aachen · 15 days ago

How did you connect joe@legitbusiness.com, where spam usually originates from for me (hacked email accounts), to a specific github user account that was used to scrape the data, which microsoft can choose to ban? And that's assuming they believe you're being truthful and not simply angry with the user whom you're reporting

koito17 · 16 days ago

I don't have any specific suggestions, but I do want to give thanks for implementing functionality to block pushes if the email field is *not* using an anonymized mail address.

It's one thing to offer anonymous e-mail addresses, but it's also awesome that GitHub can help prevent mistakes that would otherwise leak a user's e-mail address. I am not sure how many people try to be privacy conscious on GitHub, but I assume most users don't, so it's nice seeing this little feature exist.

dathinab · 15 days ago

It gets more complicated when commit signing, the widely broken web of trust (for the signing key) and similar are involved.

And not all devs want or need anonymity on github.

In general, the fact that information is publicly accessible in some form doesn't make it okay or legal to abuse it (accessible doesn't mean any usage rights are transferred to you, whether in the context of GDPR or of copyright).

ayhanfuat · 16 days ago

I am also getting constant spam because apparently they can see who starred a repo (i.e. I see you starred repo x and we are doing something similar). I am not starring anything anymore.

skwashd · 16 days ago

I know it is against the ToS. I've reported multiple organisations doing this. Last time I reported one, support closed the ticket saying the activity is off platform so they can't do anything.

danesparza · 16 days ago

I didn't realize this was against the Github TOS - I just thought it was par for the course for recruiters nowadays. This is good to know!

How do I report that person, though? Your support page about reporting abuse assumes I know the person's Github account: https://docs.github.com/en/communities/maintaining-your-safe...

blobbers · 16 days ago

Scrape once, spam forever.

I think it's pretty clear you need to use an anonymization scheme in the way commits are handled so that it links back to your github account and the email addresses are kept private.

Privacy centric companies like Apple do this for users offering hashed emails, on a per login basis.

I'm sure this would not work in a world of scraping, but having that kind of ability to figure out bad actors would be nice. You could require authenticated users for certain kinds of requests, and block user information from non-authenticated requests.

david_allison · 15 days ago

They already do[0]

    62114487+david-allison@users.noreply.github.com

this includes a unique ID which survives account renames, and the name of the GitHub account at the time.

[0] https://docs.github.com/en/account-and-profile/reference/ema...
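To make the "ID survives renames" point concrete, here is a hedged Python sketch that splits a no-reply address into its numeric ID and username and compares two addresses by ID only. The regex is my assumption about the format (including the older username-only form), `renamed-user` is a hypothetical new username, and the helper names are invented for the example.

```python
import re
from typing import NamedTuple, Optional

# Assumed pattern: optional numeric ID plus "+", then the username.
NOREPLY = re.compile(r"^(?:(\d+)\+)?([A-Za-z0-9-]+)@users\.noreply\.github\.com$")


class NoReply(NamedTuple):
    user_id: Optional[int]  # None for the older username-only form
    username: str


def parse_noreply(email: str) -> Optional[NoReply]:
    """Return the parsed parts, or None if this is not a no-reply address."""
    match = NOREPLY.match(email.strip())
    if match is None:
        return None
    user_id = int(match.group(1)) if match.group(1) else None
    return NoReply(user_id, match.group(2))


def same_account(a: str, b: str) -> bool:
    """True when both addresses carry the same numeric ID, whatever the username."""
    pa, pb = parse_noreply(a), parse_noreply(b)
    if pa is None or pb is None or pa.user_id is None or pb.user_id is None:
        return False
    return pa.user_id == pb.user_id


if __name__ == "__main__":
    before = "62114487+david-allison@users.noreply.github.com"
    after = "62114487+renamed-user@users.noreply.github.com"  # hypothetical rename
    print(parse_noreply(before))
    print(same_account(before, after))
```

Addresses without the numeric prefix cannot be matched this way, which is why `same_account` returns False for them instead of guessing.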

realityloop · 15 days ago

I've received several of these types of messages, including the Voice.ai one mentioned in the comments, and the following today:

Tonho<tonho@tonho.wtf>

Hey, I found your GitHub profile and thought you might find this useful.

I've been building Omniget, a desktop downloader that works with YouTube, Telegram, Udemy, Hotmart and 1000+ other sites. It's open source and built with Rust and Tauri.

The part I'm most proud of: you don't even need to open the app. Just press a hotkey and it grabs whatever video you're watching.

I've been working on this for a while now, even got an artist to design a mascot. I'm shaping the app based on feedback from people who actually use it, so if you have any thoughts I'd love to hear them.

Here's the repo: https://github.com/tonhowtf/omniget

Thanks for your time!

Tonho

AznHisoka · 16 days ago

Maybe I am missing something, but can’t you simply not show the email address in a git commit? (Sincere question, not saying this is trivial. I am dumb and like to ask dumb questions even if it might be embarrassing)

If someone wants to message someone, it goes through github notifications or github emails them

Also banning an account doesnt seem like a heavy punishment, given they can simply move to gitlab, bitbucket etc

easton · 16 days ago

Git commits have an email address as a required field[0], although some people put something bogus in there. And then it's in the data provided when you clone the repo onto your machine even if you aren't using the GitHub APIs.

To his point, you can set that to the no-reply email address GitHub gives you if you don't want mail but do want the commit to be linked to your GitHub account.

[0]: https://git-scm.com/docs/git-commit#_commit_information
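For anyone who hasn't looked at a raw commit, here is a small Python sketch of where those fields live. It parses text in the shape that `git cat-file -p <commit>` prints. The sample commit, the name Jane Doe, and the `identities` helper are all made up for illustration.

```python
import re
from typing import Dict, Tuple

# Header lines look like: "author Name <email> <unix-timestamp> <tz-offset>"
IDENT = re.compile(r"^(author|committer) (.*) <([^>]*)> (\d+) ([+-]\d{4})$")

# Hypothetical commit; the tree ID is Git's well-known empty tree.
SAMPLE = """tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904
author Jane Doe <jane@example.com> 1700000000 +0000
committer Jane Doe <jane@example.com> 1700000000 +0000

Initial commit
"""


def identities(raw_commit: str) -> Dict[str, Tuple[str, str]]:
    """Map 'author' and 'committer' to (name, email) from a raw commit body."""
    headers, _, _message = raw_commit.partition("\n\n")
    found: Dict[str, Tuple[str, str]] = {}
    for line in headers.splitlines():
        match = IDENT.match(line)
        if match:
            found[match.group(1)] = (match.group(2), match.group(3))
    return found


if __name__ == "__main__":
    for role, (name, email) in identities(SAMPLE).items():
        print(role, name, email)
```

Nothing here touches GitHub: the same headers are present in any clone, which is why hiding the address in a web UI alone would not be enough.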

EdNutting · 16 days ago

That would be a fundamental change to how Git works, not just GitHub. Even if the web UI didn't show it, a simple `git log` would reveal it.

You can mask your email address in git commits but a lot of open source projects won't accept that. And some pseudo-open-source ones insist on sending you an email to authenticate before they'll give you access to the GitHub repo (looking at you Unreal Engine!)

So, no, I don't think they could simply "not show the email address".

miki123211 · 16 days ago

Git commits are identified by a hash of their entire contents[1]. The way hashes work, if you change even one bit, the hash becomes completely different. Every commit contains the email address of the committer and the hash of the parent commit. If the email address in even one commit is changed or removed, that changes its hash, which in turn requires you to update its children, changing their hashes etc. So, updating a commit from n years ago requires you to update all commits that have been made since. By default, git will refuse to pull from such an updated repository, as commits are considered immutable once pushed.

[1] In practice, it's a bit more complicated. Merkle trees are involved, so it's hashes of hashes of hashes instead of hashing a multi-gigabyte blob on each commit, but that's a performance optimization that doesn't affect semantics much.
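A rough Python sketch of that cascade, assuming the default SHA-1 object format, where a commit ID is the hash of the header `commit <size>\0` followed by the commit body. The commits are synthetic, and the names, emails and helper functions are hypothetical.

```python
import hashlib
from typing import Optional

EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # Git's empty tree ID


def commit_id(body: str) -> str:
    """Hash a commit body the way Git does for SHA-1 repositories."""
    data = body.encode("utf-8")
    header = b"commit " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


def make_commit(email: str, parent: Optional[str] = None, message: str = "work") -> str:
    """Build a synthetic commit body with a fixed tree, name and timestamp."""
    ident = f"Jane Doe <{email}> 1700000000 +0000"
    lines = [f"tree {EMPTY_TREE}"]
    if parent is not None:
        lines.append(f"parent {parent}")
    lines += [f"author {ident}", f"committer {ident}", "", message, ""]
    return "\n".join(lines)


if __name__ == "__main__":
    root = make_commit("jane@example.com")
    child = make_commit("jane@example.com", parent=commit_id(root))

    # Rewrite only the root's email; the child's own fields stay the same.
    new_root = make_commit("1234567+jane@users.noreply.github.com")
    new_child = make_commit("jane@example.com", parent=commit_id(new_root))

    print("root changed: ", commit_id(root) != commit_id(new_root))
    print("child changed:", commit_id(child) != commit_id(new_child))
```

The child's ID changes even though its own author line is untouched, because its `parent` line now points at a different root. Repeat that for every descendant and you get the full-history rewrite described above.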

dent9 · 16 days ago

You should be using the email address "username@users.noreply.github.com" or similar

There's never been an obligation to use a real email address for git

just6979 · 15 days ago

What section of the ToS prohibits this? In other words, what is the thing that is being done that is against the ToS? Looking up the creator of a repo, or the contributors of the repo?

I did a quick scan of the ToS and all I could find was D8, which states that automated access (scraping) used for "AI" applies a reciprocal license that prevents the scraper from restricting GitHub's access to the data (the whole model? the weights?) resulting from the scraping.

This makes it sound like any model trained on GitHub content cannot be commercialized, because charging for access to the output would be a "technical or other limit"... So you're obviously not really enforcing this, otherwise MS would be suing every big commercial model out there!

wrs · 15 days ago

It seems like a safe assumption that the big commercial models will have negotiated their own private GitHub terms of service, especially considering their many-digit annual contracts with Azure.

shawmakesmagic · 15 days ago

FYI I get about 5 of these a week. It is pervasive. If someone wants to scrape my email that's one thing, but a number of recruiters write things like "I saw your repo <some ancient repo of mine> and I think you'd be a great fit for our new position in AI agents...", so they are scraping both my e-mail and all the metadata to personalize their pitch to me (poorly).

ericol · 16 days ago

I've had more than a few instances of this over the past 2 years, and my reply is exactly the above.

"What you are doing is against Github's TOS"

nickphx · 15 days ago

How about improving the processing of abuse reports for repos hosting windows malware that is actively being advertised to potential victims? https://github.com/preconfigured/dl/blob/main/ms-update32.ex...

TheSaifurRahman · 16 days ago

Are no-reply emails associated with the accounts if the username is changed? That's one reason why I switched back to my personal email.

martinwoodward · 15 days ago

Since 2017 they are yes.

Foxboron · 15 days ago

I have reported several spam emails to Github and from what I can tell none has been acted upon.

dent9 · 16 days ago

Amazon did this to me. Their recruiters started hounding me at an email address that I only ever used to sign git commits on some repos used on GitHub. When I asked them how they got my email address they said "it was in [our] database"

trympet · 16 days ago

Nice, thank you Martin. How do you punish the fraudsters? Do you send them to prison over CFAA violation terms of service?

martinwoodward · 16 days ago

I kinda wish I had that much power. There would certainly be less people in the world listening to their phones without headphones..

Usually starts with contacting them over email reminding them of the terms of service and warning them to stop. Then their account might get deactivated and they need to write and promise to not be naughty again. If they ignore that then the account gets removed.

There are a bunch of automated checks that are running all the time as well and will take automated action that then gets later reviewed by humans. A lot of the time the process is fast-tracked.

The off-platform 'let's scrape a bunch of data and then spam nice people' is the hardest to police. Linking those mails to an offending GitHub account is hard and very manual. Also, anyone can send emails claiming to be someone they are not, so anyone can deny they sent the mail, and they'll usually blame a rogue agency they were working with, etc.

I probably shouldn't say it, but the public shame that comes from being mentioned on social, on Hacker News etc. stops people who want to be treated as legitimate from doing that sort of thing and helps educate the wider community around what is and isn't acceptable behaviour. That is why it's good to see this thread and see the issue getting attention.

nerdsniper · 16 days ago

> CFAA violation terms of service

This would be a gross miscarriage of justice and bringing successful action under this theory would do widespread harm by expanding the definition of the CFAA.

Just because a company can take some nuclear action, doesn't mean they should.

skeptic_ai · 16 days ago

Will send a strong email: Don’t do bad things.

miki123211 · 16 days ago

I've raised this as ticket ID 4114793, just in case.

blibble · 15 days ago

> it's not technically difficult even if it is unethical.

kettle, pot, black?

I received the following official spam last week from GitHub:

> Build AI agents with the new GitHub Copilot SDK

despite never granting consent for marketing material

(and yes, there's a GDPR complaint now working its way through the national regulator)

moomoo11 · 16 days ago

Ban them. Honestly I get the same and it is beyond frustrating.

I will pay more for GitHub if you go hard on these mfs.

observationist · 16 days ago

Hey, Martin - https://github.com/lucidrains

Mind fixing lucidrains account? Something happened without notice or recourse. He's one of, if not the most well known open source AI researchers on the planet, with implementations and explanations of papers and ideas that are wonderful. If you could bring some sanity to that situation and take it out of whatever kafkaesque account purgatory it fell into, you'd be doing the work of angels.

Thanks!

davnn · 16 days ago

What was happening with this account? I was often seeing popular but empty (only title of the paper and maybe a short readme) repositories that were created directly after a paper was published?

nextaccountic · 15 days ago

Is this mirrored on gitlab or somewhere else? Nobody should trust Github to store all their data

I also had unsolicited spam from Vincent Jiang of Aden, another YC company.

    Hi Daniel,

    I just came across your profile on social media and wondered if you'd be interested in joining our Discord community for AI agent development. Currently, we see that agents break, loop, get lost, hallucinate, and cost a fortune, and therefore built a space where developers can share challenges and insights.

unfunco · 16 days ago

…and more from Backdrop.

    Hi Daniel, I found your GitHub profile while searching for anthropic projects, and got your email from your profile.

    I'm part of an online program for builders called Backdrop Build, and I think that program would be a great fit given what you are building. We have a track for builders in AI like you, it's fully online/remote and costs nothing to participate. It also works if you have a day job, it's light on time and perfect for side projects!

And then another after I marked the first one as spam and ignored it.

    Checking in one last time to see if you have any questions about the program or the application. If it's not for you, all good - just ignore the email because I won't be pinging you again :)

    Joey from Backdrop

Both companies have guaranteed that I won't use their services nor procure them for any organisation I work for.

agmater · 16 days ago

Hey it's Joey checking in again. We noticed you mentioned our company, let me know if you have any questions about our (free!) program. I'll go ahead and email you some more info, just in case.

foldr · 16 days ago

I had a similar one from that guy asking me to make open source PRs to some repo of theirs for, err, $25-50/hour. I replied explaining that senior software engineers in the UK aren’t quite as desperately poor as that, and got a canned response saying that they were looking forward to reviewing my PRs :D

shunia_huang · 15 days ago

Blows my mind that you guys are so expensive lol.

anezjonathan · 8 days ago

Hi git,

I just came across your profile on social media and wondered if you'd be interested in joining our Discord community for AI agent development. Currently, we see that agents break, loop, get lost, hallucinate, and cost a fortune, and therefore built a space where developers can share challenges and insights.

So far, more than 8,000 members are sharing technical insights, agent templates, and tooling on a daily basis. We're also open-sourcing new toolings based on everyone's feedback.

Let me know if you are interested in joining; you can find the Discord link invite on our site (Adenhq).

Best, Bryan Zhang

Hive - Framework for autonomous, adaptive agent
Bryan Zhang
Co-founder and CTO

If you're not interested, please feel free to ignore this message; no further contact will be made.