GitHub push protection is free for all public repositories

greysteil · 3 years ago

I worked on this before leaving GitHub a couple of months ago. It’s awesome. This release is a repo-level setting, which is nice, but it will be even more useful when the team releases a user-level setting in June/July. That will allow you to configure GitHub to (softly) prevent you from pushing any easily identifiable secrets to any public repo. The plan is for it to be on by default.

For context, about 200 new GitHub personal access tokens (PATs) are exposed in public repos every day, together with many more tokens from other providers. GitHub automatically revokes the PATs it finds, and notifies many partners if/when keys to their services are found, but we always felt it would be better to prevent the leaks from happening in the first place.

igetspam · 3 years ago

How does it compare to Trufflehog?

greysteil · 3 years ago

Same goal, different strengths / weaknesses.

GitHub has really focussed on preventing credential leaks. It's particularly good at scanning for highly identifiable patterns and preventing pushes that include them. That makes sense for GitHub: they're in the best position to prevent leaks (by rolling out push protection to all users) and they're big enough to influence the industry to switch to using highly identifiable patterns for API keys. However, it's at the expense of scanning for unstructured secrets (like passwords) where GitHub isn't as deep yet.
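To make "highly identifiable patterns" concrete, here is a minimal sketch of prefix-based matching. The two token formats below (a `ghp_` prefix followed by 36 alphanumeric characters for classic GitHub PATs, and an `AKIA`/`ASIA` prefix followed by 16 characters for AWS access key IDs) are assumptions for illustration. This is not GitHub's actual scanner, and the names `PATTERNS` and `find_secrets` are hypothetical.

```python
import re

# Assumed token formats, for illustration only. A real scanner tracks
# many more providers and keeps these patterns up to date.
PATTERNS = {
    "github_pat_classic": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "aws_access_key_id": re.compile(r"\b(?:AKIA|ASIA)[A-Z0-9]{16}\b"),
}


def find_secrets(text):
    """Return (pattern_name, matched_string) pairs found in text."""
    hits = []
    for name, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((name, match.group(0)))
    return hits
```

A fixed prefix plus a fixed length is what keeps false positives low. An unstructured password such as `hunter2` gives a regex nothing to anchor on, which is why this approach does not extend to passwords.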

TruffleHog has focussed on scanning for credentials _after_ they've leaked. They scan for a broader range of things (including unstructured secrets like passwords). That naturally has a higher false positive rate, which they combat by automatically verifying some of their findings (by making requests to the corresponding services). GitHub does that too (for patterns it can't push protect) but it hasn't gone as deep on it yet. The delta is relatively small, though - as you can imagine, it's a long tail of patterns / credential types.
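The verification step can be sketched roughly as follows. This is not TruffleHog's or GitHub's code. The `verifiers` mapping and the `triage` function are hypothetical, and each verifier stands in for a real request to the corresponding service that would check whether the credential is still live.

```python
def triage(findings, verifiers):
    """Split findings into (verified, unverified) lists.

    findings:  iterable of (kind, value) pairs from a scanner.
    verifiers: dict mapping kind -> callable(value) -> bool. Each callable
               is a placeholder for a request to the issuing service.
    """
    verified, unverified = [], []
    for kind, value in findings:
        check = verifiers.get(kind)
        if check is not None and check(value):
            verified.append((kind, value))
        else:
            # No verifier for this kind, or the service said it is not live.
            unverified.append((kind, value))
    return verified, unverified
```

Only the verified list needs to page someone straight away. The unverified list is where the false positives from broad patterns end up.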

Right now there's space for both solutions - you want prevention when you can get it (without creating a bad developer experience with false positives), but you also want breadth. In the long run, though, GitHub is probably better positioned to offer both.

darthbanane · 3 years ago

I don't get it. If github declines the push then the blob must have already crossed the internet?

The message says to remove the secret from the commit but the actual action to take would be to rotate the secret since it's been exposed to github, no?

glitchc · 3 years ago

A Github PAT being exposed to Github is not the problem. That is, in fact, intended behaviour. A Github PAT being exposed to the internet is something else entirely, and likely to be an accident in most cases. That's what the protection's for.

darthbanane · 3 years ago

For PAT ok but surely this also scans for aws credentials etc, or is it really just about PATs?

nickelpro · 3 years ago

Better than it being exposed to the entire world

Also I feel fairly confident Github/MS aren't about to change their business model to become a blackhat hacking collective

7znwjshsus · 3 years ago

That's not the risk. The risk is that Github has lackluster permissions and audit trailing and an employee could leak and sell keys. Or that they log keys and someone hacks their logs.

Rotating the secret is 100% the correct thing to do in this case.

darthbanane · 3 years ago

Yeah definitely better than allowing the push. But I feel they should also at least recommend rotating the secret

awesome_dude · 3 years ago

I think that the main benefit here is that the credentials aren't published for all and sundry to see.

The scanner has seen the credentials, yes, and it's then up to the individual to decide if that credential should be considered "compromised" or not (seeing as the Github scanner has seen that credential)

It's a step up from - oh sh*t everyone can see it and the user isn't even aware that they did the dumb

darthbanane · 3 years ago

I agree but according to their goal of empowering developers with security awareness they should make it more clear that this is a server-side check and that the credentials were exposed in plain text, just not to the general public.

The screenshot says just amend the commit and all's good

tetha · 3 years ago

I agree. I'd say this offers two good things though:

First off, it very directly informs you by interrupting your workflow. The secret doesn't go out and nothing happens - your dang git push doesn't work for some reason. This means you notice the leak earlier.

And additionally, it limits the exposure of the secret, which buys you time for the rotation. If you find some important credential in a public repository on the internet a few days or weeks after it was exposed, it's time to scramble to rotate the secret and spend the next few days picking up the pieces and putting systems back together.

If the secret has been exposed to a somewhat reputable entity or an entity you have a business relationship with, you can most likely take a day to plan the rotation and execute it. We've had this a few times during on-prem maintenance of customer systems or support calls with customers. Copy the wrong thing, paste the wrong thing, whoops we have the password for the superuser of your database cluster. It certainly enforces a rotation of that password, but with the business relationship there, it doesn't have to happen head over heels.

Chico75 · 3 years ago

Nice move to protect public repos.

For all other private repos and internal git servers, you can assume that credentials are routinely exposed if there is no pre-receive hook checking for secrets. We experimented with all the existing tools but none of them worked well enough so we built our own. Looking back we would have saved ourselves a lot of time and effort if we went with a commercial offering like GitGuardian instead.
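For anyone wondering what such a pre-receive hook looks like, here is a minimal sketch, not the tool described above. Git feeds the hook one `<old> <new> <ref>` line per updated ref on stdin, and a non-zero exit rejects the push. The regex, the function names, and the choice to scan only added lines are assumptions for illustration.

```python
import re
import subprocess
import sys

# Assumed token formats, for illustration only.
SECRET_RE = re.compile(r"\b(?:ghp_[A-Za-z0-9]{36}|AKIA[A-Z0-9]{16})\b")


def is_zero(oid):
    """Git uses an all-zero object id to mean 'no commit'."""
    return set(oid) == {"0"}


def git_patch(old, new):
    """Return the patch text for the commits being pushed."""
    if is_zero(old):
        # New ref: look only at commits not already reachable from any ref.
        rev_args = [new, "--not", "--all"]
    else:
        rev_args = [old + ".." + new]
    result = subprocess.run(
        ["git", "log", "-p", "--no-color", *rev_args],
        capture_output=True, text=True, errors="replace", check=True,
    )
    return result.stdout


def added_lines(patch):
    """Yield lines added by the patch, skipping '+++' file headers."""
    for line in patch.splitlines():
        if line.startswith("+") and not line.startswith("+++"):
            yield line[1:]


def check_push(stdin_lines, get_patch=git_patch):
    """Return 0 to accept the push, 1 to reject it."""
    for raw in stdin_lines:
        parts = raw.split()
        if len(parts) != 3:
            continue
        old, new, ref = parts
        if is_zero(new):
            continue  # ref deletion, nothing to scan
        for line in added_lines(get_patch(old, new)):
            if SECRET_RE.search(line):
                print("rejected " + ref + ": possible secret in pushed changes",
                      file=sys.stderr)
                return 1
    return 0
```

To use it as an actual hook, the script would end with `sys.exit(check_push(sys.stdin))` and be installed as `hooks/pre-receive` on the server. Like GitHub's check, this runs server-side, so anything it catches has already left the developer's machine.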

pabs3 · 3 years ago

I wonder what tool(s) GitHub are using for the secret scanning.

greysteil · 3 years ago

It’s a bespoke scanning setup designed to deal with GitHub’s scale, minimise false positives, and scan fast enough to be in the `git push` request/response cycle. Under the hood it’s using Intel’s hyperscan as the regex engine.

https://github.com/intel/hyperscan

pabs3 · 3 years ago

Any idea if they plan to open source it or part of it?

mennaali · 3 years ago

I wish these were not add-ons for private repos.

manojr13 · 3 years ago

It is open for public repos, so practically free