I worked on this before leaving GitHub a couple of months ago. It’s awesome. This release is a repo-level setting, which is nice, but it will be even more useful when the team releases a user-level setting in June/July. That will allow you to configure GitHub to (softly) prevent you from pushing any easily identifiable secrets to any public repo. The plan is for it to be on by default.
For context, about 200 new GitHub personal access tokens (PATs) are exposed in public repos every day, together with many more tokens from other providers. GitHub automatically revokes the PATs it finds, and notifies many partners if/when keys to their services are found, but we always felt it would be better to prevent the leaks from happening in the first place.
GitHub has really focussed on preventing credential leaks. It's particularly good at scanning for highly identifiable patterns and preventing pushes that include them. That makes sense for GitHub: they're in the best position to prevent leaks (by rolling out push protection to all users) and they're big enough to influence the industry to switch to using highly identifiable patterns for API keys. However, it's at the expense of scanning for unstructured secrets (like passwords) where GitHub isn't as deep yet.
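To make "highly identifiable patterns" concrete, here's a rough sketch of what matching one looks like. It assumes the classic GitHub PAT shape of `ghp_` followed by 36 alphanumeric characters; the function name is made up for illustration, and this is not GitHub's actual scanner.

```python
import re

# Assumption: classic GitHub PATs look like "ghp_" + 36 alphanumeric characters.
# A fixed prefix and length is what makes a token "highly identifiable".
TOKEN_PATTERN = re.compile(r"\bghp_[A-Za-z0-9]{36}\b")


def find_candidate_tokens(text: str) -> list[str]:
    """Return every substring of `text` that matches the assumed PAT shape."""
    return TOKEN_PATTERN.findall(text)
```

An unstructured secret like `password = hunter2` has no such shape to match on, which is why it is so much harder to catch without false positives.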
TruffleHog has focussed on scanning for credentials _after_ they've leaked. They scan for a broader range of things (including unstructured secrets like passwords). That naturally has a higher false positive rate, which they combat by automatically verifying some of their findings (by making requests to the corresponding services). GitHub does that too (for patterns it can't push protect) but it hasn't gone as deep on it yet. The delta is relatively small, though - as you can imagine, it's a long tail of patterns / credential types.
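For anyone wondering what "verifying a finding" means in practice, a minimal sketch is below: ask the issuing service whether the credential still works. It assumes GitHub's `GET /user` REST endpoint, which answers 200 for a valid token and 401 for an invalid one. The function name and the injectable `urlopen` parameter are my own, added so it can be tried without network access; this is not how TruffleHog or GitHub implement it.

```python
import urllib.error
import urllib.request


def is_live_github_token(token: str, urlopen=urllib.request.urlopen) -> bool:
    """Return True if the GitHub API accepts `token`, False if it rejects it."""
    request = urllib.request.Request(
        "https://api.github.com/user",
        headers={
            "Authorization": f"Bearer {token}",
            "User-Agent": "secret-verifier-sketch",  # hypothetical client name
        },
    )
    try:
        with urlopen(request, timeout=10) as response:
            return response.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 401:  # token rejected: revoked, expired, or a false positive
            return False
        raise  # rate limits, outages, etc. say nothing about validity
```

A finding that verifies as live is almost certainly a true positive, which is how a broad scanner can keep its noise down.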
Right now there's space for both solutions - you want prevention when you can get it (without creating a bad developer experience with false positives), but you also want breadth. In the long run, though, GitHub is probably better positioned to offer both.
I don't get it. If github declines the push then the blob must have already crossed the internet?
The message says to remove the secret from the commit but the actual action to take would be to rotate the secret since it's been exposed to github, no?
A GitHub PAT being exposed to GitHub is not the problem. That is, in fact, intended behaviour. A GitHub PAT being exposed to the internet is something else entirely, and likely to be an accident in most cases. That's what the protection's for.
That's not the risk. The risk is that GitHub has lackluster permissions and audit trails and an employee could leak and sell keys. Or that they log keys and someone hacks their logs.
Rotating the secret is 100% the correct thing to do in this case.
I think that the main benefit here is that the credentials aren't published for all and sundry to see.
The scanner has seen the credentials, yes, and it's then up to the individual to decide if that credential should be considered "compromised" or not (seeing as the Github scanner has seen that credential)
It's a step up from - oh sh*t everyone can see it and the user isn't even aware that they did the dumb
I agree but according to their goal of empowering developers with security awareness they should make it more clear that this is a server-side check and that the credentials were exposed in plain text, just not to the general public.
The screenshot says just amend the commit and all's good
I agree. I'd say this offers two good things though:
First off, it very directly informs you by interrupting your workflow. The secret doesn't go out and nothing happens - your dang git push doesn't work for some reason. This means you notice the leak earlier.
And additionally, it limits the exposure of the secret, which buys you time for the rotation. If you find some important credential in a public repository on the internet a few days or weeks after it was exposed, it's time to scramble to rotate the secret and spend the next few days picking up the pieces and putting systems back together.
If the secret has been exposed to a somewhat reputable entity or an entity you have a business relationship with, you can most likely take a day to plan the rotation and execute it. We've had this a few times during on-prem maintenance of customer systems or support calls with customers. Copy the wrong thing, paste the wrong thing, whoops we have the password for the superuser of your database cluster. It certainly enforces a rotation of that password, but with the business relationship there, it doesn't have to happen head over heels.
For all other private repos and internal git servers, you can assume that credentials are routinely exposed if there is no pre-receive hook checking for secrets. We experimented with all the existing tools but none of them worked well enough so we built our own. Looking back we would have saved ourselves a lot of time and effort if we had gone with a commercial offering like GitGuardian instead.
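For readers who haven't written one, a pre-receive hook that checks for secrets can be sketched roughly like this. Git feeds the hook one `old new ref` line per updated ref on stdin, and a non-zero exit rejects the push. Everything here is an assumption for illustration: the single `ghp_` pattern, the function names, and the 40-character zero SHA (SHA-1 repos). It is not the tool described above.

```python
import re
import subprocess
import sys

ZERO_SHA = "0" * 40  # assumption: SHA-1 repository
SECRET_PATTERN = re.compile(r"\bghp_[A-Za-z0-9]{36}\b")  # assumed token shape


def added_lines(patch: str):
    """Yield lines added by a unified diff, skipping '+++ b/file' headers."""
    for line in patch.splitlines():
        if line.startswith("+") and not line.startswith("+++"):
            yield line[1:]


def find_secrets(patch: str) -> list[str]:
    """Return pattern matches found in the added lines of `patch`."""
    return [m for line in added_lines(patch) for m in SECRET_PATTERN.findall(line)]


def patch_for_update(old: str, new: str) -> str:
    """Ask git for the patches of the commits introduced by this ref update."""
    # For a brand-new ref, look at commits not reachable from any existing ref.
    rev_range = [new, "--not", "--all"] if old == ZERO_SHA else [f"{old}..{new}"]
    result = subprocess.run(
        ["git", "log", "-p", "--format=", *rev_range],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def main(stdin=sys.stdin) -> int:
    for line in stdin:
        old, new, ref = line.split()
        if new == ZERO_SHA:  # ref deletion, nothing to scan
            continue
        if find_secrets(patch_for_update(old, new)):
            print(f"push to {ref} rejected: possible secret", file=sys.stderr)
            return 1
    return 0

# An actual hook script would finish with: sys.exit(main())
```

Note that by the time the hook runs, the secret has already reached the server, so this limits exposure rather than preventing it.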
It’s a bespoke scanning setup designed to deal with GitHub’s scale, minimise false positives, and scan fast enough to be in the `git push` request/response cycle. Under the hood it’s using Intel’s hyperscan as the regex engine.
Also I feel fairly confident Github/MS aren't about to change their business model to become a blackhat hacking collective
https://github.com/intel/hyperscan