Secret scanning is now available for free on public repositories

flatiron · 3 years ago

Good on them. GitHub secrets cause a lot of problems. They will always create a better idiot but this idiot trap is long past due.

I also can’t wait until people base64 their creds to get past this. Explaining to some that base64 isn’t encryption tends to be hard so I imagine people will feel safe just base64 and checking it in.

banana_giraffe · 3 years ago

base64 is far too much work. A new dev turned '"AKIAIOSFODNN7EXAMPLE"' into '"AK" + "IAIOSFODNN7EXAMPLE"' to make the security alert go away.

Thankfully, the alert was sent to enough people it was caught by someone else, and the key was destroyed before someone outside could have fun with it.

mbildner · 3 years ago

I remember reading in jshint’s docs that they purposely did not chase this kind of lint since at that point the user is clearly fighting the library.

bagels · 3 years ago

Why was that change made? I would assume malice or incompetence?

reilly3000 · 3 years ago

Given the fact that nobody really does that, I think it was a creative and low risk hack.

1. If you are worried about the people who have access to your codebase abusing a secret, you have a serious people problem that needs to be solved immediately and unambiguously. A motivated internal attacker can do almost anything. Organizations live or die on trust. One doesn’t need to scour for keys break in when they have a badge (or their mate’s) and they built the lock.

2. If you are concerned the secret will be discovered by a generic threat, it won’t, not with this string concatenation. It’s so rare. Should this become a common practice this would be over, retroactively even. We all saw this unfold with with m y e m a I l at y dot com obscurity, until the fine folks who worked on ScrapeBox turned up the right regex to start scraping those too.

3. Nothing else. You are right and responded right. Don’t put keys in code kids. Nobody likes having to erase history in their codebase, especially because of a careless mistake or a deliberate workaround.

I’m just saying… of all the dumb shortcuts that can break stuff, this one is one the mostly harmless end of the spectrum.

depereo · 3 years ago

Base64: if this format wasn't secure why are kubernetes 'secrets' stored in it huh?

;)

woutr_be · 3 years ago

I legitimately recently had to argue with a PM and his developers, that a base64 encoded user ID isn’t considered security best practices for API authentication. Even when I showed them how I can produce the “secret” myself, they kept arguing that I was wrong.

bryanrasmussen · 3 years ago

Ok I've been around for a long time and I don't think I would have met anyone who would have argued that since around 2009, although I guess I can remember secrets and keys going into repos as late as 2018 at places I've seen.

CSSer · 3 years ago

Or md5. I wouldn’t be surprised to see that from some in PHP land.

Godel_unicode · 3 years ago

With md5 hashes, the actual password isn’t there whereas base64 encoding is merely another way of representing the same bits. Yes md5 is weak, but it’s Fort Knox compared to base64.

tkanarsky · 3 years ago

Hah, provide the hash and have the backend crack it whenever it needs to call the api.

dinvlad · 3 years ago

Does secret scanning also apply to public GitHub Action logs and Issues (or more generally, Checks logs)?

We found Action logs to be a much bigger threat now that many folks have learned not to embed secrets directly into the code and to use secret managers instead. But even then, the secrets retrieved in a step can be printed in plaintext if someone, for example, runs that step debug mode.

Issues can also accidentally leak secrets via for example, third-party code builders that print their output in an issue.

greysteil · 3 years ago

GitHub PM here. Right now we scan code, commit metadata, issues, and issue comments. We're expanding to other content types over time, with support for pull request bodies and comments coming in early 2023. Actions logs are on our list too, but will take a little longer.

(It's worth noting that any secrets in your Actions secret store will already be redacted in any Actions logs, so those won't leak there.)

lozenge · 3 years ago

It feels like there could be a GitHub action step that just means "redact this particular string output in this task and for the rest of the Action"?

dinvlad · 3 years ago

Thanks - and yes, this is meant for external secret management solutions like Vault, not GitHub Secrets, which are "safe" enough.

runlevel1 · 3 years ago

Searching for creds can be tricky if they can't be readily distinguished from other text.

Can anyone think of a problem with generating customer API keys that have a known prefix that makes them more detectable?

For example, a key like "FooSecret.ZTNiMGM0NDI5OGZjMWMxNDlhZmJmNGM4OTk2ZmI5". I wouldn't think that'd open up any new attacks, but I'm no expert on the matter.

greysteil · 3 years ago

GitHub PM here. We switched our own token format to something similar to the above in April of last year and have been encouraging other service providers to do the same.

The big benefit of highly identifiable tokens is not just that we can alert on them, but that we can scan for them at pre-receive time and prevent them from leaking (by rejecting the push). We already have that functionality as part of GitHub Advanced Security, and are planning to make it available (for free) on public repos in 2023.

[1] https://github.blog/2021-04-05-behind-githubs-new-authentica...

ericpauley · 3 years ago

One of the big issues with secret scanning now is that it’s opt-in from platforms, and from the list of supported platforms it seems like small ones may not be able to be included.

The holy grail here would be to introduce a standardized token format that encodes a disclosure endpoint. Then platforms can issue tokens to this standard and receive notifications without needing to explicitly opt in.

dottedmag · 3 years ago

Is there a way (or a plan to have a way) to register custom prefixes to have them scanned on a specific repository?

remram · 3 years ago

I argued for something like that previously on HN, like adding a domain prefix 'myservice.com_secretkeyhere'. This would allow automatic discovery of the reporting/revocation endpoint from the key. Then someone pointed out that you could just use an actual URL as your secret key and have that be the URL you visit to revoke it, and I think that is genius.

Next service I make that has API keys, I will make them look like `https://secret.myservice.org/ZTNiMGM0NDI5OGZjMWM`. POSTing to that URL revokes the key, a GET shows a form explaining what it is and a button to revoke the key.

One issue is that some email services mangle URLs specifically, and that would be bad for keys.

(edit: sudhirj is the genius: https://news.ycombinator.com/item?id=28299624)

nikeee · 3 years ago

GitHub does this with the tokens they issue. They even have a checksum in the token, so they can check if the token is syntactically valid:

https://github.blog/2021-04-05-behind-githubs-new-authentica...

They have a list of supported secrets they can find via automated scans:

https://docs.github.com/en/code-security/secret-scanning/sec...

tnorthcutt · 3 years ago

This is exactly what Stripe and I believe some other companies do, partially for this exact reason.

letmeinhere · 3 years ago

Another approach is to identify likely ones based on the entropy of the strings. I used a tool that did precisely this once and found some, but can't find it anymore.

dghlsakjg · 3 years ago

Not much of a downside, but it means that they are really easy to detect for attackers as well.

Really easy to just grep through something looking for that prefix

jamesfinlayson · 3 years ago

Yep - this way my thought. I've been bitten by this is in the past: an endpoint was throwing an error and dumping out environment variables (which included API keys) - the prefixed API keys were found by crawlers and abused but the unprefixed keys were untouched (though they were cycled just in case).

evdubs · 3 years ago

Do you think they call this service their "Secret Scanta"?

heelix · 3 years ago

We use this at our company. Wildly successful at finding tokens for most of the usual suspects. If they are including secret blocking - it will prevent someone from doing the dumb as well.

One question/behavior - if the secret scanner found something and folks resolved it -> secret blocking is enabled -> and a developer does the dumb again, should it block the PR with the new secret? Wondering if we might have something misconfigured as I have seen new secrets get added after we enabled blocking.

aashah-gh · 3 years ago

Hello! I am an engineer on the Secret Scanning team, thanks for the kind words!

- "push protection" (as we call it) isn't available for free, and isn't part of this rollout.

- For folks who do pay, the flow may be: a developer tries to push, they bypass the secret, are now able to push. From there, an alert is created which they can resolve (maybe it is "used in tests").

- If the _same_ secret is pushed again, we won't block that push. We also won't create a new alert; however, a new location may be recorded within the resolved alert (if you click into it).

If you're seeing push _not_ get blocked, what's most likely is that we just don't support that specific token as part of push protection (we have some much-needed improvements to do to the docs to make this more clear). Since push protection sits in front of the developer, we try not to annoy them with high-false positive tokens. There are a few other possibilities though, so hard to say.

heelix · 3 years ago

I suspect I'm your biggest GHAS customer :) Rolled out GHAS to over 125k repos this year. Have Tayler (z...) connect you with my details, if you wanted to chat.

Deleted Comment

daguava · 3 years ago

Don't let the perfect be the enemy of the good - this will start out in a limited detection of course, but can easily be improved with other hashes and scanning over time.

andrewflnr · 3 years ago

What's the workflow where people accidentally commit secrets to their git repos? I'm not sure I've ever done it; do we count the "base_secret" type of things web frameworks put in their default app templates? Certainly the more common mistake I make is forgetting to add new files, so it's mildly amusing that other people apparently have the opposite problem.

rieTohgh6 · 3 years ago

People keep adding whole tmp/ directories or output binaries to repositories, accidents like this stuff just happening. It is not a workflow, but for a scenario: people trying to run some test, on real service, to debug some weird issue, will temporary put credentials and forget to remove them before comiting the fix. Sure, someone probably will notice it in code review but it is too late if repo was public.

dinvlad · 3 years ago

Lots of ways this happens either accidentally or intentionally. I think most common accident is due to forgetting to add a file to .gitignore and then using git add . . Intentionally, folks just embed secrets into code out of convenience while developing, and either never even think twice, or forget to remove them before commit & push (which becomes kinda an accident)

decodebytes · 3 years ago

mostly accidental. you're working on a prototype, so to just get started you use a const at the top of your code with an API key, this then gets checked in and you then realise 'oh shit' , but by this point its within gits tree. It can still be removed, but its not a straightforward process.

idoh · 3 years ago

If you are in the CircleCI CI/CD space then adding it to a config to power some workflow step.

eranation · 3 years ago

Mitigation is premium. Detection should be free.

https://www.arnica.io/blog/secret-detection-needs-to-be-free...