Good on them. Secrets leaked on GitHub cause a lot of problems. The world will always build a better idiot, but this idiot trap is long past due.
I also can’t wait until people base64 their creds to get past this. Explaining to someone that base64 isn’t encryption tends to be hard, so I imagine people will feel safe just base64-encoding their keys and checking them in.
base64 is far too much work. A new dev turned `"AKIAIOSFODNN7EXAMPLE"` into `"AK" + "IAIOSFODNN7EXAMPLE"` to make the security alert go away.
Thankfully, the alert went to enough people that it was caught by someone else, and the key was destroyed before anyone outside could have fun with it.
Given that nobody really does that, I think it was a creative and low-risk hack.
1. If you are worried about the people who have access to your codebase abusing a secret, you have a serious people problem that needs to be solved immediately and unambiguously. A motivated internal attacker can do almost anything; organizations live or die on trust. One doesn’t need to scour for keys to break in when they have a badge (or their mate’s) and they built the lock.
2. If you are concerned the secret will be discovered by a generic threat, it won’t be, not with this string concatenation; it’s just too rare. Should this become a common practice, though, it would be over, retroactively even. We all saw this unfold with "m y e m a i l at y dot com" obscurity, until the fine folks who worked on ScrapeBox turned up the right regex and started scraping those too (see the sketch after this list).
3. Nothing else. You are right, and you responded correctly. Don’t put keys in code, kids. Nobody likes having to rewrite history in their codebase, especially because of a careless mistake or a deliberate workaround.
I’m just saying… of all the dumb shortcuts that can break stuff, this one is on the mostly-harmless end of the spectrum.
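A sketch of the kind of pattern that ends that game once the trick becomes common (illustrative only, not ScrapeBox's actual regex; the address below is made up):

```python
import re

# Match the spaced-out "m y n a m e at h o s t dot com" obfuscation trick.
OBFUSCATED = re.compile(
    r"((?:[a-z] ){2,})at ((?:[a-z] )+)dot (com|net|org)",
    re.IGNORECASE,
)

text = "reach me at m y e m a i l at y c o m b i n a t o r dot com thanks"
for user, host, tld in OBFUSCATED.findall(text):
    print(f"{user.replace(' ', '')}@{host.replace(' ', '')}.{tld}")
    # -> myemail@ycombinator.com
```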
I legitimately had to argue recently with a PM and his developers that a base64-encoded user ID isn’t a security best practice for API authentication. Even when I showed them how I could produce the “secret” myself, they kept arguing that I was wrong.
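For anyone who has to make the same argument, here is a minimal sketch (the scheme and user IDs are hypothetical) of why base64(user_id) is not a secret: any client can mint one for any user.

```python
import base64

def forge_token(user_id: str) -> str:
    # This is all the "authentication" amounts to: a reversible re-spelling.
    return base64.b64encode(user_id.encode()).decode()

print(forge_token("12345"))   # MTIzNDU= -- a valid "credential" for user 12345
print(forge_token("12346"))   # MTIzNDY= -- and now for the user next door
```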
OK, I've been around for a long time, and I don't think I've met anyone who would argue that since around 2009, although I can remember secrets and keys going into repos as late as 2018 at places I've seen.
With md5 hashes, the actual password isn’t there, whereas base64 encoding is merely another way of representing the same bits. Yes, md5 is weak, but it’s Fort Knox compared to base64.
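The difference in one runnable contrast (the password value is made up):

```python
import base64
import hashlib

password = b"hunter2"

# base64 is a reversible re-spelling of the same bytes; no key, no secret.
encoded = base64.b64encode(password)
print(encoded)                    # b'aHVudGVyMg=='
print(base64.b64decode(encoded))  # b'hunter2' -- recovered instantly

# md5 is one-way: there is no decode, only guessing inputs until one matches.
# (Weak to brute force and rainbow tables, but a fundamentally different
# kind of operation than an encoding.)
print(hashlib.md5(password).hexdigest())
```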
Does secret scanning also apply to public GitHub Actions logs and Issues (or, more generally, Checks logs)?
We found Actions logs to be a much bigger threat now that many folks have learned not to embed secrets directly in the code and to use secret managers instead. But even then, the secrets retrieved in a step can be printed in plaintext if someone, for example, runs that step in debug mode.
Issues can also accidentally leak secrets via, for example, third-party code builders that print their output in an issue.
GitHub PM here. Right now we scan code, commit metadata, issues, and issue comments. We're expanding to other content types over time, with support for pull request bodies and comments coming in early 2023. Actions logs are on our list too, but will take a little longer.
(It's worth noting that any secrets in your Actions secret store will already be redacted in any Actions logs, so those won't leak there.)
Searching for creds can be tricky if they can't be readily distinguished from other text.
Can anyone think of a problem with generating customer API keys that have a known prefix that makes them more detectable?
For example, a key like "FooSecret.ZTNiMGM0NDI5OGZjMWMxNDlhZmJmNGM4OTk2ZmI5". I wouldn't think that'd open up any new attacks, but I'm no expert on the matter.
GitHub PM here. We switched our own token format to something similar to the above in April of last year [1] and have been encouraging other service providers to do the same.
The big benefit of highly identifiable tokens is not just that we can alert on them, but that we can scan for them at pre-receive time and prevent them from leaking (by rejecting the push). We already have that functionality as part of GitHub Advanced Security, and are planning to make it available (for free) on public repos in 2023.
[1] https://github.blog/2021-04-05-behind-githubs-new-authentica...
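Here's a minimal sketch of that idea (the `foo_` prefix and checksum layout are invented for illustration; see [1] for GitHub's actual design): a fixed prefix makes tokens cheap to grep for, and an embedded checksum lets a scanner reject look-alike strings before alerting.

```python
import secrets
import string
import zlib

ALPHABET = string.ascii_letters + string.digits  # a base62 alphabet

def base62(n: int, width: int) -> str:
    # Fixed-width base62 encoding of a non-negative integer.
    out = []
    for _ in range(width):
        n, r = divmod(n, 62)
        out.append(ALPHABET[r])
    return "".join(reversed(out))

def mint_token(prefix: str = "foo_") -> str:
    # 30 random base62 chars, plus a 6-char CRC32 checksum of the body.
    body = "".join(secrets.choice(ALPHABET) for _ in range(30))
    return prefix + body + base62(zlib.crc32(body.encode()), 6)

def looks_like_token(candidate: str, prefix: str = "foo_") -> bool:
    # A scanner can verify the checksum offline, with no API call,
    # which keeps false positives (and alert noise) very low.
    if not candidate.startswith(prefix) or len(candidate) != len(prefix) + 36:
        return False
    body, check = candidate[len(prefix):-6], candidate[-6:]
    return base62(zlib.crc32(body.encode()), 6) == check

token = mint_token()
print(token)                    # e.g. foo_<30 random chars><6-char checksum>
print(looks_like_token(token))  # True
```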
One of the big issues with secret scanning now is that it’s opt-in for platforms, and from the list of supported platforms it seems like small ones may not be able to be included.
The holy grail here would be to introduce a standardized token format that encodes a disclosure endpoint. Then platforms can issue tokens to this standard and receive notifications without needing to explicitly opt in.
I argued for something like that previously on HN, like adding a domain prefix 'myservice.com_secretkeyhere'. This would allow automatic discovery of the reporting/revocation endpoint from the key. Then someone pointed out that you could just use an actual URL as your secret key and have that be the URL you visit to revoke it, and I think that is genius. (edit: sudhirj is the genius: https://news.ycombinator.com/item?id=28299624)
Next service I make that has API keys, I will make them look like `https://secret.myservice.org/ZTNiMGM0NDI5OGZjMWM`. POSTing to that URL revokes the key, a GET shows a form explaining what it is and a button to revoke the key.
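A minimal sketch of how that could work (using Flask, an in-memory key store, and the made-up domain from above; a real service would persist keys and rate-limit the endpoint):

```python
import secrets
from flask import Flask

app = Flask(__name__)
live_keys = set()

def mint_key() -> str:
    token = secrets.token_urlsafe(24)
    live_keys.add(token)
    # The key IS the revocation URL: anyone who finds it leaked can kill it,
    # which is exactly the point.
    return f"https://secret.myservice.org/{token}"

@app.get("/<token>")
def explain(token):
    status = "active" if token in live_keys else "revoked or unknown"
    return f"This is an API key for myservice ({status}). POST here to revoke it."

@app.post("/<token>")
def revoke(token):
    live_keys.discard(token)
    return "Key revoked."
```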
One issue is that some email services specifically mangle URLs, and that would be bad for keys.
Another approach is to flag likely secrets based on the entropy of the strings. I once used a tool that did precisely this and found some, but I can't find it anymore.
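That approach (truffleHog is probably the best-known example) boils down to measuring Shannon entropy and flagging long, high-entropy strings. A minimal sketch, with a guessed threshold that would need tuning against real code:

```python
import math
from collections import Counter

def shannon_entropy(s: str) -> float:
    # Estimated bits per character, from the string's own character frequencies.
    counts = Counter(s)
    return -sum((c / len(s)) * math.log2(c / len(s)) for c in counts.values())

# Random key material scores much higher than ordinary identifiers. The
# length floor and the 4.5 bits/char threshold are guesses that need tuning
# to keep false positives down.
def flag_candidates(words, min_len=20, threshold=4.5):
    return [w for w in words if len(w) >= min_len and shannon_entropy(w) > threshold]

print(flag_candidates([
    "this_is_an_ordinary_variable_name",          # not flagged: low entropy
    "kR8vQz2mWx4nT7bJ9cL5hD3fG6sA1pYeKuNiMoPq",   # flagged: looks like a key
]))
```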
Yep - this was my thought. I've been bitten by this in the past: an endpoint was throwing an error and dumping out environment variables (which included API keys); the prefixed API keys were found by crawlers and abused, but the unprefixed keys were untouched (though they were cycled just in case).
We use this at our company. Wildly successful at finding tokens for most of the usual suspects. If they include secret blocking, it will prevent someone from doing the dumb as well.
One question/behavior - if the secret scanner found something and folks resolved it -> secret blocking is enabled -> and a developer does the dumb again, should it block the PR with the new secret? Wondering if we might have something misconfigured, as I have seen new secrets get added after we enabled blocking.
Hello! I am an engineer on the Secret Scanning team, thanks for the kind words!
- "push protection" (as we call it) isn't available for free, and isn't part of this rollout.
- For folks who do pay, the flow may be: a developer tries to push, bypasses the warning for the secret, and is then able to push. From there, an alert is created, which they can resolve (maybe as "used in tests").
- If the _same_ secret is pushed again, we won't block that push. We also won't create a new alert; however, a new location may be recorded within the resolved alert (if you click into it).
If you're seeing a push _not_ get blocked, the most likely explanation is that we just don't support that specific token as part of push protection (we have some much-needed docs improvements to make this clearer). Since push protection sits in front of the developer, we try not to annoy them with high-false-positive tokens. There are a few other possibilities though, so it's hard to say.
I suspect I'm your biggest GHAS customer :) Rolled out GHAS to over 125k repos this year. Have Tayler (z...) connect you with my details if you want to chat.
Don't let the perfect be the enemy of the good - this will start out with limited detection, of course, but it can easily be improved with other hashes and scanning over time.
What's the workflow where people accidentally commit secrets to their git repos? I'm not sure I've ever done it; do we count the "base_secret" type of things web frameworks put in their default app templates? Certainly the more common mistake I make is forgetting to add new files, so it's mildly amusing that other people apparently have the opposite problem.
People keep adding whole tmp/ directories or output binaries to repositories; accidents like this just happen. It's not a workflow, but here's a scenario: people trying to run some test against a real service to debug a weird issue will temporarily put credentials in and forget to remove them before committing the fix. Sure, someone will probably notice it in code review, but it's too late if the repo was public.
Lots of ways this happens, either accidentally or intentionally. I think the most common accident is forgetting to add a file to .gitignore and then running `git add .`. Intentionally, folks just embed secrets into code out of convenience while developing, and either never think twice or forget to remove them before commit & push (which becomes kind of an accident).
Mostly accidental. You're working on a prototype, so just to get started you use a const at the top of your code with an API key. This then gets checked in, and you then realise 'oh shit', but by this point it's within git's tree. It can still be removed, but it's not a straightforward process.
They have a list of supported secrets they can find via automated scans:
https://docs.github.com/en/code-security/secret-scanning/sec...
Really easy to just grep through something looking for that prefix
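A sketch of that grep in Python (the two patterns below are the widely documented shapes for AWS access key IDs and GitHub personal access tokens, but treat them as illustrative; real scanners track many more formats and validate checksums to cut false positives):

```python
import pathlib
import re
import sys

PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_pat":        re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
}

root = pathlib.Path(sys.argv[1] if len(sys.argv) > 1 else ".")
for path in root.rglob("*"):
    if not path.is_file():
        continue
    try:
        text = path.read_text(errors="ignore")
    except OSError:
        continue
    for name, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            # Print only a prefix of the hit so the scan itself doesn't leak.
            print(f"{path}: possible {name}: {match.group()[:12]}...")
```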
https://www.arnica.io/blog/secret-detection-needs-to-be-free...