Open Source authors [1] [2] (including myself) have complained about automatic security scans. They yield way too many false positives, increasing the burden of maintaining repositories. Especially troublesome is when, e.g., the "vulnerability" (if it's even one) is in a devDependency that is not deployed to production.
In theory automatic vulnerability scans sound great, but having every repo ping you with a not-actually-an-issue becomes a chore very quickly. So far the vast majority of vulnerabilities I've seen are actually noise or not applicable. If this code checker is actually good, unlike all of the previous ones, that's another thing and might actually be a game changer.
Prominent open source authors have often suggested ways that GitHub can help but seem to be ignored, e.g. letting maintainers add friction to opening random issues would benefit open source greatly. At some point many beginner devs migrated from StackOverflow to GitHub because their really bad questions were being closed there, and now they just overwhelm open source authors.
Microsoft has used products from Semmle (the now-GitHub and thus now-Microsoft division whose tech is in GitHub code scanning) for a few years, and I've personally used it on occasion.
From that limited experience, I'd say that false positives are less of a problem with Semmle's checkers than with other security-focused static analysis tools. This is partly due to Semmle checkers being much more customizable; Semmle has developed a declarative query language called CodeQL* which its checkers' built-in and user-provided rules are written in. Microsoft's security development lifecycle has a lot of mandates which are captured by custom CodeQL rules precisely enough to match their intent.
There's a genuine security fatigue issue (much like event fatigue) that comes from false positives. Unfortunately that doesn't reduce the value of the scanning; the onus is on the tools to cut down the false positives.
At the very least, running and pruning scans should happen on projects so that at least we can have the conversation. It's like PCI (as an example, not an ideal); PCI isn't perfect, but at least it encourages a conversation about security. Today we're at the point where almost every single organization at least discusses security; but I remember when PCI first came out. I can't tell you how many times people used to ask why it was problematic to store passwords in plain text.
Is this the best step forward? Probably not. Is it a step forward? Absolutely.
Sure, but the reported issues are usually not nearly that severe.
Most of the time what I see is
"A dependency of a dependency of a dependency of Webpack is vulnerable to a Regular expression Denial of Service attack" or prototype pollution or something like that.
Github's notification system is incredibly spammy if you have a lot of repos, and there's no obvious way to manage it. There's also a huge need for an "unsubscribe all" in the notification inbox.
There's also not a severity indicator - some minor issue not encountered in normal use is just as noisy as an extremely important issue that affects every user.
Other tools like Jira and Gerrit are far better at this.
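A rough sketch of the kind of triage being asked for here, combining a severity threshold with the dev-only complaint from earlier in the thread. Everything in it is an assumption for illustration: the `Alert` shape, the severity names, and the package names are hypothetical and do not reflect GitHub's actual alert format or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    package: str    # hypothetical package name
    severity: str   # assumed levels: "low", "moderate", "high", "critical"
    dev_only: bool  # True if reachable only through devDependencies


SEVERITY_RANK = {"low": 0, "moderate": 1, "high": 2, "critical": 3}


def needs_attention(alert: Alert, min_severity: str = "high") -> bool:
    """Return True if the alert should notify a maintainer right away."""
    if alert.dev_only:
        # Not deployed to production, so it goes to the digest instead.
        return False
    return SEVERITY_RANK[alert.severity] >= SEVERITY_RANK[min_severity]


def triage(alerts: list[Alert]) -> tuple[list[Alert], list[Alert]]:
    """Split alerts into (notify_now, digest), most severe first."""
    notify = [a for a in alerts if needs_attention(a)]
    digest = [a for a in alerts if not needs_attention(a)]

    def by_severity(a: Alert) -> int:
        return -SEVERITY_RANK[a.severity]

    return sorted(notify, key=by_severity), sorted(digest, key=by_severity)


alerts = [
    Alert("regex-helper", "moderate", dev_only=True),
    Alert("http-server", "critical", dev_only=False),
    Alert("object-merge", "high", dev_only=True),
]
notify, digest = triage(alerts)
print([a.package for a in notify])
print([a.package for a in digest])
```

Nothing is thrown away in this sketch: dev-only and low-severity alerts still land in the digest list, they just stop generating a ping each.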
The GitHub notifications story is really quite poor for developers; it's extremely difficult to get an alert routed only to the person who triggered it.
That's actually a very interesting point with regard to switching from StackOverflow to GitHub. I have noticed the trend that what I could normally find on StackOverflow, I now often find in GitHub issues.
Seeing many people surprised or like "finally someone said it", I thought this was very common knowledge? I might have been in some specific communities at the critical point 2-3 years ago, where bootcamps and other educators would advise new devs not to go to StackOverflow but instead go to GitHub. It didn't occur to me back then that the consequences would be so bad.
> At some point many beginner devs migrated from StackOverflow to GitHub because their really bad questions were being closed there, and now they just overwhelm open source authors
I'm glad it's not just me seeing this - My repos aren't even that popular and some of the issues just seem to be "help me build my project..."
I don't mind help in finding these problems, what I LOATHE though is when people trust the tool more than me and (for example) prevent me from pushing something that disagrees with the tool. So somewhere I imagine some manager will force their devs to make this tool happy and that's wrong.
It is tricky. I would say the perfect tool still needs to be developed.
And I agree with you. Tools need to get better at detecting dependencies. However, it can be helpful to know at least that you rely on vulnerable code even if it does not get deployed to prod.
Even when automated, security scans are like unit tests. Just because a unit test is green does not mean your app is working, and vice versa.
So security scans are more like test levels: unit tests, integration tests, and end-to-end tests. Different scans, different results, and it still takes us humans to put them in perspective.
> Open Source authors (including myself) have complained of automatic security scans. They yield way too many false positives, increasing the burden of maintaining repositories.
Unfortunately, all tools have either false positives or false negatives, and in practice often both. Tools can (and should) take steps to minimize them or their impact. Nothing makes you use any specific tools; if you don't like what a tool does, don't use it.
> Especially troublesome is when, e.g., the "vulnerability" (if it's even one) is in a devDependency that is not deployed to production.
This particular GitHub tool is for analyzing source code, and would not normally analyze your dependencies. So that doesn't seem relevant in this case.
Of course, someone could mindlessly use this tool (or look at its results) and complain. The response is easy: just ask for funding to fix the problem, or at least for a pull request that fixes it.
We have this problem with our non-open-source project, but found that after the initial review (most of the findings were false positives) we could permanently suppress them with a comment and fail the build for net-new work without a huge impact on developers.
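A minimal sketch of that "review once, then only fail on net-new findings" workflow. Note that it swaps in a baseline of fingerprints rather than the in-code suppression comments described above, and every name here (`fingerprint`, `net_new`, the finding fields, the sample paths) is hypothetical and not tied to any particular scanner's output format.

```python
import hashlib


def fingerprint(rule_id: str, path: str, snippet: str) -> str:
    """Stable ID for a finding that survives line-number shifts."""
    normalized = " ".join(snippet.split())  # collapse whitespace differences
    raw = f"{rule_id}|{path}|{normalized}".encode()
    return hashlib.sha256(raw).hexdigest()[:16]


def net_new(findings: list[dict], baseline: set[str]) -> list[dict]:
    """Return findings whose fingerprint is not in the reviewed baseline."""
    return [
        f for f in findings
        if fingerprint(f["rule"], f["path"], f["snippet"]) not in baseline
    ]


# Hypothetical data: one finding reviewed and suppressed during the initial
# pass, and one introduced by later work.
old = {"rule": "sql-injection", "path": "app/db.py", "snippet": "cur.execute(q % name)"}
new = {"rule": "path-traversal", "path": "app/files.py", "snippet": "open(base + user_path)"}

baseline = {fingerprint(old["rule"], old["path"], old["snippet"])}
fresh = net_new([old, new], baseline)

for f in fresh:
    print(f"NEW {f['rule']} in {f['path']}")

# A CI wrapper would pass this value to sys.exit() to fail the build.
exit_code = 1 if fresh else 0
```

Fingerprinting on rule, path, and whitespace-normalized snippet (instead of line numbers) is a design choice so that unrelated edits higher up in a file do not resurrect findings that were already reviewed.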
> A lede is the introductory section in journalism and thus to bury the lede refers to hiding the most important and relevant pieces of a story within other distracting information. The spelling of lede is allegedly so as to not confuse it with lead (/led/) which referred to the strip of metal that would separate lines of type. Both spellings, however, can be found in instances of the phrase.
As far as I can tell, “lede” is a relatively unpopular neologism being pushed unnecessarily, and the phrase spelled “bury the lead” predates it by almost 100 years?
Lede is just lead spelled incorrectly with the same meaning.
Why? How does vulnerability detection for PHP hurt you?
They stated in the comments here that they are actively working on it, and I would personally appreciate it in addition to the number of other static analysis tools I use.
Actually, I'm very surprised that PHP isn't on the list, seeing how WordPress accounts for 75%+ of the sites on the internet, plus the slew of plugins in its ecosystem.
Recently I learned in a conversation [0] (about SaaS in general, not GitHub in particular) that you passing is actually the desired outcome and it's by design.
So, I guess, "well done"? (It hurts a little though; I'm also in the camp of wanting to see the pricing beforehand.)
I think the real question is if their pricing design is optimal. Would they make more money with clear pricing? I think so.
One of my ancestors had a company selling commodities. He wouldn’t answer the phone until the customer had called three times and left messages. He said this was a filter to identify the customers who really needed his product.
The logic is sound and on the surface clever. But would he have made more money servicing all customers? Or perhaps marketing?
Well considering how much money we recently spent deploying CheckMarx and integrating it into all our pipelines (hundreds of man hours of engineers on 6 figure salaries + a 6 figure licensing fee per year) that is quite expected.
If you're in an organisation that's already using GitHub and this scanning capability is as good as that of CheckMarx, Snyk, etc., then it would be a no-brainer to upgrade to the Enterprise GitHub plan (if you're not already on Enterprise).
We need crowdsourced information for this problem. Imagine a volunteer asks them for the price and shares it with the world. Glassdoor, but for "sales price". Package it as a browser extension. Everyone ends up happy.
What’s frustrating is that it’s not part of the $21/month. I have that and have been trying to get pricing info for a few weeks. I’ve gotten mixed messages that it costs nothing extra and just uses Action minutes on their price schedule or that it costs some unknown price that is extra.
My impression is that they haven’t picked pricing yet.
It frustrates me when the price answer is “contact sales and let’s talk about it.”
These marginal services really depend on the price, I think.
They are still trying to determine pricing and figure out how to position themselves against competitors (who are about to have their lunch eaten if GH is smart).
Hmm. It says "included with GitHub Enterprise" on the product page, then on the pricing page it says $21/user/mo. GitHub has done a good job of making something complicated simple... but this appears to be the opposite: making the simple complicated.
> In the not-so-distant future: Code snippet scanning for copyright infringement
Will never happen. Can you imagine what would occur if GitHub started harassing private repo owners for including GPL-licensed code? Or automatically making them public? All their customers would bolt immediately. Copyright infringement is GitHub's bread and butter.
Stack Overflow attribution is equally unlikely. The same group that says Java's API is not original and is unworthy of copyright protection, contrary to Oracle's claim, cannot then turn around and claim that 30 lines of code from SO are original and deserve recognition.
> Stack Overflow attribution is equally unlikely. The same group that says Java's API is not original and is unworthy of copyright protection, contrary to Oracle's claim, cannot then turn around and claim that 30 lines of code from SO are original and deserve recognition.
I would bet most Stack Overflow answers don't qualify as copyrightable, at least in the US. Though I think automatically finding copied stuff would be very useful. Any time I copy something small I try to include a link to where I got it from; if someone has to troubleshoot my code, it may help them to see where it came from.
> Can you imagine what would occur if GitHub started harassing private repo owners for including GPL-licensed code?
Why would Github do that? It's perfectly fine to not distribute code you received under the GPL. The license just says that if you do distribute the code in binary, you must provide recipients with the source as well.
Many companies pay good money for scanning services like that, so Github adding it to their Enterprise offerings would make sense. (No clue how you get to "harassing" and "automatically making them public" from that suggestion...)
It is. The authors own the content but must license it to SO under CC-BY-SA (the version having changed over time: https://stackoverflow.com/help/licensing ) and SO then publishes it to readers under that same license
[1] https://twitter.com/sindresorhus/status/1123986529498664961
[2] https://twitter.com/FPresencia/status/1311551520689713152
You can see some examples of how Microsoft uses Semmle here: https://msrc-blog.microsoft.com/2018/08/16/vulnerability-hun...
* https://github.com/github/codeql
https://news.ycombinator.com/item?id=17513709
[1] https://www.merriam-webster.com/words-at-play/bury-the-lede-...
https://books.google.com/ngrams/graph?year_end=2019&year_sta...
> Contact Sales to learn more
[0]: https://news.ycombinator.com/item?id=24630106
Looks like they expect you to upgrade to their Enterprise plan. That’s a 5x increase in subscription costs.
Where "oracle" means predictor and not the database/java folks.
Here's my mvp:
Investors please :)
https://securitylab.github.com/bounties
In the past, I've done a couple one-off scans at https://lgtm.com/