There's an interesting point hiding in the article about security being an emergent property of the whole software system.
Many libraries are flagged with CVEs because they can be used as part of the trust boundary of the whole software system, and they have corner cases that allow certain malicious inputs to give outputs that may be surprising and unexpected to the clients of the library. The library developers push back and say "Can you point to one real-world vulnerability where the library is actually used in the way that the CVE says constitutes a vulnerability?", effectively pushing the responsibility back onto the clients of the library.
But that's exactly how real malware usually works. Individual components that are correct in isolation get combined in ways where the developer thinks that one component offers a guarantee that it doesn't really, and so some combination of inputs does something unexpected. An enterprising hacker exploits that to access the unexpected behavior.
There isn't really a good solution here, but it seems like understanding this tradeoff would point research toward topics like proof-carrying code, fuzzing, trust boundaries, capabilities, simplifying your system, and other whole-system approaches to security rather than nitpicking individual libraries for potential security vulnerabilities.
> effectively pushing the responsibility back onto the clients of the library.
That does not compute. The responsibility is pushed back onto the security researcher to prove that there is an actual vulnerability in practice.
Libraries (ideally) have documentation and contracts describing what they expect of their input. If clients don't abide by the contract, then it's on them. If the library has a wide contract, then it's on the library. However, even if the library has a bug, the bug needs to be exercised by clients for it to be an actually impactful vulnerability.
> But that's exactly how real malware usually works. Individual components that are correct in isolation get combined in ways where the developer thinks that one component offers a guarantee that it doesn't really, and so some combination of inputs does something unexpected. An enterprising hacker exploits that to access the unexpected behavior.
That means there's a security vulnerability in the app and a bug in the library. Not a security vulnerability in the library.
I would mostly agree, with the exception of libraries that are responsible for parsing untrusted input and making it safe. You can't, on the one hand, tell developers, "always validate your input, otherwise we have a security vulnerability, and don't write your own validation libraries because they're super risky to get right", and then pretend that a security hole caused by a bug in the validation library should somehow have been handled by developers.
And that is indeed the case here. The node-ip library had a security vulnerability in the function that informed developers if an IP was private or not. That's important when you accept user-defined webhooks and you want to ensure that an attacker isn't giving you something that resolves to a private IP within your own network. The whole process of "validating untrusted user input" ends up relying, at least in part, on libraries like this which tell you whether the components of the input are safe.
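To make that concrete, here's a minimal sketch of the pattern being described. Note that naiveIsPrivate is a made-up, deliberately simplified check for illustration, not node-ip's actual implementation:

```javascript
// Hypothetical, deliberately simplified check - NOT node-ip's code.
// It only matches the textbook dotted-quad spellings of the
// private/loopback prefixes.
function naiveIsPrivate(addr) {
  return /^(10\.|127\.|192\.168\.|172\.(1[6-9]|2\d|3[01])\.)/.test(addr);
}

// The "validate, then fetch the user-supplied webhook" pattern:
function webhookAllowed(host) {
  return !naiveIsPrivate(host);
}

console.log(webhookAllowed('127.0.0.1'));  // false - blocked, as intended
console.log(webhookAllowed('0177.0.0.1')); // true - yet many address parsers
                                           // read "0177" as octal 127, so the
                                           // request can still hit loopback
```

The point isn't this particular regex; it's that any string-level check has to agree exactly with how the downstream network stack interprets the address, and unconventional spellings are where the two diverge.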
>The library developers push back and say "Can you point to one real-world vulnerability where the library is actually used in the way that the CVE says constitutes a vulnerability?"
From my understanding of the article, the developer asked people to provide not a "real-world vulnerability" but an example of one - a small project that exposes said vulnerability and the steps one has to take to abuse it. And what he got irritated about was the lack of such examples.
More so since he had been effectively email-DDoSed and had to chase some entity to mark the vulnerability as resolved, which probably took orders of magnitude more time, energy, and soul from him than actually fixing the bug.
But the _actual_ problem is the thanklessness of the work put into such open source projects - the lack of reward (preferably material), developer burnout, and what not. The guy probably made like $1000 total off of those millions of downloads per week. Understandably, he doesn't want his time wasted discussing and fixing such seemingly unimportant issues.
Making open source materially rewarding and a more or less legitimate occupation is the real issue.
Granted, it's basically if (function_from_lib(user_input)) make_http_request(user_input), which IMO seems like a bit of a forced example - but it is an example.
>effectively pushing the responsibility back onto the clients of the library.
Well, yes - that's how this should work. If you want to use a library, it's up to you to decide whether or not it is a good fit for your project, including whether it has security issues that you need to worry about. A tool like npm audit can help you make this decision, but there is no scenario where it makes a library developer responsible for fixing a security issue in your own project (unless you're paying them for that service).
Every library makes assumptions about the usage context. Stating those assumptions up front should be sufficient.
For example, Go has two differing template libraries: one for general text and one for HTML. They have very different assumptions about how you use them and what they protect against and those are stated in the documentation.
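The "two libraries, two stated contracts" idea can be illustrated in a self-contained way (this is hypothetical JavaScript for illustration, not Go's actual text/template or html/template API):

```javascript
// Two templating helpers with different documented contracts, echoing the
// text-vs-HTML template split described above (illustrative only).
function renderText(tmpl, data) {
  // Contract: output is raw text; the caller must not embed it in HTML.
  return tmpl.replace(/\{(\w+)\}/g, (_, k) => String(data[k]));
}

function escapeHtml(s) {
  return s.replace(/[&<>"']/g, c =>
    ({ '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' }[c]));
}

function renderHtml(tmpl, data) {
  // Contract: interpolated values are HTML-escaped, safe as element content.
  return tmpl.replace(/\{(\w+)\}/g, (_, k) => escapeHtml(String(data[k])));
}

const payload = { name: '<script>alert(1)</script>' };
console.log(renderText('Hello {name}', payload)); // raw - unsafe in HTML
console.log(renderHtml('Hello {name}', payload)); // escaped
```

The two functions do the same interpolation; the only difference is the escaping contract each one documents, which is exactly the assumption a client has to read before choosing one.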
> Individual components that are correct in isolation
This point is wrong, and everything that follows is consequently wrong as well.
This is an exemplary case where an individual component was correct under a very limited set of conformance/correctness tests. It is, however, wrong under the full set of conformance/correctness tests.
So essentially, the component is incorrect in general and merely happens to be correct in the specific and most common cases.
If we keep this idea in mind, couldn't we argue that any piece of software that communicates over the internet might allow any type of exploit, including arbitrary code execution at Ring 0? Technically, a simple open socket could be made to feed input directly to be executed by a kernel process.
Indeed, and if that software can't point to some contract or license that suggests someone else is liable, then they are responsible.
Funnily enough, almost all open source projects not only disclaim such liability in very large print at the top of their licenses, but they don't even complicate the relationship with some kind of purchase agreement and remuneration.
They literally just say "Here's a thing. Use it if you want, change it if you want, but you're responsible for making sure it meets your requirements."
Ugh - say I wrote a daemon that runs every 2 hours; it exposes no endpoints and has no metrics. But just because I depend on some library that brings in Prometheus, which in turn brings in some HTTP/2 library, I am on the hook for fixing this CVE in my code.
Shouldn't it be on the security researcher to prove how this can be exploited if no HTTP endpoints are created?
> Shouldn't it be on the security researcher to prove how this can be exploited if no HTTP endpoints are created?
The problem is, from their viewpoint the security researcher is completely correct: a vulnerability is a vulnerability.
Consuming applications absolutely have to do their own research for CVEs in dependencies, to determine if they are impacted or not, and to develop mitigations on their side as well if needed.
Yeah, but that's exactly the problem. While the maintainer waits for that one real-world use case, others are hoarding the vuln or using it stealthily enough to not raise alarms.
My predisposition is that the people hounding open-source devs for unpaid urgent tech support are predominantly commercial users freeloading off the software, which, amusingly, means that a compelling argument in favor of licensing your software as GPL is to scare away these sorts of users.
> For example, IBM, the owner of Red Hat, forbids the use of GPL3 software on their equipment by their employees.
This I can understand. But the fact that they joined the witch-hunt for Stallman after the slanderous piece by Selam Gano, who knowingly presented untrue statements as facts, which were later picked up by mainstream media (some of these lies are still out there uncorrected[0]), is still beyond me.
My experience is that it's not higher-end commercial users who would actually harass the dev, but lower-scale commercial users who don't care - and the license wouldn't scare them off.
Any large company that would care about the license would also have policies telling its employees not to interact directly with such projects, and would likely patch it internally if needed instead.
Your first sentence is a little confusing, but I can confirm that Google, at least, uses lots of open source software, and the build system automatically checks that all the third-party libraries used in an executable have a suitable license. [1] They also vendor everything and will patch security issues locally.
It’s been a long time since I worked there, but I don’t know any reason that they wouldn’t send a patch upstream as well.
I’m generally in favor of not trying very hard to get more users for open source software. More users means more problems. What’s in it for you?
The trouble is that many software developers have jobs. It sucks to work on open source software that would be useful, except that you can’t use it at work due to the license. (Possibly at a future job.)
If you fully control the software then you can dual-license, but it’s a more difficult negotiation than “it’s yet another npm with a standard license.”
> except that you can’t use it at work due to the license
Just to clarify:
If you hold the copyright over the code, you can change the license to whatever you want, whenever you want.
In other words, if you are the lone dev of a library that has been GPL for 20 years, and then you get hired, and you want to use that library at your job, you can just make a copy of it and say "this copy is yours to do with as you please, have fun". You don't even need to mess around with dual-licensing. Licenses affect only people who aren't the copyright holder.
The CISO and the people in his office (the so-called cybersecurity experts) are nothing but report pushers. They run vulnerability scans on code, and whatever comes back from products like Tenable, they send to everyone to justify their own existence. They don't consider severity, and they don't differentiate between attack surfaces and attack vectors. They just hound you and your superiors in the name of insurance liabilities... they suck. They turn developers into hounds that harass other developers for fixes. Out goes the desire to work on the software, because all you're doing is patching nonsense every day because some CISO somewhere is unsatisfied. To hell with each and every CISO. Security is important, and having cyber folks with a programming background is even more important. Otherwise they're mindless lemmings.
Most of this behaviour comes from the desire of many companies to be compliant with a lot of security regulations, which in many cases means silly rules like “you must run, and action the findings of, a security scanning system”. Because a lot of these scanners are just dumb wrappers running any piece of software that pretends to be a security scanner, and because the more rules one has, the more “valuable” it is, you end up with a race to scan the most. And that sadly translates into rules and reports like https://hackerone.com/reports/191220 - the OPTIONS method can be used to check which methods a web server accepts, therefore an attacker might use it to learn which methods to use for the attack. Except they can just try them with no effort. It's the sort of “if attackers can see a lock, they'll know where the lock is” logic that ends in a “let's remove all locks so they cannot be attacked” response.
Yeah, they're risk analysts, not technologists. That's not inherently bad; you need those. In a previous life I worked in a domain with a lot of risk analysis, and by the end I liked a lot of them - they were usually fairly easy to have as stakeholders. But security has a track record of failing to equip the rest of a business with processes adequate for the inherent volatility of the risk they're supposed to be managing. In the engine room it's still dashing from fire to fire, just with better fire alarms. The things you complain about are precisely the business problems that the security group should be solving. Cybersecurity is important enough that they can get away with overbearing demands without providing holistic solutions for the organization to reach them.
That's a very narrow view, to the point of being flat-out wrong. I was a CISO. Before that, I was a staff platform engineer who wrote the software other people would be evaluating.
I never, not once, pushed an upstream dev to fix a thing. I provided plenty of PRs over the years. If they didn't get merged, we maintained our own locally patched version.
My job was to find a way for us all to do as little as possible to meet our security goals. Those goals were lofty and sometimes that turned out to require quite a bit of work. But we never, ever, made our problem someone else's problem.
You see the CISOs that are a pain in the ass. You don't see the ones quietly going about their business trying to make the world a little safer.
Seems impossible to read and verify. Wouldn't it be simpler and more consistent to first parse the IP into an internal format, then perform all logic on that?
I concur. Any time I see tangles of regexes and if statements, it's going to be full of holes. I spent several months replacing messes like this with proper parsers and you'd be surprised at how many fuck ups fell out of the mess.
> Wouldn't it be simpler and more consistent to first parse the IP into an internal format, then perform all logic on that?
So not speaking about this library specifically but yes, separating the address parsing part (https://www.netmeister.org/blog/inet_aton.html is a fun read) from the "is this address considered private?" part should make things much simpler.
The vast majority of related code in this space that I've seen works like that. I'm actually struggling to come up with counterexamples.
Parts of the code actually parse the IP strings into a sensible type (toBuffer). Why not use this representation for _all_ the queries and operations instead of mixing in regexes?
Regexes can be fine in order to parse the string representations initially. But for everything else I would stick with byte arrays etc.
It may be, but perhaps there is some performance concern? I’m not familiar with this domain and application so it’s not clear to me whether the decisions made are sensible just from reading the code.
> It may be, but perhaps there is some performance concern?
I'm skeptical of that concern: executing multiple regular expression matches on a string is never going to be faster than parsing the entire string once, into a 4-octet structure, and then performing integer comparisons on the octets.
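Here's a minimal sketch of that parse-first design (illustrative only, not this library's actual code): one strict parse into four octets, then plain integer comparisons against the private and loopback ranges:

```javascript
// Illustrative sketch, not this library's code: parse once into octets,
// then answer range questions with integer comparisons.
function parseIPv4(str) {
  const parts = str.split('.');
  if (parts.length !== 4) return null;
  const octets = [];
  for (const p of parts) {
    // Strict: plain decimal only - rejects hex ("0x7f"), octal-looking
    // ("0177"), empty, and oversized parts.
    if (!/^\d{1,3}$/.test(p)) return null;
    if (p.length > 1 && p[0] === '0') return null;
    const n = Number(p);
    if (n > 255) return null;
    octets.push(n);
  }
  return octets;
}

function isPrivateIPv4(octets) {
  const [a, b] = octets;
  return a === 10 ||                          // 10.0.0.0/8
         (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12
         (a === 192 && b === 168) ||          // 192.168.0.0/16
         a === 127;                           // 127.0.0.0/8 loopback
}

console.log(isPrivateIPv4(parseIPv4('172.20.1.5'))); // true
console.log(parseIPv4('0x7f.1'));                    // null - rejected outright
```

A strict parser like this also sidesteps the ambiguous-spelling problem entirely, because anything that isn't plain dotted-decimal is rejected before the range check ever runs.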
I feel like the main reason someone uses a library like this is that they don't want to write or maintain such a regex. Don't tell me how the sausage is made!
"Impossible" to read? No. I didn't find it difficult to read at all. There are even comments that tell what the code is supposed to do, and tests that show how to use the functions. Are regular expressions "impossible" to read? Well maybe if someone doesn't understand regular expressions. There are a hundred million other ways to write these functions, sure, but if we wrote code to appease the personal tastes of every programmer in the world, we'd never get anything done.
Most of the CVEs I have to deal with at my $DAY_JOB are regular bugs, not security issues. Nowadays CVEs are like bitcoin mining, but for the security folks.
> the verification process of vulnerability reports doesn't involve maintainer at all, and it sounds like the commercial interest of advisory repositories is aligned with creating more vulnerabilities and proving themselves “useful" to companies that utilize them.
It seems absolutely insane that this would get anywhere near a 9.8 severity, considering that the Windows RCE over WiFi scored 9.8 as well. Who makes these scores?
> That means there's a security vulnerability in the app and a bug in the library. Not a security vulnerability in the library.
IMO it's a mistake to assign CVEs to libraries.
> Shouldn't it be on the security researcher to prove how this can be exploited if no HTTP endpoints are created?
So much of security scanning is such bullshit.
[0] Google for "Stallman defends Epstein", e.g. https://techcrunch.com/2019/09/16/computer-scientist-richard...
[1] https://docs.bazel.build/versions/0.24.0/be/functions.html#l...
(Personally I’ll keep licensing my stuff ISC, but I have the luxury of having near zero external attention on any of my open source stuff :p)
> If you fully control the software then you can dual-license
A lot of maintainers do it so they don't have to re-write everything when they change jobs. Those would generally agree with you.
Others do it for personal marketing or other economic purposes such as lock-in.
Then there are maintainers who just like doing open source work and like providing something useful for many.
> Who makes these scores?
https://cvss.js.org/