I don't think that it's good to just delete the packages. Same goes for Android Apps in the Google Play Store or for Chrome Extensions.
These compromised packages should have their page set to a read-only mode with downloads/installs disabled, with a big warning that they were compromised.
This is especially troublesome with Chrome Extensions and Android Apps, where there is no way to find out whether I actually had the extension installed and, if I did, what exactly it was.
Chrome Extensions get automatically removed from the browser instead of being permanently deactivated with a note explaining why they can't be re-enabled and what got them disabled in the first place. That's a problem for me: how do I know whether I had a bad extension installed, or whether personal data has been leaked?
This also applies to PyPI to some degree.
----
Eventually the downloads should get replaced with a module which, when loaded, prints out a well-defined warning message and calls sys.exit() with a return code defined as a "vulnerability exception", which a build system can then handle.
There is the "Yank" PEP 592 semantic that can be used to mark vulnerable packages. It's adoption has been a little slow, but I agree, having these packages available and marked accordingly makes it easier for security scanning and future detection research.
Even better would be to allow their install, but to have them start up with an immediate panic() sort of function (i.e., print("This package has been found to be malicious; please see pypi/evilpackagename for details"); sys.exit(99)) to force aborts of any app using those packages.
Skimming through that link, it seems that `yank` is for pulling _broken_ packages, whereas the suggestion above is to explicitly mark them as malicious.
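(For illustration, a minimal sketch of the tombstone module proposed above. The message format and the exit code 99 are assumptions for the example; no such convention is actually defined for PyPI.)

    # __init__.py of a hypothetical tombstone package replacing a malicious release
    import sys

    VULNERABILITY_EXIT_CODE = 99  # assumed "vulnerability exception" code

    print(
        "This package has been found to be malicious; "
        "please see https://pypi.org/project/evilpackagename/ for details",
        file=sys.stderr,
    )
    # Abort any build or app that imports the package.
    sys.exit(VULNERABILITY_EXIT_CODE)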
This is why our build systems don’t use public repositories directly, and why we always pin to an exact version. Any third party dependencies (js/python/java/c/you-name-it) are manually uploaded to our Artifactory server, which itself has no internet access. All third party libraries are periodically checked for new versions, any security announcements etc, and only if we are happy do we update the internal repo.
It has been a bit of a challenge, especially with js & node and quite literally thousands of dependencies for a single library we want to use. In such cases we try to avoid the library or look for a static/prepackaged version, but even then I don’t feel particularly comfortable.
I should really start specifying checksums too.
Own hosting, pinning, and checksum checks are defenses against network-based attacks and compromise of the package repository.
They do nothing against trojans like these. You will be running your own pinned checksummed version of the malicious code.
If you want to stop malware published by the legitimate package author, you need to review the code you're pulling in and/or tightly sandbox it (and effective sandboxing is usually impossible for dependencies running in your own process).
Well, it does introduce a time delay between the upstream release of a compromised package and when it enters your own codebase. As long as the exploit is found and published before someone manually uploads it to the local store, you're safe.
But yes, a better option would be to run your own acceptance tests on each new upstream release, and that includes profiling disk/network/cpu usage across different releases.
Depending what you mean by "review," then I agree with you.
If you mean having humans inspect every line of code of every package, well... good luck with that. But, if you're talking about automated analysis and acceptance testing, then, I think you've got something.
Like unit testing, automated testing and analysis is never going to catch 100% of all issues, but it can definitely help your SRE team sleep at night.
I did a bunch of nodejs stuff at my last gig. These teams had the practice of keeping packages up to date. Drove me frikkin nuts. So much churn, chaos.
Is this a JavaScript thing? Carried over from frontend development?
Exasperated, I finally stopped advocating for locking everything down. Everyone treated me like I was crazy. (Reproducible builds?? Pfft!!)
Happens with enterprisey Java, Spring, Maven projects too. (I can't even comment on Python projects; I was just happy when I could get them to run.)
What's going on? FOMO?
Lock ~~Look~~ down dependencies. Only upgrade when absolutely necessary. Keep things simple to facilitate catching regression bugs and such.
Oh well. I moved on. I don't miss it one bit.
Everyone I know uses some form of lock file, and most of the modern programming languages support it.
As for upgrading only when absolutely necessary, let's be honest, nothing is absolutely necessary. If the software is old, or slow, or buggy, well dear users you'll just have to deal with it.
In my experience however, it's easier to keep dependencies relatively up to date all the time, and do the occasional change that goes along with each upgrade, than waiting five years until it's absolutely necessary, at which point upgrading will be a nightmare.
I'd much rather spend 10 minutes each week reading through the short changelogs of 5 dependencies to check that yes, the changes are simple enough that they can be merged without fear, and with the confidence that everything stays compatible with all the other up-to-date dependencies.
Good luck when your security department starts running XRay and demands all your dependencies are at the latest versions all the time because every second release of a package is reported as vulnerable.
I mean you can have reproducible builds while being on the upgrade train. `package-lock.json` exists for a reason. And the tiny pains of upgrading packages over time mean that you don't have to deal with gargantuan leaps when that one package has the thing you want and it requires updating 10 other packages because of dependencies.
Node is a special horror because of absolute garbage like babel splitting itself into 100s of plugins and slowly killing the earth through useless HTTP requests instead of just packaging a single thing (also Jon Schlinkert wanting to up his package download counts by making a billion useless micropackages). But hey, you're choosing to use those packages.
I think if you're using them, good to stay up to date. But you can always roll your own thing or just stay pinned. Just that stuff is still evolving in the JS world (since people still aren't super satisfied with the tooling). But more mature stuff is probably fine to stick to forever.
Gotta disagree with this part. If you're making a web app, package updates really need to be done on a regular cadence: ideally every quarter, but twice a year at minimum IMO. In the .Net world at least it feels like most responsible open source maintainers make it relatively painless to upgrade between incremental major versions (e.g. v4 -> v5). If you put off upgrades until someone holds a gun to your head, so that your dependencies are years out of date, you're much more likely to experience painful upgrades that require a lot of work.
Backwards compatibility in the javascript world isn't great. If you stop updating for a couple of years, half your libraries have incompatible API changes. Then something like a node or UI framework update comes along and makes you update them all at once to work on the new version, and you're rewriting half your application just to move to a non-vulnerable version of your core dependency.
Java/Spring/Maven should have locked down dependencies by default. They have to go out of their way to not do that. Not that some people don't, anyway.
Typos:
> I stopped advocating for locking everything down.
started?
> Look down dependencies.
lock?
I tend to assume OSS contributors/maintainers are generally working to make things better; features I'm not using/don't need aside, they're fixing bugs (that I might not know I'm hitting/about to hit), patching security holes, etc. So that's one for the 'pro-upgrading' column.
Against that, sure, there might be something bad in the new release (maliciousness aside even, there might be a new bug!). But.. there might be in the old one I've pinned to as well? Assuming I'm not vetting everything (because how many places are, honestly) I have no reason to distrust the new version any more than my current one.
Reproducible builds are an orthogonal issue? You can still keep your dependencies' versions updated with fully reproducible builds. Ours aren't, but we do pin to specific versions (requirements.txt & yarn.lock), and keep the former up to date with pyup (which creates and tests a branch with the latest version) and the latter up to date within semver minor versions just with yarn update (committed automatically since in theory it should be safe; we had to revert only occasionally).
It’s good security practice, especially for anything internet facing.
Sure, you don’t have to do it obsessively, but if you let it stagnate you can have trouble updating things when critical vulnerabilities are found, and you have a huge job because multiple APIs have changed along the way.
I get notifications every other week about my NodeJS packages having security vulnerabilities, and so I upgrade.
I'm not sure why NodeJS devs are so bad at security compared to say C++ devs. It's not like I'm getting asked to upgrade libstdc++ and eigen all the time ...
Feels like this is a business opportunity for someone.
Use case:
1. Upload a pre-reviewed package.json file.
2. The service monitors changes, and recommends updates. Recommendations might include security, bug, features, etc. It would check downstream dependencies, too. For production systems, the team might only care about security features.
3. Developer team can review recommendations, and download the new package.json.
(There are lots of opportunities to improve this: direct integration with git, etc.)
Anybody know if this sort of service exists? I know npm has _some_ of this. Maybe I'm just ignorant of how much of a solved problem this is?
node locks down dependencies for you: not only the version, but it saves a hash too. The problem is that npm install will install the newest version allowed in your config, and re-lock it. However if you run npm ci, it will only install what is locked, and fail if the hashes don't match.
In python, pipenv works the same way: pipenv sync will only install what is locked, and will check your hashes. I'm not sure about poetry.
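(For reference, the locked-install commands described in the two comments above; both assume a lockfile is already committed to the repo.)

    # npm: install exactly what package-lock.json specifies, verifying
    # integrity hashes; fails instead of silently re-resolving versions.
    npm ci

    # pipenv: install exactly what Pipfile.lock specifies, checking hashes.
    pipenv sync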
I agree. Locking things down is the way to go, when it comes to safety.
The downside, however, is that, by design, you end up with packages that don't get upgraded regularly. That can cause problems down the road when you decide you do want to upgrade those packages.
For instance, there might be breaking changes, because you're jumping major versions. Of course, breaking changes are always a problem, but, if you're not regularly upgrading stuff, your team will tend to build on/build around the functionality of the old version.
That leads to some real fun come upgrade time. If you're, say, 3 major versions behind the latest version, or whatever version you want to upgrade to that contains some Cool New Feature(tm) you really, really need, you might end up having to do this silly dance of upgrading major versions one at a time, in order to keep the complexity of the upgrade process as a whole under control.
Oh, and, sometimes things get deprecated. That's always fun to deal with.
So, TL;DR: Yes, pin versions! It's safer that way! Just be aware that, like most engineering decisions, there's a tradeoff here that saves you some pain now in exchange for some amount of a different kind of pain in the future.
especially bad if the older version you are on turns out to have vulns.
Josh Bloch says to update your packages frequently and I agree.
Sorry, I don't quite see how this would protect against supply chain attacks. If an upstream dependency is back-doored, they just have to silently add their code in an otherwise reasonable sounding release, and now you will happily download that version, add it to your internal mirrors, and pin its version forever. Unless you actually read the diff on every update, which I think is impractical (although you're welcome to correct me), I don't see how this is buying you much.
This does seem impractical at even a modest scale.
I think part of the idea is that if you only use versions that have been released for a little while, you are hoping SOMEONE notices the malicious code before you finally update.
There are a number of issues with this approach, although the practice still might be a net benefit.
One, you are going to be behind on security patches. You have to figure out, are you more at risk from running unpatched code for longer, or from someone compromising an upstream package?
Two, if too many people use this approach, there won't be anyone who is actually looking at the code. Everyone assumes someone else would have looked, so no one looks.
Yes, this is why I implemented hash-checking in pip (https://pip.pypa.io/en/stable/topics/repeatable-installs/#ha...). Running your own server is certainly another way to solve the problem (and lets you work offline), but keeping the pinning info in version control gives you a built-in audit trail, code reviews, and one fewer server to maintain.
It’s also good that your business doesn’t have to rely on a third party every time you need to pull down your dependencies and build your software.
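(A sketch of what pip's hash-checking mode looks like in practice; the version and the sha256 digest below are placeholders, not real values.)

    # requirements.txt: pinned versions plus expected archive digests
    requests==2.26.0 \
        --hash=sha256:0000000000000000000000000000000000000000000000000000000000000000

    # pip refuses to install anything whose archive does not match a listed digest:
    #   pip install --require-hashes -r requirements.txt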
This is how it always used to be, back in the before[1] times. Libraries would be selected carefully and dependencies would be kept locally so that you could always reproduce a build.
The world is different now, and just being able to select a package and integrate it like that is a massive effectiveness multiplier, but I think the industry at large lost something in the transition.
([1] before internet package management, and before even stuff like apt and yum)
The problem is that it also is a vulnerability multiplier.
People used to understand every package they installed. Now, they install dependencies of dependencies of dependencies, to the point that they have not even SEEN the name of most of their dependencies.
Install anything with maven and count the number of packages installed. It is appalling.
> I think the industry at large lost something in the transition.
Lost a lot of trust and security with the advent of language-specific installers that pull libraries and their dependencies from random URLs without any 3rd party vetting.
FYI - You can overwrite an existing package’s release/version via pip (at least when using Artifactory’s PyPi). Not safe to assume pinning the version ‘freezes’ anything.
Make a "lockfile" with the pip-compile tool [0] that includes hashes. Unless you happen to fetch the hash after the package has been compromised, this should keep you safe from an overwritten version.
Yep. You should also be hosting and deploying from wheels[0], even for stuff you create internally. If you're doing it right, you'll end up hosting your own internal PyPi server[1], which, luckily, isn't hard[2].
We did this at one of my previous companies, and, of all the things that ever went wrong with our deploy processes, our internal PyPi server was literally never the culprit.
[0]: https://pythonwheels.com/
[1]: https://github.com/testdrivenio/private-pypi
[2]: https://testdriven.io/blog/private-pypi/
Teaching good security practices and application lifecycle management seems to always be an uphill battle.
Seems reasonable enough depending on your use case.
In our situation we store our own (private, commercial) artifacts as well as third party ones, so we already need to have a server, and we know our server is configured, maintained & monitored in a secure fashion whereas I have no guarantees with public servers.
Plus our build servers don’t have access out to the internet either, for security. Supply chain attacks like SolarWinds and Kaseya are too common these days.
Edit: Also, our local servers are faster at serving requests, allowing for faster builds, and they ensure there are no broken builds if a public repo goes offline or comes under attack.
Security and availability don't have to be mutually exclusive. I remember in the early days of Go modules our Docker builds (that did "go mod download") would be rate-limited by Github, so a local cache was necessary to get builds to succeed 100% of the time. (Yes, you can plumb through some authentication material to avoid this, but Github is slow even when they're not rate limiting you!) Honestly, that thing was faster than the official Go Module Proxy so I kept it around longer than necessary and the results were good.
Even if you cache modules on your own infrastructure, you should still validate the checksums to prevent insiders from tampering with the cached modules.
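(A minimal sketch of that kind of check in Python; the file path and digest are illustrative, and in practice you'd keep the known-good digests under version control.)

    import hashlib
    import sys

    def verify_artifact(path: str, expected_sha256: str) -> None:
        """Compare a cached artifact's digest against a recorded known-good value."""
        digest = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                digest.update(chunk)
        if digest.hexdigest() != expected_sha256:
            sys.exit(f"checksum mismatch for {path}: cached artifact may be tampered")

    # usage (hypothetical artifact and digest):
    # verify_artifact("cache/some-module.zip", "9f86d08...")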
I'll also mention that any speed increases really depend on your CI provider. I self-hosted Jenkins on a Rather Large AWS machine, and had the proxy nearby, so it was fast. But if you use a CI provider, they tend to be severely network-limited at the node level and so even if you have a cache nearby, you are still going to download at dialup speeds. (I use CircleCI now and at one point cached our node_modules. It was as slow as just getting them from NPM. And like, really slow... often 2 minutes to download, whereas on my consumer Internet connection it's on the order of 10 seconds. Shared resources... always a disaster.)
It's going to depend on your circumstances. You don't share any context about your app or any of your development. Rather than looking at this from the perspective of needing an artifact server, you just look at it as a case of supply chain protection.
If pinning dependencies counters all the threats in your threat model, then fine. If not, you need to be doing something to counter them. An artifact server, or vendoring your dependencies, provides a lot of additional control where chokepoints or additional audits can be inserted.
If there was no management cost or hassle then you'd just have an artifact server to give you a free abstraction, but it's a trade-off for many people. It's also not a solution in itself, you need to be able to do audits and use the artifact server to your advantage.
The problem is really with the threat models and whether someone really knows what they need to defend against. I find that many engineers are naïve to the threats since they've never personally had exposure to an environment where these are made visible and countered. At other times, engineers are aware, but it's a problem of influencing management.
We use an artifact server and our build servers are completely airgapped. We know exactly what dependencies are used across the organisation. We can take centralised action against malicious dependencies.
I wouldn't bother having one if you're small (<25) people. If you start having a centralised Infosec group, then it starts to become necessary.
Depending on what you're using for package management, an "artifact server" can be as simple as a directory in git or a dumb HTTP server. File names are non-colliding and you don't really need an audit log on that server, because all references are through lock-files in git with hashes (right? RIGHT?), so it basically doesn't matter what's on there.
You should mention this in your interviews. Keeping up to date with the state of the art is implicit for me. If I need to spend months retraining or training up because of company policy, I expect to be compensated for that while employed.
We are upfront about it in interviews. We definitely don’t want unhappy developers, but we also don’t want insecure code. We do upgrade libraries but we do so only after analysing risk and impact. None of our developers have spent months training to develop our code.
It seems to me like one low hanging fruit to make a lot of these kinds of exploits significantly more difficult is protection at a language level about which libraries are allowed to make outgoing HTTP requests or access the file system. It would be great if I could mark in my requirements.txt that a specific dependency should not be allowed to access the file system or network, and have that transitively apply to everything it calls or eval()'s. Of course, it would still be possible to make malware that exfiltrates data through other channels, but it would be a lot harder.
I am not aware of any languages or ecosystems that do this, so maybe there's some reason this won't work that I'm not thinking of.
Deno (a Node-like runtime by the original author of Node) has a security model kind of like this [0]. It's unfortunately not as granular as I think it should be (it only operates on the module level and not on individual dependencies), but it's a start.
[0] https://deno.land/manual/getting_started/permissions
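(For illustration, roughly how Deno's permission flags work; the script name and host below are made up.)

    # No flags: network, file system, and env access are all denied by default.
    deno run main.ts

    # Grant outbound network access to a single host; everything else stays blocked.
    deno run --allow-net=api.example.com main.ts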
Ryan Dahl (the above-mentioned creator of Deno) gave his second podcast interview ever this spring; it went live June 8th. [1]
It covers a lot of terrain including the connectivity permissions control.
I recommend it as an easy way to learn about Deno and how it is different from Node as it is today.
Node seems to have evolved to handle some of what Deno set out to do at the start. It is worth hearing from Dahl why Deno is still relevant and for what use cases.
Dahl speaks without ego and addresses interesting topics like the company built around Deno and monetization plans.
[1] https://changelog.com/podcast/443
What you want sounds like the way Java sandboxing worked (commonly seen with Java applets). The privileged classes which do the lower-level operations (outgoing network requests, filesystem access, and so on) ask the Java security code to check whether the calling code has permission to do these operations, and that Java security code throws an exception when they're not allowed.
Yes, and also, this is an extraordinarily complex design to implement and get right. Java more or less failed in contexts where it was expected to actually enforce those boundaries reliably - untrusted applets on the web. It's working great in cases where the entire Java runtime and all libraries are at the same trust level and sandboxing/access control measures, if any, are applied outside the whole process - web servers, unsandboxed desktop applications like Bazel or Minecraft, Android applications, etc. Security vulnerabilities in Java-for-running-untrusted-applets happened all the time; security vulnerabilities that require you to update the JRE on your backend web servers are much rarer.
If you make a security boundary, people are going to rely on it / trust it, and if people rely on it, attackers are going to attack it for real. Making attacks harder isn't enough; some attacker will just figure it out, because there's an incentive for them to do so. It is often safer in practice not to set up the boundary at all so that people don't rely on it.
If I’m not mistaken I think that some languages with managed effects allow you to do this through types. For example, in Elm HTTP requests have the type of Cmd Msg and the only way to actually have the IO get executed is to pass that Cmd Msg to the runtime through the update function. This means you can easily get visibility, enforced by the type system, into what your dependencies do and restrict dependencies from making http requests or doing other effects.
I started building this at one point. Basically there's an accompanying manifest.toml that specifies permissions for the packages with their checksums, and then it can traverse the dependency graph finding each manifest.toml.
It also generated a manifest.lock so if manifests changed you would know about it.
Then once it built up the sandbox it would execute the build. If no dependencies require networking, for example, it gets no networking, etc.
I stopped working on it because I didn't have time, and it obviously relied on everyone doing the work of writing a manifest.toml and using my tool, plus it only supported rust and crates.io
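(Purely to illustrate the idea sketched in this comment; the tool was abandoned, so the format below is a hypothetical reconstruction, not anything published.)

    # manifest.toml: permissions a dependency declares, pinned to its checksum
    [package]
    name = "some-crate"
    checksum = "sha256:0000000000000000000000000000000000000000000000000000000000000000"

    [permissions]
    network = false               # build and runtime get no networking at all
    filesystem_read = ["./data"]  # only this directory may be read
    filesystem_write = []         # no writes allowed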
TBH this seems like a really solvable problem; it's very well-worn territory, since browser extensions have been doing the same thing for decades. Similarly, why am I allowed to upload a package with a near-0 string distance to an existing package's name? Blocking that would help a massive amount against typosquatting.
No one wants to implement it who also implements package managers I guess.
This can be done with capability-based operating system, though it requires running the libraries you want to isolate in a separate process.
On a capability-based OS you whitelist the things a given process can do.
For instance, you can give a process the capability to read a given directory and write to a different directory, or give the capability to send http traffic to a specific URL. If you don't explicitly give those capabilities, the process can't do anything.
I've been thinking about this a lot as I consider the scripting language for finl.¹ I had considered an embedded python, but ended up deciding it was too much of a security risk, since, without writing my own python interpreter (something I don't want to do), sandboxing seems to be impossible. Deno is a real possibility or else forking Boa which is a pure Rust implementation of JS.
1. One thing I absolutely do not want to do is replicate the unholy mess that's the TeX macro system. There will be simple macro definitions possible, but I don't plan on making them Turing complete or having the complex expansion rules of TeX.
I was wondering earlier how useful deno's all-or-nothing policies would actually be in the real world. It seems like rules like this (no dep network requests, intranet only, only these ips) are much more useful than "never talk to the web".
For python this probably won't ever be possible, given the way the import system works and the patching that packages can do.
Portmod[0] is a package manager for game modifications (currently Morrowind and Doom), and it runs sandboxed Python scripts to install individual packages. So I think this is possible, but it's not a built-in feature of the runtime as is the case for deno.
[0]: https://gitlab.com/portmod/portmod
Where does master_key come from here? Is chrome encryption of sensitive information really as weak as that?
What do you expect it to be "encrypted" with? Unless the user is entering a password every time they start the browser, there's nothing unique to a system that other malware can't just extract and use to decrypt the database.
Right, but I wouldn't have expected that processes outside of chrome could get at its internally managed db (or encrypted properties), especially if it's using an authenticated (chrome) user profile.
> Keychain items can be shared only between apps from the same developer.
https://support.apple.com/guide/security/keychain-data-prote...
Windows doesn't have any application firewalls by default? I thought that was the whole thing that came in with Vista that people were upset about. (Of course, thinking it through, Linux isn't any better, assuming the process is running as the same user.)
Anyone can upload anything to PyPI. This is kind of like saying that you detected malicious packages on GitHub - the question is whether anyone actually ran it.
They say that the packages were downloaded 30,000 times, but automated processes like mirrors can easily inflate this. (As can people doing the exact sort of research they were doing - they themselves downloaded the files from PyPI!) Quoting PyPI maintainer Dustin Ingram https://twitter.com/di_codes/status/1421415135743254533 :
> *And here's your daily reminder that download statistics for PyPI are hugely inflated by mirrors & scrapers. Publish a new package today and you'll get 1000 'downloads' in 24 hours without even telling anyone about it.*
Also a copy of noblesse2, which I didn't bother to look into due to obfuscation: https://pypi.tuna.tsinghua.edu.cn/packages/15/59/cbdeed656cf...
Part of the issue is that FOSS, libraries, and independent package managers and their specific repositories have exploded in about every domain. No longer are there a handful of places where software and libraries exist. Pick an ecosystem and there's probably a sub or additional levels of package/library management ecosystems below it. Developers have really bought into grabbing a package for everything and leveraging hundreds and thousands of packages, most of which have limited to no sort of vetting. We've had software complexity growing over the years, but the one benefit in previous years is that it was in fairly concentrated areas where many eyes were often watching. You could somewhat rely on the fact that someone had looked through and approved such additions to a package repo. It's naive security, but there were more professional eyes you could leverage, lowering overall risk.
Not anymore, it's more of this breakneck speed, leverage every package you can to save resources and glue them together without looking at them in detail, because the entire reason you're using them is because you don't have time. It's not all shops, plenty of teams vet or roll their own functionality to avoid this, but there's a large world of software out there that just blindly trusts everything down the chain in an era where there should be less trust. Some software shops have never seen a package or library they didn't like and will use even trivial-to-implement packages (the benefit of your own implementation being you know it's secure and won't change under your feet unless an inside threat makes the change). There's a tradeoff to externalizing the costs and tech debt of maintenance using these systems, the cost being that you take on more risk in various forms.
> Anyone can upload anything to PyPI. This is kind of like saying that you detected malicious packages on GitHub - the question is whether anyone actually ran it.
There's a bigger social problem here. In many communities it has become completely normalized for any dependency to be just added from these types of "anyone can upload" repositories without any kind of due diligence as to provenance or security. It's as if these communities have just given up on that.
For example, if I suggest that a modern web app only use dependencies that ship in Debian (a project that does actually take this kind of thing seriously), many would laugh me out of the building.
The only practical alternative in many cases is to give up trying. It's now rare for projects to properly audit their dependencies because the community at large isn't rallying around doing it. It's a vicious circle.
This kind of incident serves as a valuable regular reminder of the risks that these communities are taking. Dismissing this by saying "anyone can upload" misses the point.
> It's now rare for projects to properly audit their dependencies
In the Python ecosystem, it is at least pretty easy to limit yourself to a handful of developers you trust (e.g. Django developers, Pallets developers, etc.).
In the npm ecosystem however, for instance I just ran `npx create-react-app` and got a node_modules with 1044 subdirectories, a 11407-line yarn.lock, and "207 vulnerabilities found" from `yarn audit`. Well what can you possibly do.
>> if I suggest that a modern web app only use dependencies that ship in Debian (a project that does actually take this kind of thing seriously), many would laugh me out of the building
And you are more right, and they are more wrong than they know.
Not only are malicious inserts in code a problem in themselves, if you have failed to properly vet your dependencies and it causes real losses for one of your users/customers, YOU have created a liability for yourself. Sure, many customers may never figure it out, and it might take them a while to prove it in court, but if it even gets to the point where someone is damaged and notices, and decides to do something about it, you have defense costs.
The "whatever" attitude has no place in serious engineering of any kind, and anyone with a "move fast and break things" attitude (unless these tests are properly sandboxed) shows that they are not engaged in any serious engineering.
Doesn't installing a python package from PyPI (optionally) run some of the code in the package? Like "setup.py"? I'd take advantage of that if I were injecting malicious code in a module.
https://i.imgur.com/Ryr2voN.png
That shell script runs 'make && make install' on a couple of bundled dependencies, but in principle it could do anything https://github.com/aws/aws-lambda-python-runtime-interface-c...
It's analogous to downloading vs. running an executable.
For what it's worth, npm supports an option "ignore-scripts" for both "npm install" and "npm ci" (the latter of which ensures the installed packages match the integrity hashes from the package-lock.json file).
https://docs.npmjs.com/cli/v7/commands/npm-install/#ignore-s...
https://docs.npmjs.com/cli/v7/commands/npm-ci/#ignore-script...
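(To make the Python side concrete: a toy setup.py showing that arbitrary module-level code runs during a pip install of a source distribution. The package metadata is made up and the "payload" here is a harmless print.)

    # setup.py: executed by pip when building/installing from an sdist
    from setuptools import setup

    # Anything at module level runs on the installing machine; a malicious
    # package would put its payload here instead of this print.
    print("this code runs during `pip install`")

    setup(
        name="example-package",  # hypothetical name
        version="0.0.1",
        py_modules=[],
    )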
I wonder how many Python packages have a justifiable reason for using `eval()` to begin with. I've been writing Python professionally for almost a decade and I've never run into a use case where it has been necessary. It's occasionally useful for debugging, but that's all I've ever legitimately considered it for.
It's neat that JFrog can detect evaluation of encoded strings, but I think I'd prefer to just set a static analysis rule which prevents devs from using `eval()` in the first place.
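(A minimal sketch of such a rule using Python's own ast module. It only catches direct calls; as the next comment demonstrates, obfuscated lookups can still slip past a check like this.)

    import ast

    BANNED = {"eval", "exec"}

    def find_banned_calls(source: str, filename: str = "<string>"):
        """Yield (lineno, name) for direct eval()/exec() calls in source."""
        tree = ast.parse(source, filename)
        for node in ast.walk(tree):
            if (isinstance(node, ast.Call)
                    and isinstance(node.func, ast.Name)
                    and node.func.id in BANNED):
                yield node.lineno, node.func.id

    # usage:
    for lineno, name in find_banned_calls("x = eval('1 + 1')"):
        print(f"{name}() call at line {lineno}")  # -> eval() call at line 1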
You can always call eval without ever mentioning eval in code:
__builtins__.__dict__[''.join(chr(x^y^(i+33)) for i,(x,y) in enumerate(zip(*[iter(ord(z) for z in '2vb63qz2')]*2)))]("print('hello, world')")
Maybe there are ways to detect all of the paths, but it feels like a tricky quest down lots of rabbit holes to me.
There are also some fairly big packages that use eval(), like flask, matplotlib, numba, pandas, and plenty of others. Perhaps they could be modified to not use eval, but it might be more common than you expect.
(I've also used exec() for some nasty bundling of multiple python files into one before)
I don't think there's a good reason to have eval in interpreted languages. Sure, the REPL uses it, but it could be implemented internally to the REPL instead of being exposed in the language.
https://news.ycombinator.com/item?id=28022035
3 days ago and it was killed, I wonder why?...
> This Module Optimises your PC For Python
Well, it does... just not for your Python...