Ironically, on a previous team I had switched our log4j2 log formats over to %m{nolookups} like 8 months ago... I didn't realize the whole jndi issue, what we ran into was the O(n^2) behavior of its string substitution.
While deploying an ancillary change, our JVMs started locking up for minutes on end. What was happening was that we were logging customer input, and the change caused it to run certain things in parallel, which ended up logging the data multiple times. Normally the extra logging didn't matter, but one customer had data like "${foo} ${bar} ${baz} ...". Even when the ${foo} portion is replaced without modification, this triggers quadratic behavior. So we were already potentially vulnerable to the DoS, but it was rare enough that we never got locked up until we logged the string multiple times, which then overflowed log4j's internal buffer and blocked worker threads.
You can try this yourself by just logging a string like "${}${}${}..." And in fairly short order it starts taking forever. I'm very glad the fix in 2.15 is to disable lookups by default.
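For anyone curious how "replaced without modification" can still be quadratic, here is a toy sketch. It is not log4j's actual code: `QuadraticSubstitution`, `substitute` and the `charsCopied` counter are names I made up, and the "rebuild the whole buffer for every variable" step is only an assumption about the kind of mistake that produces this behavior. It counts copied characters instead of measuring time, so the growth is easy to see.

```java
public class QuadraticSubstitution {
    /** Characters copied during the most recent call to substitute(). */
    static long charsCopied = 0;

    /** Builds a string made of n copies of unit (works on older Java versions too). */
    static String repeat(String unit, int n) {
        StringBuilder sb = new StringBuilder(unit.length() * n);
        for (int i = 0; i < n; i++) {
            sb.append(unit);
        }
        return sb.toString();
    }

    /**
     * Toy substitutor: every "${name}" is treated as unresolved and put back
     * unchanged, but the whole buffer is rebuilt for each variable found.
     * That per-variable full copy is what makes the total work quadratic.
     */
    static String substitute(String input) {
        charsCopied = 0;
        StringBuilder buf = new StringBuilder(input);
        int pos = 0;
        while (true) {
            int start = buf.indexOf("${", pos);
            if (start < 0) break;
            int end = buf.indexOf("}", start + 2);
            if (end < 0) break;
            String name = buf.substring(start + 2, end);
            String value = "${" + name + "}"; // unresolved: same text goes back in
            // Naive step (assumption for illustration): copy everything again.
            String rebuilt = buf.substring(0, start) + value + buf.substring(end + 1);
            charsCopied += rebuilt.length();
            buf = new StringBuilder(rebuilt);
            pos = start + value.length();
        }
        return buf.toString();
    }

    public static void main(String[] args) {
        for (int n = 1000; n <= 8000; n *= 2) {
            substitute(repeat("${}", n));
            System.out.println(n + " variables -> " + charsCopied + " chars copied");
        }
    }
}
```

Doubling the number of `${}` groups quadruples the copied characters, even though the output string is identical to the input.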
I hope that in the time after I left, the security org at the big tech company I worked at and reported this to (as I thought it was - a dos vector, not the complete pwnage it actually was) forced teams to switch to nolookups. Otherwise a lot of people had a bad week forcing updates through...
> So we were already potentially vulnerable to the DOS [...]
> the security org at the big tech company I worked at and reported this to
I'm confused about these two statements, because I did not find any recent CVEs for log4j in the DoS category, nor related to format lookup (other than CVE-2021-44228 of course).
Perhaps I misread it, but are you basically saying that (after you reported the issue to them internally) the security team at your previous company could not successfully report a DoS vulnerability in the default configuration of a widely used (by them, at least) Apache library and make sure a CVE got assigned to track it?
If so, it would be interesting to know where the CVE/vuln-reporting chain broke, possibly to reduce the blast radius for similar future cases.
Hypothetically speaking, a CVE in March for a DoS in a problematic design/feature could have resulted in flipping the default setting earlier. Instead of chasing live RCE in the wild in December.
No, they're saying they discovered that the behavior of their Log4j setup, which was using interpolation, was so slow that it had the potential of causing a DDoS at their company.
There seems to be a misunderstanding here. We have on the one side a garbage feature that should never have been implemented - but if you want to keep it for backwards compatibility, sure. But then we have log4j scanning all values instead of only format strings - I think it can be argued that this behavior is a critical bug and was never intended to begin with. It seems to have only come about because whoever implemented the JNDI stuff lost their bearing in the absurd class hierarchies and abstractions in log4j.
Of course the last part holds the solution for our backwards compatibility issue. Remove the JNDI nonsense from the default package and move it into an extension package. Whoever wants to keep it can just add that to their dependencies and continue to enjoy logging functions that sometimes also make network connections and block your program.
Indeed - as evidence for this, I would submit that slf4j and logback were created to offer a drop in replacement for log4j (slf4j literally provides alternative implementations of the org.apache.log4j.Logger class), but I have never seen anybody complain that "I switched to logback and slf4j and my jndi substitutions stopped working."
Nobody thought this was how log4j worked; log4j's documentation for format syntax only covers {} placeholders - the same format that slf4j has grandfathered in from log4j.
I agree this feels like a case where they got confused about their internal terminology. Log4j refers to messages with {} placeholders as 'FormattedMessages'; it refers to the log pattern syntax as 'Patterns' in code - but it seems to refer to them as 'log formats' in documentation.
Somewhere in this mess, someone hooked up the pattern capabilities into the formatting system.
> but I have never seen anybody complain that "I switched to logback and slf4j and my jndi substitutions stopped working."
SLF4J was created to replace Apache Commons Logging and Logback was created to replace Log4j 1.x. Both were created by Ceki Gülcü, the original author of Log4j 1.x [1].
Logback came out in 2006. The first beta version of Log4j 2.x was only released 6 years later in 2012, and the JNDI lookup feature was added in 2.0-beta9[2] in 2013!
Obviously nobody complained when switching from Log4j 1.x to SLF4J+Logback that a feature from a completely different library (with the same name) that would be created 7 years into the future was not supported.
> Somewhere in this mess, someone hooked up the pattern capabilities into the formatting system.
That's not what happened. The lookup mechanism (which includes "${jndi:}" lookups) is completely unrelated to the message formatting subsystem.
The way formatting and pattern lookups work in log4j2 is:
1. logger.info("Hello {}", "world") creates a FormattedMessage instance with the "Hello {}" format string and a single parameter, "world".
2. The FormattedMessage is wrapped in a LogEvent and routed to the correct appender(s).
3. Most appenders will format the LogEvent with a Layout. In our case, it's PatternLayout we care about[3].
4. PatternLayout will pre-calculate a set of PatternConverters based on your pattern, so it doesn't have to keep parsing the pattern on every invocation. "%m" will map to MessagePatternConverter.
5. (grossly simplifying zero-garbage and streaming optimizations) Each pattern converter is executed and appends to the final layout text's StringBuilder.
6. (grossly simplifying oh so many things) MessagePatternConverter will first call event.getMessage().getFormattedMessage(). The logic for formatting the message is entirely encapsulated by Message and its subclasses. MessagePatternConverter has no way to distinguish the format string from the user-provided parameters!
7. MessagePatternConverter finally applies the pattern lookups to the formatted message text. The pattern lookup mechanism is completely separate from and orthogonal to the message formatting mechanism.
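The seven steps above can be sketched with a toy model. Nothing here is real log4j code: `ToyMessage`, `convertMessage` and the angle-bracket lookup are hypothetical stand-ins for the message class, the "%m" converter and the lookup mechanism. The only point is that the converter receives one already-formatted string, so it cannot tell which part came from the format string and which from a parameter.

```java
import java.util.function.Function;

public class PipelineSketch {
    /** Stand-in for a parameterized message: a format string plus its arguments. */
    static final class ToyMessage {
        final String format;
        final Object[] params;

        ToyMessage(String format, Object... params) {
            this.format = format;
            this.params = params;
        }

        /** Fills each "{}" with the next parameter, left to right. */
        String getFormattedMessage() {
            StringBuilder out = new StringBuilder();
            int from = 0;
            int i = 0;
            while (true) {
                int at = format.indexOf("{}", from);
                if (at < 0 || i >= params.length) break;
                out.append(format, from, at).append(params[i++]);
                from = at + 2;
            }
            return out.append(format.substring(from)).toString();
        }
    }

    /**
     * Stand-in for the "%m" converter: format the message first, then run
     * "${...}" lookups over the resulting text, wherever that text came from.
     */
    static String convertMessage(ToyMessage msg, Function<String, String> lookup) {
        String text = msg.getFormattedMessage();
        StringBuilder out = new StringBuilder();
        int from = 0;
        while (true) {
            int start = text.indexOf("${", from);
            int end = start < 0 ? -1 : text.indexOf('}', start + 2);
            if (start < 0 || end < 0) break;
            out.append(text, from, start).append(lookup.apply(text.substring(start + 2, end)));
            from = end + 1;
        }
        return out.append(text.substring(from)).toString();
    }

    public static void main(String[] args) {
        // Harmless fake lookup that just marks what would have been resolved.
        Function<String, String> lookup = key -> "<" + key + ">";
        String userInput = "${env:USER}";
        System.out.println(convertMessage(new ToyMessage("Hello {}", userInput), lookup));
        System.out.println(convertMessage(new ToyMessage("Hello " + userInput), lookup));
    }
}
```

Both calls in `main` end up expanding the user-supplied text, which is why "just use parameters" would not have helped in this design.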
---
That was long-winded, but I had to fight this annoying misconception about "log4j not implementing format strings properly".
Now, there are several things I'm not saying here:
1. I don't think more than a handful of people ever relied on lookups working on the log message (formatted or otherwise), as opposed to the pattern in the configuration file.
2. I don't think Log4j should have kept compatibility here. The moment the maintainers implemented "%m{nolookups}" (on version 2.7), they should have made it the default. That being said, I know this is very hard to do in the Java ecosystem. But I think it is time that the Java developer community changes its extremist position regarding compatibility at all costs.
3. I don't think that Log4j should have implemented pattern lookups for text messages to begin with. Even if it was just the format string part (which is impossible to do with Log4j's current architecture anyway).
4. I don't think any kind of string formatting should be included in a logging library. If you want to format log messages, use an external formatting function or string interpolation (if you're lucky enough to be using Kotlin or Scala). If it is added, it should only be used as a convenience, and shouldn't do anything more than formatting (like lookups). Relying on developers to always remember that log.info("Hello {}", world) is safe and log.info("Hello " + world) gives the entire internet full control of your server is beyond stupid. Even if Log4j went with this silly distinction, I would say it was a horrible design.
[3] It seems like PatternLayout is the only layout vulnerable to this bug in log4j2, but it is hard to tell, the implementation being a classic Java mess of deep class hierarchy, liberal use of reflection to control everything and some heroic attempts to break SOLID principles at least 4 times on a single line of code. Take my analysis with a grain of salt. It's a gross simplification of what is unfortunately par for the course in many Java libraries.
> But then we have log4j scanning all values instead of only format strings - I think it can be argued that this behavior is a critical bug and was never intended to begin with.
It was actually intended behavior, and this is what really boggles the mind!
Javadoc says explicitly that variable replacement is recursive, with cycle detection (which will throw! What happens to the log line in this case?) [0].
That link is about variable replacement in config strings, which is intentionally recursive. It doesn't mention the use of the variable replacement mechanism when interpolating values into log messages, which is what makes this vulnerability so bad, and as far as I can see was not intentional.
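To make "recursive, with cycle detection (which will throw)" concrete, here is a minimal sketch of that idea for config-style variables. It is my own toy, not the library's implementation: `RecursiveLookup`, `resolve`, and the choice of `IllegalStateException` are assumptions for illustration, and it deliberately ignores nested forms like `${${a}}`.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

public class RecursiveLookup {
    /** Expands ${name} using vars, re-expanding values that contain variables themselves. */
    static String resolve(String text, Map<String, String> vars) {
        return resolve(text, vars, new ArrayDeque<>());
    }

    private static String resolve(String text, Map<String, String> vars, Deque<String> inProgress) {
        StringBuilder out = new StringBuilder();
        int from = 0;
        while (true) {
            int start = text.indexOf("${", from);
            int end = start < 0 ? -1 : text.indexOf('}', start + 2);
            if (start < 0 || end < 0) break;
            String name = text.substring(start + 2, end);
            out.append(text, from, start);
            String value = vars.get(name);
            if (value == null) {
                // Unknown variable: leave the original "${name}" text untouched.
                out.append(text, start, end + 1);
            } else {
                // Cycle detection: a variable may not (indirectly) refer to itself.
                if (inProgress.contains(name)) {
                    throw new IllegalStateException("Cycle while resolving variable: " + name);
                }
                inProgress.push(name);
                out.append(resolve(value, vars, inProgress));
                inProgress.pop();
            }
            from = end + 1;
        }
        return out.append(text.substring(from)).toString();
    }

    public static void main(String[] args) {
        Map<String, String> vars = new HashMap<>();
        vars.put("dir", "${root}/logs");
        vars.put("root", "/var/app");
        System.out.println(resolve("file=${dir}/app.log", vars));

        vars.put("root", "${dir}"); // now dir -> root -> dir
        try {
            resolve("file=${dir}/app.log", vars);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Recursion like this is reasonable when the input is a config file you wrote yourself; the trouble described in this thread starts when the same machinery is pointed at untrusted message text.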
Right, I was also confused by the blame on backward compatibility. You can keep things backward compatible without necessarily making it on by default. There is no reason why `formatMsgNoLookups` should not be the default. If it is indeed an obscure and hacky feature for backward compatibility, just make it opt-in. People who really care about it will enable it, most people won't have to carry that baggage and we wouldn't be in a situation like this.
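For reference, this is roughly what opting out looks like from application code on versions where lookups are still on by default. A hedged sketch: `log4j2.formatMsgNoLookups` is the system property name as I understand it (the same thing as passing `-Dlog4j2.formatMsgNoLookups=true` on the command line), while the `DisableLookups` class and its method are hypothetical. It assumes the property is set before any logger is created, and it only matters on releases that honor the flag.

```java
public class DisableLookups {
    /** Property name as I understand it from the flag discussed above. */
    static final String NO_LOOKUPS_PROPERTY = "log4j2.formatMsgNoLookups";

    /** Must run before the logging framework initializes to have any effect. */
    static void disableMessageLookups() {
        System.setProperty(NO_LOOKUPS_PROPERTY, "true");
    }

    public static void main(String[] args) {
        disableMessageLookups();
        // Application start-up (including the first logger creation) would follow here.
        System.out.println(NO_LOOKUPS_PROPERTY + "=" + System.getProperty(NO_LOOKUPS_PROPERTY));
    }
}
```

The per-pattern equivalent mentioned earlier in the thread is writing `%m{nolookups}` instead of `%m` in the layout pattern.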
>for a feature we all dislike yet needed to keep due to backward compatibility concerns.
If they really dislike the feature that much, they likely dislike the code and want to completely delete it. I'm not sure if making it opt-in would make them as happy as fully deleting it, so they are less motivated to make it opt-in than they would be to fully delete it.
"lost their bearing in the absurd class hierarchies and abstractions" sounds familiar. Java app stack traces are like Neal Stephenson epics, but less entertaining.
And enabled by default. That's the most mind-blowing bit of this feature. The backcompat argument is a deflection for shipping a time bomb into people's codebases.
To be fair to the maintainers, they didn't ship anything into people's codebases. People chose Log4j and pulled it into their code. FOSS contributors aren't responsible for downstream use of their projects.
> for a feature we all dislike yet needed to keep due to backward compatibility concerns.
It's logging. While logging is extremely important, I think we could all tolerate removing a vulnerable feature. Or, just move the feature to a separate package.
I have made bad decisions, we have all made bad decisions. Own them, improve, and celebrate the opportunity to learn and improve. Keeping this around, as a default, was a bad decision. If your enterprise contracts don't want to turn a flag on, then they can always skip upgrading (they generally do regardless).
> Should maintainers of all core apache libs just remove or disable features they don’t like, when not known to be insecure?
I'd bet more will start doing so. If nobody is excited to keep the feature up and any unloved code contains risks, getting rid of it seems fine to me. If companies want that code maintained, they can pay up or get one of their people to do it.
If you are not being paid for it, why build features you don't like? That is what you do in your day job! Shouldn't your hobby project at least make you happy?
I can imagine the maintainers being scared of silently breaking the workflow and monitoring for some users. If you change this feature to opt-in, you may silently break the alerting system users built on top of this feature, and then you get the heat for breaking somebody's IT system (a hospital, maybe), just because you hated that feature. That it had an RCE would not have been known at the time.
In a perfect world, the feature would have been an option from the start, but in that same perfect world, the downstream users would be diligent and check release notes before upgrading. You might, but many of your colleagues don’t, they just upgrade, and complain when their system breaks.
One place I worked used syslog to ship important analytics data from services to Kafka. log4j is a reasonable choice for logging to syslog from Java (but let's be honest, you should be on Logback). Now, using jndi as part of this? That's getting a little too clever.
> Keeping this around, as a default, was a bad decision.
Definitely. But really, they were screwed once it had shipped. They could and should have disabled it in an update long ago, but then anyone who read the release notes or the code would know how to exploit the millions of un-updated systems.
People, and many companies, seem to forget that such software comes "AS IS", and that means AS IS. I would be glad to see Fortune 500 companies try to put together a team providing flawless logging capabilities. In reality I know they would not get to be half as good as an open source library, first of all because they drown developers in unnecessary administrative tasks, impose stupidly unreasonable deadlines and fully ignore engineering advice from... well, the engineering team. It's an insult that those companies, profiting massively from so many open source projects, still have the audacity to put the blame on (again) software whose premise is "AS IS", especially when their own projects (even the ones they sell to their customers) are basically bullshit put together with spit and boogers (and I've worked at more than one FAANG, so I know this to be true from experience).
Cool, now make it so that only high-severity logs from a particular set of subroutines get sent to a particular subset of employees, grouped into a daily email.
And make it possible to change any of those knobs at runtime, without touching the code (minimum severity, set of subroutines, set of recipients, delivery method).
Now add a date/time stamp. And thread name, current class/method, request trace ID, severity level, etc., to every log line in your app. Or just grab your favorite logging lib from Maven Central and call it a day.
I'm still flabbergasted that the original maintainers are rushing around trying to patch these problems. Unless their specific personal/professional projects are at risk they have no responsibility to hurry and fix a thing.
You'd think, in the spirit of open source, these multi-billion dollar companies--like Apple and Google and Amazon--would recognize the danger and immediately divert the best engineers they had to help this team identify and mitigate the problems. They should have been buried in useful pull requests.
For that matter, they should have really picked them all up in private jets and flown them to neutral working space with those engineers for a one or two week hackathon/code sprint to clean up the outstanding issues and set the project on a sustainable path. To get those maintainers there they should offer a six figure consulting fee and negotiate with their current employers to secure their temporary help.
I can't believe these folks just get abandoned like this while CEOs/CTOs from rich companies wring their hands wailing about the problems and not offering solutions.
> I'm still flabbergasted that the original maintainers are rushing around trying to patch these problems. Unless their specific personal/professional projects are at risk they have no responsibility to hurry and fix a thing.
Sorry, but what's the hard part to understand? Open source maintainers end up in this position because they are nice, helpful people who like using computers to solve problems for others. People who spend years on a project and then see a bigger problem arise don't suddenly turn that off. With the bigger problem, they'll want to work harder, not just hoist a middle finger and go binge Netflix without a care in the world.
But I totally agree with you on the CTOs, etc. I don't expect random programmers who like working on logging to also be good at solving complicated sociotechological problems around paying for global infrastructure. But it boggles my mind that none of these richly rewarded, supposedly brilliant experts at organizing engineers has gotten out in front of this. If not out of community spirit or social responsibility, then out of pure self interest.
> none of these richly rewarded, supposedly brilliant experts at organizing engineers has gotten out in front of this
Indeed. Each of them has had to spend the last few days madly trying to fix this problem to avoid exposing their infrastructure. Each has been, in some way, reinventing the wheel to do so. I'm curious how many will actually submit their findings to the original OSS project so others can learn from their experience?
There are always resources to put out a fire but rarely enough to install a sprinkler system.
> You'd think, in the spirit of open source, these multi-billion dollar companies--like Apple and Google and Amazon--would (...) mitigate the problems.
FAANG engineer here, and one who had to work extra hours to redeploy services with the log4j vulnerability fix. I'm not sure you understand the scope and constraints of this sort of problem. Log4j's maintainers have a far more difficult and challenging job than FANGs or any other consumer of a FLOSS package, who only need to consider their own internal constraints, and if push comes to shove can even force backwards-incompatible changes. The priority of any company, FANG or not, is to plug their own security holes ASAP. Until that's addressed, the thought of diverting resources to fix someone else's security issues doesn't even register on the radar. I mean, are you willing to spend your weekend working around the clock to fix my problems? Why do you expect others like me to do that, then? Instead I'm spending a relaxing weekend with my family with the comfort of knowing my service is safe. Why wouldn't I?
I'm not saying you, as an engineer for those companies, should be the one to donate your time and energy toward the problem. We all have competing priorities, as do the maintainers of those FLOSS packages.
I'm saying that your company's CTO, especially one at a very large company, could likely identify two or three engineers, pull them into a meeting and say "reach out to these guys and get them whatever they need. Here's my cell, call me the moment you need the plane or additional resources."
Seriously, if a CTO has a budget of a few hundred million dollars and thousands of dedicated employees, how hard is it to throw a few crumbs to the open source community to change this situation from being one of a burden on a volunteer effort to, instead, one where they feel like they're in the middle of an international event where their knowledge and services are vital to keeping the internet alive?
Again, I'm exaggerating, but you see where I'm going with this. It's a missed opportunity for some seriously great PR out of a seriously bad situation.
> I mean, are you willing to spend your weekend working around the clock to fix my problems?
Surely the difference is you are getting paid, and if your boss says, help these guys out, you can do it? As opposed to some guys with jobs who have a project on the side. The big guys could even do something like offer to pay the maintainers and maybe they can take leave or something.
I agree with both sentiments. The big guys are under no obligation to fix an issue in some library they happen to use. But the log4j guys are under even less obligation when they do it in their spare time.
> You'd think, in the spirit of open source, these multi-billion dollar companies--like Apple and Google and Amazon--would (...) mitigate the problems.
Your "(...)" elides the word "help," which completely changes the meaning of the quote, and your reply is constructed uncharitably as if that word wasn't in the original statement.
Somehow, I find what you are saying here to be totally implausible.
> Log4j's maintainers have a far more difficult and challenging job than FANGs
You are saying that the companies that built advanced ML-based Chess/Go engines like Alpha Zero/Go can't solve a simple logging bug involving string substitution?
If your company ends up using the product in all your teams/project and products wouldn't it be in the company's interest to keep the product safe?
How do we know you're not a CTO/C--/manager in your 'faang' just taking this opportunity to bitch about how bad and unreliable open source is? You do have a track record when it comes to this.
> I mean, are you willing to spend your weekend working around the clock to fix my problems?
Speaking as an individual, of course you want to sit by the pool this weekend.
But as a professional representative of your org, surely you'll recognize the unsustainability of the situation and that it's far from ideal even in the pure self-interest of the company in question.
>> I'm still flabbergasted that the original maintainers are rushing around trying to patch these problems.
Agreed, while reading it I also disagreed at this point:
>> the maintainers of log4j would have loved to remove this bad feature long ago, but could not because of the backwards compatibility promises they are held to.
Nobody is holding them to anything. If they want to remove an old feature, go right ahead. If those using it think it's that important they can fork the project and maintain it themselves. Oh right, that would take effort or money.
I don't get this argument. Part of sharing your work is making sure what you put out is actually helpful to people. If they remove features people really like, then the library won't be as helpful - so it's perfectly fine for the OG devs to maintain this feature. The same thing with "scrambling" to fix - that could be because a sword is hanging over your head, or because you care about the people who use your work. Thinking this way, I can perfectly see them working hard to fix this bug.
I understand it perfectly. Log4j is used in many Enterprise systems. Java is a fairly conservative language. Combine both together and you get much hesitancy to break backwards compatibility ingrained in the Java world.
Are data breaches actually treated as all that seriously? For all the talk about cyber security, there seems to generally be little investment. It appears to be viewed as more of a reputational concern than an operational one.
A past organization of mine had a data breach (the kind that ended up making the news everywhere). A few people left (probably making it worse with all the turnover there), but I would be surprised if anything really changed in that organization.
If the company is in healthcare or finance, yes. Otherwise the typical answer is no. Most companies just load up on cyber insurance and call it a day. That said, reputational concern, is a big thing for companies. Take Dropbox for example. Early on they suffered several security breaches, and had a bad reputation around security. They've since built out a fairly large security program, in part because bad security can block deals, especially in the enterprise space.
I'll note that there's been more investment in security the last 4-5 years. Most B2B companies do a SOC2, and early on, so there tends to be a baseline of competence.
A data breach isn't the primary concern here. This exploit allows full pwnage of a system and could take down entire networks for as long as it takes to rebuild them.
This is not really about data breaches. The first widely spread automated attacks seem to drop cryptominers; however, we should expect that (if it hasn't already happened) within a week or so this will get used as the entry point for ransomware attacks, since it gives attackers a solid way of getting code execution on company servers for anyone who has not solved this issue.
> I'm still flabbergasted that the original maintainers are rushing around trying to patch these problems.
If the RCE had been responsibly disclosed instead of via tweets and PR comments, maybe there wouldn't have had to be so much scrambling. And indeed maybe ASF could have found corporate OSPOs to help with remediation.
There are lots of pixels being spilled on how the users of open source software should be paying for it (?), but I haven't seen much criticism of the vulnerability not being responsibly disclosed.
to the best of my knowledge it was discovered via a minecraft exploit and I don't think minecraft players are generally the "responsible disclosure" kinda people.
There’s no hiding something this easily exploitable. This isn’t rowhammer or spectre where you need a degree to understand it. Copy and paste this in and that’s it. It would have never survived “responsible disclosure”
I'm not sure about Amazon, but Google's Project Zero and OSS-Fuzz teams seem to be doing a lot of good work when it comes to open-source security -- more would always be nice
Personally I'd like something like a security health card/metric on open-source libraries that we could tie into CI systems/pull requests or something
in the past there were so few libraries it wasn't as daunting
I'd be able to reason about stuff like libpng, libttf ..etc and think about them or even support them, but now some projects are massive hodgepodges of thousands upon thousands of packages
I admit to a certain level of exaggeration but, at the same time, we are talking literal peanuts to a large company. They could spend a million dollars and it'd be a rounding error on their balance sheet.
In all seriousness, taking actions like I identified above would cost the companies virtually nothing but result in huge long-term benefits by signaling to the rest of the open source world that "we love your work and will be right beside you helping if the chips are down."
This is, of course, not a suitable compensation model for popular open source projects. That's a separate conversation.
For argument’s sake, at least, I don’t consider anything suggested here as definitively “over-the-top”. It may seem (or be) unrealistic in practice (for reasons I don’t know), but the suggestion is far from unconscionable; it may, in fact, be the lowest cost solution to what could cost mega-corps billions in current (and potential future) fines/liabilities. To the extent it sounds like an exaggeration, I think that embodies the point of the comment: there are some (almost irreconcilable) concerns that impact the interplay of corporations and open source development.
As you said, the solution isn't the hard part. The reason that large companies aren't deploying their own solutions for this issue isn't that their engineers are incapable of developing them, but that they would then have to carry that patch forever, and if a problem was found with their particular solution they would be on the hook for it.
And yes, I do think this, "but everyone else is doing the same thing so it isn't really our fault" attitude is a problem.
> You'd think, in the spirit of open source, these multi-billion dollar companies--like Apple and Google and Amazon--would recognize the danger and immediately divert the best engineers they had to help this team identify and mitigate the problems.
Google doesn't even use log4j. What are you talking about? The spirit of open source does not dictate that the richest companies automatically shoulder the burden of maintenance of projects they do not even use. Google already has initiatives like Summer of Code that help open source projects it does not use, and I think it's perfectly fine to draw the line there.
> divert the best engineers they had
So the lessons from the mythical man-month are forgotten here. At this point I don't think adding more manpower helps.
What? It was already fixed. You just need to update. There's no need for the world's top fintech programmers to hack it out on a mountaintop somewhere.
Also, the reason the maintainers are rushing to fix it is: they're worried about losing "market share". Having been in open-source circles for a long time, I can say maintainers care GREATLY about how many users they have. They just like watching their download stats go up every year. Even if it brings them no financial rewards. It's a sort of addiction.
> maintainers care GREATLY about how many users they have.
They do. Until they don't.
That inevitable day when they get yelled at in a github issue thread by a user who didn't bother reading the documentation, while staring at their kid in the living room playing video games and start wondering to themselves "why am I doing this hobby in my spare time again?"
Mild dopamine hits to affirmation-addicted programmers is not the sturdiest foundation upon which to build enterprise-grade software libraries.
An influx of pull requests is also equally difficult for open source projects.
Anything sufficiently at scale needs a set of maintainers that the commercial tech companies would then collaborate with to get the PRs going.
Otherwise if everyone's just panicking and rushing to submit PRs, they'll inundate the maintainer. There's also no guarantee that even the best engineers at these companies are intimately familiar with the project, and might introduce regressions or other vulnerabilities in the process.
Anyway I do agree companies should be working with OSS devs, but it shouldn't be rushed or knee jerk. It should be collaborative and measured.
> You'd think...these multi-billion dollar companies...would recognize the danger and immediately divert the best engineers they had to help this team identify and mitigate the problems.
For the general case, the problem is that a reporter might report the vulnerability to the open source project, then the project needs to keep it a secret while they make a fix. There isn't a great way to leverage these stakeholders. It's obviously different for something like Android that is open source, but clearly Google's.
This is a problem in open source: everybody wants the fruits of labor without paying for it. The log4j vulnerability is what happens when you don't pay for it.
> I want to not spend much time upgrading a dependency
> Go compatibility promise:
>So whenever a change in behavior happens in an upstream library
You are comparing a promise from language designers to no promise from the library developers. Syntax from Oak (before Java was called Java) still compiles and works in Java 17 right now:
jshell
| Welcome to JShell -- Version 17
| For an introduction type: /help intro
jshell> public abstract interface I {}
| created interface I
You can still type (public abstract interface - all interfaces are abstract by default since Java 1) and it works. One of the reasons I gave up on writing desktop applications in Go was that libraries were breaking compatibility with every commit. The GTK+ binding was literally unusable, as before gomod it would break literally, and I mean literally, every day.
Please tell me that no Go library has had any breaking changes in the last 5 years and I'll be using it as my default ecosystem from tomorrow.
To add some perspective, log4j has gone for 20 years with only two major versions. Assuming that they are following semantic versioning, that means they added new features/fixes in a backwards-compatible way and only broke compatibility _once_ in over two decades. That's both a testament to the stability of the library over time and a reminder that all the cruft accumulated over the years at most gets gated off through saner defaults.
This assumption isn't true, though. APIs routinely get changed in minor versions, which can make it non-trivial to upgrade large codebases that use lots of features.
That's the problem, you use log4j to log. Any 'feature' outside of that being used is wrong. Any 'feature' outside of that being implemented, is wrong.
If JNDI string interpolation is desired, write another module that does that.
I hate 'is-odd' but this is another extreme and demonstrably worse.
While deploying an ancillary change, our JVMs started locking up for minutes on end. What was happening was that we were logging customer input, and the change caused it to run certain things in parallel, which ended up logging the data multiple times. Normally the extra logging didn't matter, but one customer had data like "${foo} ${bar} ${baz} ...". Even when the ${foo} portion is replaced without modification, this triggers quadratic behavior. So we were already potentially vulnerable to the DoS, but it was rare enough that we never got locked up until logging the string multiple times, which then overflowed log4j's internal buffer and blocked worker threads.
You can try this yourself by just logging a string like "${}${}${}...", and in fairly short order it starts taking forever. I'm very glad the fix in 2.15 is to disable lookups by default.
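A minimal sketch of how this kind of blow-up can arise, assuming a substitutor that edits one shared buffer in place, so every replacement may have to copy the tail of the buffer. This is not log4j's actual code: NaiveSubstitutor and its tailCharsTouched counter are hypothetical and only illustrate why n placeholders can cost on the order of n^2 work even when each one is replaced with itself.

```java
public class NaiveSubstitutor {
    // Hypothetical cost model: number of characters to the right of each
    // replaced span, i.e. the tail an in-place replace may have to copy.
    static long tailCharsTouched = 0;

    // Walks the buffer left to right and replaces every "${...}" with itself,
    // mimicking a lookup that leaves the placeholder unresolved.
    static String substitute(String input) {
        StringBuilder buf = new StringBuilder(input);
        int pos = 0;
        while (true) {
            int start = buf.indexOf("${", pos);
            if (start < 0) break;
            int end = buf.indexOf("}", start + 2);
            if (end < 0) break;
            String original = buf.substring(start, end + 1);
            tailCharsTouched += buf.length() - (end + 1);
            buf.replace(start, end + 1, original); // put the same text back
            pos = end + 1;
        }
        return buf.toString();
    }

    // Resets the counter, runs the substitution on n copies of "${}",
    // and returns the modeled cost.
    static long costFor(int placeholders) {
        tailCharsTouched = 0;
        substitute("${}".repeat(placeholders));
        return tailCharsTouched;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1_000, 2_000, 4_000}) {
            System.out.println(n + " placeholders -> " + costFor(n) + " tail chars touched");
        }
    }
}
```

In this model, doubling the number of placeholders roughly quadruples the counted work, which is the shape of the slowdown described above.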
I hope that in the time after I left, the security org at the big tech company I worked at and reported this to (as what I thought it was: a DoS vector, not the complete pwnage it actually was) forced teams to switch to nolookups. Otherwise a lot of people had a bad week forcing updates through...
> So we were already potentially vulnerable to the DOS [...]
> the security org at the big tech company I worked at and reported this to
I'm confused about these two statements, because I did not find any recent CVEs for log4j in the DoS category, nor related to format lookup (other than CVE-2021-44228 of course).
Perhaps I misread it, but are you basically saying that (after you reported the issue to them internally) the security team at your previous company could not successfully report a DoS vulnerability in the default configuration of a widely used (by them, at least) Apache library and make sure a CVE got assigned to track it?
If so, it would be interesting to know where the CVE/vuln-reporting chain broke, possibly to reduce the blast radius for similar future cases.
Hypothetically speaking, a CVE in March for a DoS in a problematic design/feature could have resulted in flipping the default setting earlier. Instead of chasing live RCE in the wild in December.
Of course the last part holds the solution for our backwards compatibility issue. Remove the JNDI nonsense from the default package and move it into an extension package. Whoever wants to keep it can just add that to their dependencies and continue to enjoy logging functions that sometimes also make network connections and block your program.
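A rough sketch of what "move it into an extension package" could look like, assuming a lookup registry where the core ships only in-process lookups and anything else has to be registered explicitly. LookupRegistry, register and resolve are hypothetical names for illustration, not log4j's API, and the "jndi" entry in the usage note is a stub that performs no network access.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class LookupRegistry {
    private final Map<String, Function<String, String>> lookups = new ConcurrentHashMap<>();

    public LookupRegistry() {
        // The core registers only lookups that never leave the process.
        lookups.put("sys", System::getProperty);
        lookups.put("env", System::getenv);
    }

    // An optional extension artifact would call this; nothing in the core does.
    public void register(String prefix, Function<String, String> lookup) {
        lookups.put(prefix, lookup);
    }

    // Resolves "prefix:key". Returns empty if the prefix is not installed
    // or the lookup has no value for the key.
    public Optional<String> resolve(String expression) {
        int colon = expression.indexOf(':');
        if (colon < 0) return Optional.empty();
        Function<String, String> lookup = lookups.get(expression.substring(0, colon));
        if (lookup == null) return Optional.empty();
        return Optional.ofNullable(lookup.apply(expression.substring(colon + 1)));
    }
}
```

With this shape, resolve("jndi:...") yields nothing until someone deliberately calls something like register("jndi", key -> "stub:" + key), so the risky behavior becomes a visible dependency decision rather than a default.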
Nobody thought this was how log4j worked; log4j's documentation for format syntax only covers {} placeholders - the same format that slf4j has grandfathered in from log4j.
I agree this feels like a case where they got confused about their internal terminology. Log4j refers to messages with {} placeholders as 'FormattedMessages'; it refers to the log pattern syntax as 'Patterns' in code - but it seems to refer to them as 'log formats' in documentation.
Somewhere in this mess, someone hooked up the pattern capabilities into the formatting system.
SLF4J was created to replace Apache Commons Logging and Logback was created to replace Log4j 1.x. Both were created by Ceki Gülcü, the original author of Log4j 1.x [1].
Logback came out in 2006. The first beta version of Log4j 2.x was only released 6 years later in 2012, and the JNDI lookup feature was added in 2.0-beta9[2] in 2013!
Obviously nobody complained when switching from Log4j 1.x to SLF4J+Logback that a feature from a completely different library (with the same name) that would be created 7 years into the future was not supported.
> Somewhere in this mess, someone hooked up the pattern capabilities into the formatting system.
That's not what happened. The lookup mechanism (which includes "${jndi:}" lookups) is completely unrelated to the message formatting subsystem.
The way formatting and pattern lookups work in log4j2 is:
1. logger.info("Hello {}", "world") creates a FormattedMessage instance with the "Hello {}" format string and a single parameter, "world".
2. The FormattedMessage is wrapped in a LogEvent and routed to the correct appender(s).
3. Most appenders will format the LogEvent with a Layout. In our case, it's PatternLayout we care about[3].
4. PatternLayout will pre-calculate a set of PatternConverters based on your pattern, so it doesn't have to keep parsing the pattern on every invocation. "%m" will map to MessagePatternConverter.
5. (grossly simplifying zero-garbage and streaming optimizations) Each pattern converter is executed and appends to the final layout text's StringBuilder.
6. (grossly simplifying oh so many things) MessagePatternConverter will first call event.getMessage().getFormattedMessage(). The logic for formatting the message is entirely encapsulated by Message and its subclasses. MessagePatternConverter has no way to distinguish the format string from the user-provided parameters!
7. MessagePatternConverter finally applies the pattern lookups to the formatted message text. The pattern lookup mechanism is completely separate from and orthogonal to the message formatting mechanism.
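The steps above can be sketched as two independent stages. This is a simplified stand-in under stated assumptions, not log4j's code: TwoStagePipeline, formatMessage, applyLookups and the single hard-coded "env:USER" entry are all hypothetical, and real lookups (including jndi) are not modeled.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TwoStagePipeline {
    // Stand-in lookup table with one made-up entry.
    static final Map<String, String> LOOKUPS = Map.of("env:USER", "alice");
    static final Pattern LOOKUP = Pattern.compile("\\$\\{([^}]*)\\}");

    // Stage 1 (steps 1 and 6): fill "{}" placeholders. After this point the
    // parameters are indistinguishable from the format string.
    static String formatMessage(String format, Object... params) {
        StringBuilder out = new StringBuilder();
        int from = 0;
        int i = 0;
        int at;
        while (i < params.length && (at = format.indexOf("{}", from)) >= 0) {
            out.append(format, from, at).append(params[i++]);
            from = at + 2;
        }
        return out.append(format.substring(from)).toString();
    }

    // Stage 2 (step 7): lookups run over the already formatted text.
    static String applyLookups(String formatted) {
        Matcher m = LOOKUP.matcher(formatted);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            // Unknown lookups are left in the text unchanged.
            String value = LOOKUPS.getOrDefault(m.group(1), m.group());
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    static String render(String format, Object... params) {
        return applyLookups(formatMessage(format, params));
    }

    public static void main(String[] args) {
        String userInput = "${env:USER}";
        System.out.println(render("Hello {}", userInput)); // parameterized
        System.out.println(render("Hello " + userInput));  // concatenated
    }
}
```

Because stage 2 only ever sees the merged string, the parameterized call and the concatenated call behave identically in this sketch, which is the point of steps 6 and 7.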
---
That was long-winded, but I had to fight this annoying misconception about "log4j not implementing format strings properly".
Now, there are several things I'm not saying here:
1. I don't think more than a handful of people ever relied on lookups working on the log message (formatted or otherwise), as opposed to the pattern in the configuration file.
2. I don't think Log4j should have kept compatibility here. The moment the maintainers implemented "%m{nolookups}" (on version 2.7), they should have made it the default. That being said, I know this is very hard to do in the Java ecosystem. But I think it is time that the Java developer community changes its extremist position regarding compatibility at all costs.
3. I don't think that Log4j should have implemented pattern lookups for text messages to begin with, even if it was just for the format string part (which is impossible to do with Log4j's current architecture anyway).
4. I don't think any kind of string formatting should be included in a logging library. If you want to format log messages, use an external formatting function or string interpolation (if you're lucky enough to be using Kotlin or Scala). If it is added, it should only be used as a convenience, and shouldn't do anything more than formatting (like lookups). Relying on developers to always remember that log.info("Hello {}", world) is safe and log.info("Hello {}" + world) gives the entire internet full control of your server is beyond stupid. Even if Log4j went with this silly distinction, I would say it was a horrible design.
[1] https://techblog.bozho.net/the-logging-mess/
[2] https://logging.apache.org/log4j/2.x/changes-report.html#a2....
[3] It seems like PatternLayout is the only layout vulnerable to this bug in log4j2, but it is hard to tell, the implementation being a classic Java mess of deep class hierarchy, liberal use of reflection to control everything and some heroic attempts to break SOLID principles at least 4 times on a single line of code. Take my analysis with a grain of salt. It's a gross simplification of what is unfortunately par for the course in many Java libraries.
It was actually intended behavior, and this is what really boggles the mind! Javadoc says explicitly that variable replacement is recursive, with cycle detection (which will throw! What happens to the log line in this case?) [0].
[0] https://logging.apache.org/log4j/2.x/log4j-core/apidocs/org/...
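To make "recursive, with cycle detection" concrete, here is a minimal sketch, assuming a plain variable map and an exception on cycles. RecursiveResolver is a hypothetical class, not the implementation behind the Javadoc above; it does not handle nested placeholders like ${a${b}}, and it says nothing about what log4j itself does with the log line when the exception is thrown.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;

public class RecursiveResolver {
    private final Map<String, String> variables;

    public RecursiveResolver(Map<String, String> variables) {
        this.variables = variables;
    }

    public String resolve(String text) {
        return resolve(text, new ArrayDeque<>());
    }

    // inProgress holds the variable names currently being expanded;
    // meeting one of them again means the definitions form a cycle.
    private String resolve(String text, Deque<String> inProgress) {
        StringBuilder out = new StringBuilder();
        int pos = 0;
        while (true) {
            int start = text.indexOf("${", pos);
            int end = start < 0 ? -1 : text.indexOf('}', start + 2);
            if (start < 0 || end < 0) break;
            out.append(text, pos, start);
            String name = text.substring(start + 2, end);
            String value = variables.get(name);
            if (value == null) {
                out.append(text, start, end + 1); // unknown variable: leave as is
            } else {
                if (inProgress.contains(name)) {
                    throw new IllegalStateException("Cycle while resolving: " + name);
                }
                inProgress.push(name);
                out.append(resolve(value, inProgress)); // the value is expanded again
                inProgress.pop();
            }
            pos = end + 1;
        }
        return out.append(text.substring(pos)).toString();
    }
}
```

In this sketch the exception simply propagates to whoever called resolve, so the caller has to decide whether to drop the line, log it unexpanded, or fail.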
If they really dislike the feature that much, they likely dislike the code and want to completely delete it. I'm not sure if making it opt-in would make them as happy as fully deleting it, so they are less motivated to make it opt-in than they would be to fully delete it.
It's logging. While logging is extremely important, I think we could all tolerate removing a vulnerable feature. Or, just move the feature to a separate package.
I have made bad decisions, we have all made bad decisions. Own them, improve, and celebrate the opportunity to learn and improve. Keeping this around, as a default, was a bad decision. If your enterprise contracts don't want to turn a flag on, then they can always skip upgrading (they generally do regardless).
Should maintainers of all core apache libs just remove or disable features they don’t like, when not known to be insecure?
That said, log4j2 isn’t that old. Not sure why this was added in the first place. At the very least it’s a performance issue.
I'd bet more will start doing so. If nobody is excited to keep the feature up and any unloved code contains risks, getting rid of it seems fine to me. If companies want that code maintained, they can pay up or get one of their people to do it.
If no one funds their development and they maintain it for free? Then yes, why not.
Why not? It can just go into a parity package.
In a perfect world, the feature would have been an option from the start, but in that same perfect world, the downstream users would be diligent and check release notes before upgrading. You might, but many of your colleagues don’t, they just upgrade, and complain when their system breaks.
Definitely. But really, they were screwed once it had shipped. They could and should have disabled it in an update long ago, but then anyone who read the release notes or the code would know how to exploit the millions of un-updated systems.
Log4j RCE Found - https://news.ycombinator.com/item?id=29504755 - Dec 2021 (457 comments)
Widespread exploitation of critical remote code execution in Apache Log4j - https://news.ycombinator.com/item?id=29520415 - Dec 2021 (80 comments)
Job done.
Logging libraries are unnecessarily complicated.
And make it possible to change any of those knobs at runtime, without touching the code (minimum severity, set of subroutines, set of recipients, delivery method).
You'd think, in the spirit of open source, these multi-billion dollar companies--like Apple and Google and Amazon--would recognize the danger and immediately divert the best engineers they had to help this team identify and mitigate the problems. They should have been buried in useful pull requests.
For that matter, they should have really picked them all up in private jets and flown them to neutral working space with those engineers for a one or two week hackathon/code sprint to clean up the outstanding issues and set the project on a sustainable path. To get those maintainers there they should offer a six figure consulting fee and negotiate with their current employers to secure their temporary help.
I can't believe these folks just get abandoned like this while CEOs/CTOs from rich companies wring their hands wailing about the problems and not offering solutions.
Sorry, but what's the hard part to understand? Open source maintainers end up in this position because they are nice, helpful people who like using computers to solve problems for others. People who spend years on a project and then see a bigger problem arise don't suddenly turn that off. With the bigger problem, they'll want to work harder, not just hoist a middle finger and go binge Netflix without a care in the world.
But I totally agree with you on the CTOs, etc. I don't expect random programmers who like working on logging to also be good at solving complicated sociotechological problems around paying for global infrastructure. But it boggles my mind that none of these richly rewarded, supposedly brilliant experts at organizing engineers has gotten out in front of this. If not out of community spirit or social responsibility, then out of pure self interest.
Indeed. Each of them has had to spend the last few days madly trying to fix this problem to avoid exposing their infrastructure. Each has been, in some way, reinventing the wheel to do so. I'm curious how many will actually submit their findings to the original OSS project so others can learn from their experience?
There are always resources to put out a fire but rarely enough to install a sprinkler system.
There is a perfectly healthy, acceptable, middle ground between those two extremes, however.
FAANG engineer here, and one who had to work extra hours to redeploy services with the log4j vulnerability fix. I'm not sure you understand the scope and constraints of this sort of problem. Log4j's maintainers have a far more difficult and challenging job than FANGs or any other consumer of a FLOSS package, who only need to consider their own internal constraints, and if push comes to shove can even force backwards-incompatible changes. The priority of any company, FANG or not, is to plug their own security holes ASAP. Until that's addressed the thought of diverting resources to fix someone else's security issues doesn't even register on the radar. I mean, are you willing to spend your weekend working around the clock to fix my problems? Why do you expect others like me to do that, then? Instead I'm spending a relaxing weekend with my family with the comfort of knowing my service is safe. Why wouldn't I?
I'm saying that your company's CTO, especially one at a very large company, could likely identify two or three engineers, pull them into a meeting and say "reach out to these guys and get them whatever they need. Here's my cell, call me the moment you need the plane or additional resources."
Seriously, if a CTO has a budget of a few hundred million dollars and thousands of dedicated employees, how hard is it to throw a few crumbs to the open source community to change this situation from being one of a burden on a volunteer effort to, instead, one where they feel like they're in the middle of an international event where their knowledge and services are vital to keeping the internet alive?
Again, I'm exaggerating, but you see where I'm going with this. It's a missed opportunity for some seriously great PR out of a seriously bad situation.
Surely the difference is you are getting paid, and if your boss says, help these guys out, you can do it? As opposed to some guys with jobs who have a project on the side. The big guys could even do something like offer to pay the maintainers and maybe they can take leave or something.
I agree with both sentiments. The big guys are under no obligation to fix an issue in some library they happen to use. But the log4j guys are under even less obligation when they do it in their spare time.
Everyone should enjoy their weekends.
Your "(...)" elides the word "help," which completely changes the meaning of the quote, and your reply is constructed uncharitably as if that word wasn't in the original statement.
> Log4j's maintainers have a far more difficult and challenging job than FANGs
You are saying that the companies that built advanced ML-based Chess/Go engines like Alpha Zero/Go can't solve a simple logging bug involving string substitution?
If your company ends up using the product in all your teams/project and products wouldn't it be in the company's interest to keep the product safe?
How do we know you're not a CTO/C--/manager in your 'faang' just taking this opportunity to bitch about how bad and unreliable open source is? You do have a track record when it comes to this.
> I mean, are you willing to spend your weekend working around the clock to fix my problems?
Wow, that's cynical even for a 'faang' dude.
But as a professional representative of your org, surely you recognize the unsustainability of the situation and that it's far from ideal even in the pure self-interest of the company in question.
Agreed, while reading it I also disagreed at this point:
>> the maintainers of log4j would have loved to remove this bad feature long ago, but could not because of the backwards compatibility promises they are held to.
Nobody is holding them to anything. If they want to remove an old feature, go right ahead. If those using it think it's that important they can fork the project and maintain it themselves. Oh right, that would take effort or money.
I don't get this argument. Part of sharing your work is making sure what you put out is actually helpful to people. If they remove features people really like, then the library won't be as helpful - so it's perfectly fine for the OG devs to maintain this feature. The same thing with "scrambling" to fix - that could be because a sword is hanging over your head, or because you care about the people who use your work. Thinking this way, I can perfectly see them working hard to fix this bug.
If so, I really wouldn’t hold someone to any backwards compatibility promise if security is a concern.
A past organization of mine had a data breach (the kind that ended up making the news everywhere). A few people left (probably making it worse with all the turnover there), but I would be surprised if anything really changed in that organization.
I'll note that there's been more investment in security the last 4-5 years. Most B2B companies do a SOC2, and early on, so there tends to be a baseline of competence.
If the RCE had been responsibly disclosed instead of via tweets and PR comments, maybe there wouldn't have had to be so much scrambling. And indeed maybe ASF could have found corporate OSPOs to help with remediation.
There are lots of pixels being spilled on how the users of open source software should be paying for it (?), but I haven't seen much criticism of the vulnerability not being responsibly disclosed.
Personally I'd like something like a security health card/metric on open source libraries that we could tie into CI systems/pull requests or something.
In the past there were so few libraries that it wasn't as daunting.
I'd be able to reason about stuff like libpng, libttf, etc. and think about them or even support them, but now some projects are massive hodgepodges of thousands upon thousands of packages.
That ("...private jets..") doesn't happen because the solution isn't exactly the hard part, and the unpaid original maintainers are doing them anyway.
In all seriousness, taking actions like I identified above would cost the companies virtually nothing but result in huge long-term benefits by signaling to the rest of the open source world that "we love your work and will be right beside you helping if the chips are down."
This is, of course, not a suitable compensation model for popular open source projects. That's a separate conversation.
But it would at least be something.
And yes, I do think this, "but everyone else is doing the same thing so it isn't really our fault" attitude is a problem.
Google doesn't even use log4j. What are you talking about? The spirit of open source does not dictate that the richest companies automatically shoulder the burden of maintenance of projects they do not even use. Google already has initiatives like Summer of Code that help open source projects it does not use, and I think it's perfectly fine to draw the line there.
> divert the best engineers they had
So the lessons from the mythical man-month are forgotten here. At this point I don't think adding more manpower helps.
Also, the reason the maintainers are rushing to fix it is: they're worried about losing "market share". Having been in open-source circles for a long time, I can say maintainers care GREATLY about how many users they have. They just like watching their download stats go up every year, even if it brings them no financial rewards. It's a sort of addiction.
They do. Until they don't.
That inevitable day when they get yelled at in a github issue thread by a user who didn't bother reading the documentation, while staring at their kid in the living room playing video games and start wondering to themselves "why am I doing this hobby in my spare time again?"
Mild dopamine hits to affirmation-addicted programmers is not the sturdiest foundation upon which to build enterprise-grade software libraries.
Anything sufficiently at scale needs a set of maintainers that the commercial tech companies would then collaborate with to get the PRs going.
Otherwise if everyone's just panicking and rushing to submit PRs, they'll inundate the maintainer. There's also no guarantee that even the best engineers at these companies are intimately familiar with the project, and might introduce regressions or other vulnerabilities in the process.
Anyway I do agree companies should be working with OSS devs, but it shouldn't be rushed or knee jerk. It should be collaborative and measured.
Drive-by pull requests during a highly visible emergency are rarely useful.
Maybe it makes more sense to fund system-wide efforts?
Changing the major version number, as long as it's accompanied by a well-written release note on what needs to change, seems fine.