Bad JSON Parsers - Readit News

The JSON spec [1] clearly states, in section 9:

  An implementation may set limits on the maximum depth of
  nesting.

So, this is kind of a pointless exercise. The author would do well to inform themselves better before slinging epithets such as "bad" for a library.

[1]: https://tools.ietf.org/html/rfc7159

ken · 6 years ago

I would be perfectly OK with that answer if they documented that they use the C stack and therefore have a small finite depth limit, and/or threw some sort of TooMuchNesting exception before they hit that limit.

AFAICT, many/most of these libraries do neither of the above. There's nothing in the documentation to suggest they'll have trouble with deeply nested inputs, and their response to such a situation is to simply crash the process. I'd call that a bug.

I don't consider it acceptable for a library to simply say "the spec says we're technically allowed to 'set limits' on what we accept, so here's some nasal demons". If you've got limits, you've got to document them, or at least make them recoverable.

asveikau · 6 years ago

It's hard for a library to portably know when the C stack will exhaust or give a concrete estimate that makes sense for documentation purposes.

Firstly, that concept is not exposed at all to standard complaint C code. The standard is vague enough to allow for many possible variations of the program stack abstraction. So you would not be portable, standard C.

Second, the library author doesn't know ahead of time how much stack space you, the caller, have consumed before calling the library. Do you call it deep inside some ui framework with a big stack? With a huge i/o buffer on the stack? They don't know.

Translating all of this into "this is how deeply nested your json can be" is not a sane exercise, at best you can give a machine-specific guess that will vary a bit with your own application's stack needs.

But perhaps one can side-step all of that and get most of the way with an artificial, arbitrary-constant runtime depth check that is safely below likely actual stack limits.

taneq · 6 years ago

> I don't consider it acceptable for a library to simply say "the spec says we're technically allowed to 'set limits' on what we accept, so here's some nasal demons".

Welcome to C, enjoy your stay.

Deleted Comment

nabla9 · 6 years ago

I just wrote standard conforming JSON parser:

    cat standard-still-bad.sh

    #!/bin/sh
    echo "ERROR: JSON parser implementation limit." 1>&2
    exit 1 # terminate and indicate error

--- https://tools.ietf.org/html/rfc7159#section-9

> An implementation may set limits on the size of texts that it accepts. An implementation may set limits on the maximum depth of nesting. An implementation may set limits on the range and precision of numbers. An implementation may set limits on the length and character contents of strings.

otterley · 6 years ago

"Conforming" is not identical to "useful." A car that doesn't start is still a car.

Svip · 6 years ago

It's funny, the spec they references[0] makes no mention of parsers or nesting. But RFC 8259[1] clearly does. Moreover, I notice they were both published in December 2017. Why two specs for the same standard?

[0] http://www.ecma-international.org/publications/files/ECMA-ST...

[1] https://tools.ietf.org/html/rfc8259

gsnedders · 6 years ago

As is almost always the case when there are multiple specs purporting to describe the same thing: politics.

The Ecma spec is meant to be a spec defining a format and its semantics; the RFC places requirements on implementations of that format as well as describing it's semantics.

snek · 6 years ago

There was a rumor that was IETF was going to specify a bunch of new breaking changes in RFC 7195 (March 2014), and because of that (irrational) fear ECMA published the 1st edition of 404 very quickly in October of 2013.

RFC 8259 and the 2nd edition of ECMA-404 contain the same change, a single sentence, specifying that JSON must use UTF-8.

Deleted Comment

skybrian · 6 years ago

Maybe the headline is bad, but making a table of implemention limits of various JSON parsers seems useful?

Maybe we focus on headlines too much?

webkike · 6 years ago

I dunno, the author claims they're sorting the table from worst to best, instead of by minimum maximum depth to maximum maximum depth

lovasoa · 6 years ago

Maybe I shouldn't have called this repository "bad json parsers". I just added the following to the README:

> The name of this repository is intentionally provocative. It's goal is to document limitations of existing json parsers, not to denigrate the work of the many contributors who worked on the various libraries presented here.

Igelau · 6 years ago

The word you're looking for is clickbait ;)

"We Ranked The Top JSON Parsers By Call Stack Depth... Number 6 Will Shock You!"

bb88 · 6 years ago

First appreciate the change to the readme. I think that goes a long way.

Second, "bad" here is highly subjective. I've never ever hit the recursive depth limit in python with JSON, and if I do, I'll probably think about doing it a different way, in a way that won't require python to do deep recursion.

elpocko · 6 years ago

Its*. "It is goal" makes no sense.

paulddraper · 6 years ago

Yeah, this is dumb.*

I'd sooner criticize a parser for mangling numbers above 2^53 (also allowed by the spec) than not allowing thousands of nested arrays.

---

* Except that this is potential DoS vector. From that angle, it has some interest.

Svip · 6 years ago

> I'd sooner criticize a parser for mangling numbers above 2^53 (also allowed by the spec)

The spec has something specifically to say about this[0]:

> Note that when such software is used, numbers that are integers and are in the range [-(253)+1, (253)-1] are interoperable in the sense that implementations will agree exactly on their numeric values.

Basically, allow up to 2^53, and you ought to be fine.

[0] https://tools.ietf.org/html/rfc8259#section-6

andrepd · 6 years ago

From TFA

>The two most recent JSON standards RFC 8259 and RFC 7159 both say "An implementation may set limits on the maximum depth of nesting". However, the ECMA-404 specification doesn't contain any limit on how deeply nested JSON structures can be.

ddlatham · 6 years ago

To be fair, that text was added after the above comment was posted.

https://github.com/lovasoa/bad_json_parsers/commit/16f6ed75b...

mantap · 6 years ago

Not stating a limit doesn't mean there's NO limit. It means the limit is implementation defined. If the way you interpret a spec implies a requirement of infinite memory or infinite stack, then the way you interpret it is wrong.

vesinisa · 6 years ago

Nevertheless, the parser failing at only 100 levels of nesting is shockingly bad.

femto113 · 6 years ago

I work with complex JSON every day. Even painfully complex structures like Facebook's Ad Insights API or AWS Lambda Events only get around 5 levels deep, so 10 deep is already exotic, and I can't even conceive of a use case for 100. The method of failure is important too--would it really be better to just keep going forever, or die when maximum stack depth was reached, or the machine runs out of RAM (or swap space...)? Treating absurdly nested JSON the same way you would any invalid JSON input seems like a much friendlier and less dangerous approach overall.

ryanianian · 6 years ago

Lots of recursive data-structures (trees/graphs/*) lend themselves nicely to lots of nesting. Such documents aren't really intended to be read/written by humans.

corprew · 6 years ago

This works out well in practice -- the ruby parser that ships with the language (the 100 levels one) works for most purposes and will even go back to an all-ruby implementation if it can't load its C extension. It's a reasonable default parser for almost all cases.

However, if you need the all-singing, all-dancing, handles the most complex use cases super fast parser, which is Oj at the bottom (which handles infinite levels), then you have the option to install it and use it.

bryanrasmussen · 6 years ago

Bad is a value statement, they didn't say not to spec, invalid, or similar.

saagarjha · 6 years ago

I think you’re missing a link.

jrockway · 6 years ago

I don't like it when programs use physical limits to trigger physical errors based on the input. "JSON nesting depth of 23456 levels exceeds maximum memory of 23456MB" is a very different thing from being kill -9'd by the OOM killer (after of course it gets your ssh session).

If you decide your limit in advance and bail out of parsing, that's one thing. If you just use up all the stack and cross your fingers, that's a pain.

thechao · 6 years ago

I know this is too late to be useful, but it only takes 1b per nesting level to track the nesting depth in O(1) time, or just a 64b integer to track the nesting depth in O(n) time. I wonder if there a sublinear algorithm? At best to me, right now, it could made to look like bitstream compression.

bspammer · 6 years ago

The author has now added this as an aside: https://github.com/lovasoa/bad_json_parsers/commit/16f6ed75b...

blablabla123 · 6 years ago

It is pointless, also nobody ever cares much how well a text editor can handle obscure text files (file too large, broken encoding, ....) What matters is how it handles well-formatted files, the others aren't worth opening in an editor anyways :)

mike_hock · 6 years ago

The commenter would do well to RTFA before slinging complaints that have already been addressed in the article.

The article is cool-headed and factual, so don't get hung up on the repo name.

ddlatham · 6 years ago

To be fair, the article was updated to address those complaints after the comment you're responding to was made.

https://github.com/lovasoa/bad_json_parsers/commit/16f6ed75b...

stevula · 6 years ago

https://tools.ietf.org/html/rfc7159#section-9

gameswithgo · 6 years ago

It is better to not set limits.

skybrian · 6 years ago

There is always a limit. The question is whether you want to use up all available memory reading bad input, or reject it sooner. If you're worried about denial-of-service attacks, it makes sense to be explicit about how big an input you'll accept.

"The tar archive spec does not specify an upper limit on archive size, but my kernel keeps saying 'no space left on device'. It's obviously a bad kernel."

lovasoa · 6 years ago

Well, in your example, there is a physical limit on how much space is available. In the case of deeply nested json, we are talking about structures that perfectly fit into memory, and could be decoded in a fraction of a second if only a different algorithm were used.

LukeShu · 6 years ago

On one hand, yes. On the other hand, the standard Ruby "json" gem can be made to overflow on just 202 bytes of valid JSON input.

jrochkind1 · 6 years ago

I mean, it'd be a pathological 202 byte JSON, but yeah (I guess it could be a DOS attack of some kind, actually, hmm...)

Ruby is a good example, because the `oj` gem, on presumably the same system, is listed as "infinite." Obviously not truly "infinite", eventually it'll run out of RAM -- but this shows it is _not_ an issue of machine resources really.

As the OP notes, it's because if you implement using recursion, you'll run out of stack in most languages (now let's start talking about tail call optimization...), but this isn't the only implementation choice, you _can_ implement to not have this limitation, and probably should.

loktarogar · 6 years ago

Nope, it stops itself at 100 levels with a JSON::NestingError.

If you run it as JSON.parse(ARGF.read, max_nesting: false) you get 65492 instead of 101.

seppel · 6 years ago

> the standard Ruby "json" gem can be made to overflow on just 202 bytes of valid JSON input.

It is not overflowing, but about aborting with with a proper error.

You can argue whether 100 is a reasonable default, but I think it is not too stupid to have a maximum depth here and bail out as soon as possible. Because what will happen if you accept arbitrary nesting? Then next guy in the chain who actually works with the parsed json tree will also have to handle arbitrary nesting. And if you are now not careful, you have some error deep in some quick and dirty user written code (which might actually be exploitable instead of not accepting the json in the first place).

I would say you think you can of the 100 as some kind of input sanitation.

mperham · 6 years ago

    max_nesting: The maximum depth of nesting allowed in the parsed data structures.
    Disable depth checking with :max_nesting => false. It defaults to 100.

zamadatix · 6 years ago

Referring to it in bytes purely serves to spread FUD. It's >100 levels of nesting which is a bit silly.

oneeyedpigeon · 6 years ago

Why 202 bytes? Wouldn’t 101 [ characters suffice?

Second, you need to avoid resource-starving attacks. That means you should limit the size of JSON texts you accept, or make sure then when your resources run out, that's just fine (e.g. by using a separate process that can crash safely). The size of a JSON text in octets or characters is usually a good indication of the size of the resources required to decode it into a Perl structure. While JSON::XS can check the size of the JSON text, it might be too late when you already have it in memory, so you might want to check the size before you accept the string. Third, JSON::XS recurses using the C stack when decoding objects and arrays. The C stack is a limited resource: for instance, on my amd64 machine with 8MB of stack size I can decode around 180k nested arrays but only 14k nested JSON objects (due to perl itself recursing deeply on croak to free the temporary). If that is exceeded, the program crashes. To be conservative, the default nesting limit is set to 512. If your process has a smaller stack, you should adjust this setting accordingly with the "max_depth" method.