thegeomaster · 6 years ago
The JSON spec [1] clearly states, in section 9:

  An implementation may set limits on the maximum depth of
  nesting.
So, this is kind of a pointless exercise. The author would do well to inform themselves better before slinging epithets such as "bad" at a library.

[1]: https://tools.ietf.org/html/rfc7159

ken · 6 years ago
I would be perfectly OK with that answer if they documented that they use the C stack and therefore have a small finite depth limit, and/or threw some sort of TooMuchNesting exception before they hit that limit.

AFAICT, many/most of these libraries do neither of the above. There's nothing in the documentation to suggest they'll have trouble with deeply nested inputs, and their response to such a situation is to simply crash the process. I'd call that a bug.

I don't consider it acceptable for a library to simply say "the spec says we're technically allowed to 'set limits' on what we accept, so here's some nasal demons". If you've got limits, you've got to document them, or at least make them recoverable.

asveikau · 6 years ago
It's hard for a library to portably know when the C stack will exhaust or give a concrete estimate that makes sense for documentation purposes.

Firstly, that concept is not exposed at all to standard-compliant C code. The standard is vague enough to allow for many possible variations of the program stack abstraction. So any such check would not be portable, standard C.

Second, the library author doesn't know ahead of time how much stack space you, the caller, have consumed before calling the library. Do you call it deep inside some ui framework with a big stack? With a huge i/o buffer on the stack? They don't know.

Translating all of this into "this is how deeply nested your JSON can be" is not a sane exercise; at best you can give a machine-specific guess that will vary a bit with your own application's stack needs.

But perhaps one can side-step all of that and get most of the way with an artificial, arbitrary-constant runtime depth check that is safely below likely actual stack limits.
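As a rough sketch of what such a check looks like (my own toy example, in Python rather than C, with a made-up TooMuchNesting error along the lines ken suggested above):

    MAX_DEPTH = 128  # arbitrary constant, chosen to sit well below realistic stack limits

    class TooMuchNesting(ValueError):
        pass

    def parse_value(text, pos=0, depth=0):
        # Toy recursive-descent parser for nested arrays of integers, e.g. "[1,[2,[3]]]".
        # No whitespace or object handling; illustration only.
        if depth > MAX_DEPTH:
            raise TooMuchNesting(f"nesting deeper than {MAX_DEPTH} levels")
        if text[pos] == "[":
            pos += 1
            items = []
            while text[pos] != "]":
                value, pos = parse_value(text, pos, depth + 1)
                items.append(value)
                if text[pos] == ",":
                    pos += 1
            return items, pos + 1
        end = pos
        while end < len(text) and text[end].isdigit():
            end += 1
        return int(text[pos:end]), end

The cap is arbitrary, but it turns an unpredictable, platform-dependent stack overflow into an ordinary, documented, catchable error.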

taneq · 6 years ago
> I don't consider it acceptable for a library to simply say "the spec says we're technically allowed to 'set limits' on what we accept, so here's some nasal demons".

Welcome to C, enjoy your stay.

Deleted Comment

nabla9 · 6 years ago
I just wrote standard conforming JSON parser:

    cat standard-still-bad.sh

    #!/bin/sh
    echo "ERROR: JSON parser implementation limit." 1>&2
    exit 1 # terminate and indicate error

--- https://tools.ietf.org/html/rfc7159#section-9

> An implementation may set limits on the size of texts that it accepts. An implementation may set limits on the maximum depth of nesting. An implementation may set limits on the range and precision of numbers. An implementation may set limits on the length and character contents of strings.

otterley · 6 years ago
"Conforming" is not identical to "useful." A car that doesn't start is still a car.
Svip · 6 years ago
It's funny: the spec they reference[0] makes no mention of parsers or nesting. But RFC 8259[1] clearly does. Moreover, I notice they were both published in December 2017. Why two specs for the same standard?

[0] http://www.ecma-international.org/publications/files/ECMA-ST...

[1] https://tools.ietf.org/html/rfc8259

gsnedders · 6 years ago
As is almost always the case when there are multiple specs purporting to describe the same thing: politics.

The Ecma spec is meant to be a spec defining a format and its semantics; the RFC places requirements on implementations of that format as well as describing its semantics.

snek · 6 years ago
There was a rumor that the IETF was going to specify a bunch of new breaking changes in RFC 7159 (March 2014), and because of that (irrational) fear ECMA published the 1st edition of ECMA-404 very quickly in October of 2013.

RFC 8259 and the 2nd edition of ECMA-404 contain the same change, a single sentence, specifying that JSON must use UTF-8.

Deleted Comment

skybrian · 6 years ago
Maybe the headline is bad, but making a table of implementation limits of various JSON parsers seems useful?

Maybe we focus on headlines too much?
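For what it's worth, producing such a table is cheap. A rough probe (my own sketch in Python, exercising the stdlib json module; a real harness would want to run each parser in a separate process, since some of these libraries fail by crashing outright rather than raising):

    import json

    def max_parsable_depth(parse, limit=1 << 20):
        # Find the deepest "[[[...]]]" document a parser accepts,
        # by doubling and then bisecting.
        def ok(depth):
            try:
                parse("[" * depth + "]" * depth)
                return True
            except Exception:
                return False

        lo, hi = 1, 2
        while hi <= limit and ok(hi):
            lo, hi = hi, hi * 2
        while lo + 1 < hi:
            mid = (lo + hi) // 2
            if ok(mid):
                lo = mid
            else:
                hi = mid
        return lo

    print(max_parsable_depth(json.loads))  # somewhere just below sys.getrecursionlimit() with the stdlib parser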

webkike · 6 years ago
I dunno, the author claims they're sorting the table from worst to best, instead of just from smallest maximum depth to largest maximum depth.
lovasoa · 6 years ago
Maybe I shouldn't have called this repository "bad json parsers". I just added the following to the README:

> The name of this repository is intentionally provocative. It's goal is to document limitations of existing json parsers, not to denigrate the work of the many contributors who worked on the various libraries presented here.

Igelau · 6 years ago
The word you're looking for is clickbait ;)

"We Ranked The Top JSON Parsers By Call Stack Depth... Number 6 Will Shock You!"

bb88 · 6 years ago
First, I appreciate the change to the readme. I think that goes a long way.

Second, "bad" here is highly subjective. I've never ever hit the recursive depth limit in python with JSON, and if I do, I'll probably think about doing it a different way, in a way that won't require python to do deep recursion.

elpocko · 6 years ago
Its*. "It is goal" makes no sense.
paulddraper · 6 years ago
Yeah, this is dumb.*

I'd sooner criticize a parser for mangling numbers above 2^53 (also allowed by the spec) than not allowing thousands of nested arrays.

---

* Except that this is a potential DoS vector. From that angle, it has some interest.

Svip · 6 years ago
> I'd sooner criticize a parser for mangling numbers above 2^53 (also allowed by the spec)

The spec has something specifically to say about this[0]:

> Note that when such software is used, numbers that are integers and are in the range [-(2**53)+1, (2**53)-1] are interoperable in the sense that implementations will agree exactly on their numeric values.

Basically, allow up to 2^53, and you ought to be fine.

[0] https://tools.ietf.org/html/rfc8259#section-6
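To make the boundary concrete (a quick check in Python; the parsers affected are the ones that store every number as an IEEE 754 double, such as JavaScript's JSON.parse):

    import json

    print(float(2**53 - 1) == 2**53 - 1)     # True: still exactly representable as a double
    print(float(2**53 + 1) == float(2**53))  # True: 2^53 + 1 rounds to 2^53 as a double

    # CPython's json keeps integers exact; a double-based parser
    # would hand back 9007199254740992 for this input instead.
    print(json.loads("9007199254740993"))    # 9007199254740993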

andrepd · 6 years ago
From TFA

>The two most recent JSON standards RFC 8259 and RFC 7159 both say "An implementation may set limits on the maximum depth of nesting". However, the ECMA-404 specification doesn't contain any limit on how deeply nested JSON structures can be.

ddlatham · 6 years ago
To be fair, that text was added after the above comment was posted.

https://github.com/lovasoa/bad_json_parsers/commit/16f6ed75b...

mantap · 6 years ago
Not stating a limit doesn't mean there's NO limit. It means the limit is implementation defined. If the way you interpret a spec implies a requirement of infinite memory or infinite stack, then the way you interpret it is wrong.
vesinisa · 6 years ago
Nevertheless, the parser failing at only 100 levels of nesting is shockingly bad.
femto113 · 6 years ago
I work with complex JSON every day. Even painfully complex structures like Facebook's Ad Insights API or AWS Lambda Events only get around 5 levels deep, so 10 deep is already exotic, and I can't even conceive of a use case for 100. The method of failure is important too--would it really be better to just keep going forever, or to die when the maximum stack depth is reached or the machine runs out of RAM (or swap space...)? Treating absurdly nested JSON the same way you would any invalid JSON input seems like a much friendlier and less dangerous approach overall.
ryanianian · 6 years ago
Lots of recursive data-structures (trees/graphs/*) lend themselves nicely to lots of nesting. Such documents aren't really intended to be read/written by humans.
corprew · 6 years ago
This works out well in practice -- the Ruby parser that ships with the language (the 100-levels one) works for most purposes and will even fall back to an all-Ruby implementation if it can't load its C extension. It's a reasonable default parser for almost all cases.

However, if you need the all-singing, all-dancing parser that handles the most complex use cases super fast (that's Oj at the bottom of the table, which handles infinite levels), then you have the option to install it and use it.

bryanrasmussen · 6 years ago
"Bad" is a value statement; they didn't say "not to spec", "invalid", or similar.
saagarjha · 6 years ago
I think you’re missing a link.
jrockway · 6 years ago
I don't like it when programs use physical limits to trigger physical errors based on the input. "JSON nesting depth of 23456 levels exceeds maximum memory of 23456MB" is a very different thing from being kill -9'd by the OOM killer (after of course it gets your ssh session).

If you decide your limit in advance and bail out of parsing, that's one thing. If you just use up all the stack and cross your fingers, that's a pain.

thechao · 6 years ago
I know this is too late to be useful, but it only takes 1 bit per nesting level to track the nesting depth in O(1) time, or just a 64-bit integer to track the nesting depth in O(n) time. I wonder if there is a sublinear algorithm? At best, to me, right now, it could be made to look like bitstream compression.
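If I'm reading the 1-bit-per-level idea right, something like the sketch below (mine, in Python) keeps one bit per open bracket to remember whether it was an array or an object, plus a counter for the depth, and never touches the call stack:

    def max_nesting_depth(text):
        # Report the maximum bracket nesting depth of a JSON text,
        # using a bitset (one bit per open bracket) instead of recursion.
        kinds = 0            # low bit = innermost open bracket: 0 = '[', 1 = '{'
        depth = deepest = 0
        in_string = escaped = False
        for ch in text:
            if in_string:
                if escaped:
                    escaped = False
                elif ch == "\\":
                    escaped = True
                elif ch == '"':
                    in_string = False
            elif ch == '"':
                in_string = True
            elif ch in "[{":
                kinds = (kinds << 1) | (ch == "{")
                depth += 1
                deepest = max(deepest, depth)
            elif ch in "]}":
                if depth == 0 or ch != ("}" if kinds & 1 else "]"):
                    raise ValueError("mismatched brackets")
                kinds >>= 1
                depth -= 1
        return deepest

Whether that can be made sublinear in space, I have no idea.
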
bspammer · 6 years ago
The author has now added this as an aside: https://github.com/lovasoa/bad_json_parsers/commit/16f6ed75b...
blablabla123 · 6 years ago
It is pointless; also, nobody ever cares much how well a text editor can handle obscure text files (file too large, broken encoding, ...). What matters is how it handles well-formatted files; the others aren't worth opening in an editor anyways :)
mike_hock · 6 years ago
The commenter would do well to RTFA before slinging complaints that have already been addressed in the article.

The article is cool-headed and factual, so don't get hung up on the repo name.

ddlatham · 6 years ago
To be fair, the article was updated to address those complaints after the comment you're responding to was made.

https://github.com/lovasoa/bad_json_parsers/commit/16f6ed75b...

gameswithgo · 6 years ago
It is better to not set limits.
skybrian · 6 years ago
There is always a limit. The question is whether you want to use up all available memory reading bad input, or reject it sooner. If you're worried about denial-of-service attacks, it makes sense to be explicit about how big an input you'll accept.
vesinisa · 6 years ago
It would've been interesting to know which platform the JavaScript test was performed on (judging by the source code, likely Node.js). However, on my current machine Chrome chokes with a stack overflow above nesting level 125,778.

More amazingly, Firefox is currently happily parsing a JSON array with 50,000,000 levels of nesting at full single-core CPU utilization, consuming about 12 GB of memory. This was after an array of depth 10,000,000 was successfully parsed, peaking at around 8 GB of RAM. So Firefox's parser seems to be constrained only by available memory!

Also - while this is running in the console the Firefox UI is completely usable and I can browse the web in another tab. Mozilla has come a long way!

E: The 50,000,000 nesting level array parsing eventually failed with "Error: out of memory".

anoncake · 6 years ago
Great. Is anything else usable too?
vesinisa · 6 years ago
Pretty much, but RAM utilization is of course closing in on 100%, so starting any new programs would fail at the OS level.
stickfigure · 6 years ago
My takeaway from this is that you can submit a long string of brackets to REST apps built with Haskell and Ruby/Oj and crash the process and/or machine by consuming all available RAM. Whereas the smarter JSON processors near the top of the list will abort processing pathological inputs before serious harm occurs.

Wait, which JSON parsers are supposed to be "bad"?

Spivak · 6 years ago
Just because a structure is deeply nested doesn’t mean that it consumes a lot of memory.

And just because your payload is massive and consumes a lot of memory doesn’t mean it’s deeply nested.

I doubt any of the parsers are resilient against storing large documents in memory. What makes them bad is that structures that would easily fit in memory are unparsable.

In most cases your web server imposes a limit on the size of the HTTP payload, so you wouldn't normally be able to deliver a 128 GiB JSON file anyway. It's not like nested structures are zip bombs.

stickfigure · 6 years ago
My comment was mostly tongue-in-cheek but there's some truth to it. Since the receiver creates a navigable data structure with 64-bit pointers, every two bytes sent ("[]") becomes 20+ bytes in RAM (low estimate!). An order of magnitude magnification helps.
lovasoa · 6 years ago
Well, most JSON parsers (Haskell being a notable exception) use an amount of memory that is proportional to the size of the input, regardless of whether they can parse deeply nested structures or not.

Saying that a parser that does not limit the nesting depth will crash your server does not make any sense. You might as well say that str_replace will crash your server.

vbezhenar · 6 years ago
You can do exactly the same by submitting an endless string.
rurban · 6 years ago
The problem is not DoS by memory exhaustion; the problem is stack overflow. You can trivially control the return address of the decoder. On mostly public interfaces!

And I think I was the very first one whose JSON library knew about that, probed for the max depth (which is system dependent) and added hard and soft limits to it. It's not even mentioned in this post.

NobodyNada · 6 years ago
> You can trivially control the return address of the decoder.

With a stack overflow? Stack overflows usually write to unmapped memory and cause a segmentation fault; how would you use that to control the return address? Or perhaps you were thinking about a stack buffer overflow instead?

https://en.wikipedia.org/wiki/Stack_overflow vs https://en.wikipedia.org/wiki/Stack_buffer_overflow

abrodersen · 6 years ago
"The tar archive spec does not specify an upper limit on archive size, but my kernel keeps saying 'no space left on device'. It's obviously a bad kernel."
lovasoa · 6 years ago
Well, in your example, there is a physical limit on how much space is available. In the case of deeply nested json, we are talking about structures that perfectly fit into memory, and could be decoded in a fraction of a second if only a different algorithm were used.
LukeShu · 6 years ago
On one hand, yes. On the other hand, the standard Ruby "json" gem can be made to overflow on just 202 bytes of valid JSON input.
jrochkind1 · 6 years ago
I mean, it'd be a pathological 202-byte JSON, but yeah (I guess it could be a DoS attack of some kind, actually, hmm...)

Ruby is a good example, because the `oj` gem, presumably on the same system, is listed as "infinite." Obviously not truly "infinite", since eventually it'll run out of RAM -- but this shows it is _not_ really an issue of machine resources.

As the OP notes, it's because if you implement using recursion, you'll run out of stack in most languages (now let's start talking about tail call optimization...), but this isn't the only implementation choice; you _can_ implement it without this limitation, and probably should.

loktarogar · 6 years ago
Nope, it stops itself at 100 levels with a JSON::NestingError.

If you run it as JSON.parse(ARGF.read, max_nesting: false) you get 65492 instead of 101.

seppel · 6 years ago
> the standard Ruby "json" gem can be made to overflow on just 202 bytes of valid JSON input.

It is not overflowing; it is aborting with a proper error.

You can argue whether 100 is a reasonable default, but I think it is not too stupid to have a maximum depth here and bail out as soon as possible. Because what will happen if you accept arbitrary nesting? Then the next guy in the chain who actually works with the parsed JSON tree will also have to handle arbitrary nesting. And if you are not careful, you have some error deep in some quick-and-dirty user-written code (which might actually be exploitable, unlike simply not accepting the JSON in the first place).

I would say you can think of the 100 as some kind of input sanitization.

mperham · 6 years ago

    max_nesting: The maximum depth of nesting allowed in the parsed data structures.
    Disable depth checking with :max_nesting => false. It defaults to 100.

zamadatix · 6 years ago
Referring to it in bytes purely serves to spread FUD. It's >100 levels of nesting which is a bit silly.
oneeyedpigeon · 6 years ago
Why 202 bytes? Wouldn’t 101 [ characters suffice?
jandrese · 6 years ago
The JSON:XS documentation has this to say about the $max_depth field setting for the parser:

    Second, you need to avoid resource-starving attacks. That means you should
    limit the size of JSON texts you accept, or make sure then when your
    resources run out, that's just fine (e.g. by using a separate process that
    can crash safely). The size of a JSON text in octets or characters is
    usually a good indication of the size of the resources required to decode
    it into a Perl structure. While JSON::XS can check the size of the JSON
    text, it might be too late when you already have it in memory, so you
    might want to check the size before you accept the string.

    Third, JSON::XS recurses using the C stack when decoding objects and
    arrays. The C stack is a limited resource: for instance, on my amd64
    machine with 8MB of stack size I can decode around 180k nested arrays but
    only 14k nested JSON objects (due to perl itself recursing deeply on croak
    to free the temporary). If that is exceeded, the program crashes. To be
    conservative, the default nesting limit is set to 512. If your process has
    a smaller stack, you should adjust this setting accordingly with the
    "max_depth" method.
So accepting an unlimited depth structure by default is probably not a good idea. There's a flag you can set if you need to parse something like that, but the defaults are set to parse anything sensible and reject documents that may be malicious.

lovasoa · 6 years ago
My takeaway would be "using the C stack to parse JSON is probably not a good idea", not "accepting an unlimited depth structure by default is probably not a good idea".

This limit is entirely artificial, and due only to the design of the parser.

As the documentation you quote says, you should limit the size of JSON texts you accept. If you don't, then you are at risk of a memory exhaustion attack. But this is completely unrelated to the maximum depth you accept. If your parser doesn't use the C stack to parse recursive structures, then deeply nested payloads will parse just fine, without using an abnormal amount of memory.
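As a concrete illustration (my own sketch, in Python, handling only the kind of nested-array input the benchmark feeds the parsers), the same parsing logic with an explicit stack is limited only by heap memory:

    def parse_nested_arrays(text):
        # Iteratively parse JSON consisting only of nested arrays, e.g. "[[[]]]",
        # keeping partially built arrays on an explicit stack instead of the call stack.
        stack, result = [], None
        for ch in text:
            if ch in " \t\r\n":
                continue
            if ch == "[":
                new = []
                if stack:
                    stack[-1].append(new)
                stack.append(new)
            elif ch == "]":
                if not stack:
                    raise ValueError("unbalanced ']'")
                result = stack.pop()
            else:
                raise ValueError(f"unexpected character {ch!r}")
        if stack:
            raise ValueError("unclosed '['")
        return result

    parse_nested_arrays("[" * 1_000_000 + "]" * 1_000_000)  # a million levels, no recursion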

chrismorgan · 6 years ago
I disagree with your conclusion. I would instead say that the combination of using the stack to parse JSON and allowing unlimited depth is the problem. In practice, almost no one ever deals with JSON structures even a hundred levels deep (I’d quite seriously guess it to be less than one in a million applications that desire it), so if you artificially cap it to protect against stack overflow you can then safely use the stack, thereby getting better performance by reducing the required allocations, while also protecting against one DoS attack vector.
jlokier · 6 years ago
> As the documentation you quote says, you should limit the size of JSON texts you accept. [...] But this is completely unrelated to the maximum depth you accept.

I'm not sure that's right.

The documentation points out that a deep structure returned from the parser can crash the interpreter, because freeing data also uses recursion. ("due to perl itself recursing deeply on croak to free the temporary".)

Anyway, I found the 512 depth limit too much... When using coroutines in Perl, the stack is smaller and it crashes trying just to print a JSON object with depth 512. I hit the limit at about depth 150, with nothing else on the C stack, so in robust diagnostic code that uses the JSON formatter to log data structures, I limit the max depth to 20.

zenhack · 6 years ago
I'd argue this is a pitfall that languages can and should solve. There's no reason why writing a naive recursive descent parser should cause this problem. The fix, using a separate stack data structure, is telling, since it's the same algorithm with the same space complexity; you're just saying "my language's call stack isn't a usable implementation, so I need a different stack." Why not just fix the call stack?

In languages like rust/c/c++, this might be hard, since you can't grow the stack by moving them (since you'd need to update pointers, which are too visible in those languages, and even identifying a pointer at runtime is undecidable), and segmented stacks cause the "hot-split" problem.

But most languages could just make this problem go away by growing the call stack when needed. A few of them (incl. Haskell) do. I'm following suit in a language I'm working on.

anaphor · 6 years ago
Most purely functional languages (like Haskell) don't really have a "call stack", but the computation is carried out by graph reduction, so it can obviously detect cycles and do some optimizations there, depending on the type of recursion.

If you want the gory details https://www.microsoft.com/en-us/research/wp-content/uploads/...

zenhack · 6 years ago
Yeah, I've read the STG paper, though it's been a while. IIRC there isn't a call stack as we'd normally think of it, but there is still a stack of pending pattern matches, so you actually do hit the same issues. I vaguely recall that making deep recursion "just work" in GHC was something that happened just in the past couple of years.
jlokier · 6 years ago
> In languages like rust/c/c++

In another HN thread there is much discussion about Rust using, in effect, segmented stacks to implement async-await.

Perhaps you could code a recursive descent parser in Rust using async functions, so that the Rust compiler converts it to a heap-allocated state machine for you.

jwilk · 6 years ago
bloody-crow · 6 years ago
The repo is calling it bad, but in production I'd actually prefer a parser that throws an appropriate error based on a setting that can be configured with some reasonable defaults over something that just silently chugs along until it runs out of memory.

I can imagine it'd be a lot easier to overwhelm and DDoS a server that attempts to parse incoming JSON requests without any depth bounds too.

Spivak · 6 years ago
Just because your data structure is deeply nested doesn’t mean it takes a lot of memory.

[[[[]]]] is still only 4 objects worth of memory on a stack.

lovasoa · 6 years ago
Most JSON parsers use an amount of memory that is proportional to the size of the input, regardless of whether they can parse deeply nested structures or not.

A parser that limits the maximum depth of the input can still be made to consume gigabytes of memory on an input of several gigabytes, and there is nothing wrong about that.

Size limits on a production server should be enforced before even starting to parse the JSON, but this has nothing to do with the issue highlighted here.