What I love about this author's work is that their projects are usually single-file libraries in ANSI C or Lua with a focused scope, an easy-to-use interface, good documentation, and a free software license. Aside from the posted project, some I like are:
- log.c - A simple logging library implemented in C99
- microui - A tiny immediate-mode UI library
- fe - A tiny, embeddable language implemented in ANSI C
- microtar - A lightweight tar library written in ANSI C
- cembed - A small utility for embedding files in a C header
- ini - A tiny ANSI C library for loading .ini config files
- json.lua - A lightweight JSON library for Lua
- lite - A lightweight text editor written in Lua
- cmixer - Portable ANSI C audio mixer for games
- uuid4 - A tiny C library for generating uuid4 strings
I vendor in log.c all the time for C projects! I had no idea the author was relatively prolific. Would really recommend checking out log.c; it's really easy to hack in whatever you need.
Speaking of, I personally use https://zolk3ri.name/cgit/libzklog/about/ because I like the way it looks. :D I used his simple logging library in Go, so might as well.
I used "lite" (text editor in Lua) which has been mentioned under this submission. It is cool, too.
Oh yeah, I used their Lume library back when I did games in LOVE2D. I actually ran into them a couple times in the IRC chat (and told them one of their ideas was bad, sorry about that rxi, I checked and it's actually a good idea lol)
‘Free software’ and ‘open source software’ (as respectively defined by the FSF [1] and the OSI [2], which is how they’re usually used in practice) have overlapping definitions. The project in question is released into the public domain via the Unlicense, which qualifies as a free software ‘licence’. Many of the other projects use the MIT/Expat licence, which also qualifies as a free software licence.
Aside from the posted library sj.h, which is in the public domain (compatible with the definition of "free software"), the author's other projects mostly use the MIT license.
The MIT license upholds the four essential freedoms of free software: the right to run, copy, distribute, study, change and improve the software.
It is listed under "Expat License" in the list of GPL-compatible Free Software licenses.
"Source Available" and "Open Source" (with an OSI-approved license) are the terms you're looking for. "Free as in speech, or free as in beer?" is your rallying cry.
You're not aware of the simplistic, single-header C library culture that some developers like to partake in. Tsoding (a streamer) is a prime example of someone who likes developing and using these types of libraries. They acknowledge that these things aren't focused on "security" or "features", and that's okay. Not everything is a super serious business project exposed to thousands of paying customers.
Strongly disagree here because JSON can come from untrusted sources and this has security implications. It's not the same kind of problem that the bloat article discusses where you just have bad contracts on interfaces.
The problem in the present case is that the caller is not made aware of the limitation, so can’t be expected to prevent passing unsupported input, and has no way to handle the overflow case after the fact.
There is no easy way out when you're working with C: either you handle all possible UB cases with exhaustive checks, or you move on to another language.
> Sometimes, it's just not the responsibility of the library.
Sometimes. But in this case, where the library is a parser written in C, I think it is reasonable to expect the library to handle all possible inputs, even corner cases like this which are unlikely to be encountered in common practice. This is not "bloat"; it is correctness.
In C, this kind of bug is capable of being exploited. Sure, many users of this lib won't be using it in exposed cases, but sooner or later the lib will end up in some widely-used internet-facing codebase.
As others have said, the fix could be as simple as bailing once the input size exceeds 1 GB. Or it could be fine-grained. Either way, the fix would not "bloat" the codebase.
And yes, I'm well aware of the single-file C library movement. I am a fan.
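For illustration, a bail-out of that kind is only a handful of lines; the names below are made up and not part of sj.h's API:

  #include <limits.h>
  #include <stdbool.h>
  #include <stddef.h>

  /* Reject inputs the parser's int-based counters can't represent.
     The 1 GiB cap is an arbitrary application-level limit, well below INT_MAX. */
  #define JSON_INPUT_MAX ((size_t)1 << 30)

  static bool json_input_ok(size_t len) {
    return len <= JSON_INPUT_MAX && len <= (size_t)INT_MAX;
  }

The caller checks json_input_ok(len) before handing the buffer to the parser, so the overflow case is handled up front instead of inside the library.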
I wouldn’t rate those as very serious issues for this project. They’ll only be triggered if there are over INT_MAX lines or depth levels in the input. Yes, an attacker might be able to do that, but you’d have to put that input in a memory buffer to call this code. On many smaller systems, that will OOM.
Skimming the code, they also are loose in parsing incorrect json, it seems:
  static bool sj__is_number_cont(char c) {
    return (c >= '0' && c <= '9')
        || c == 'e' || c == 'E' || c == '.' || c == '-' || c == '+';
  }

  case '-': case '0': case '1': case '2': case '3': case '4':
  case '5': case '6': case '7': case '8': case '9':
    res.type = SJ_NUMBER;
    while (r->cur != r->end && sj__is_number_cont(*r->cur)) { r->cur++; }
    break;
That seems to imply it treats “00.-E.e-8..7-E7E12” as a valid JSON number.
  case '}': case ']':
    res.type = SJ_END;
    if (--r->depth < 0) {
      r->error = (*r->cur == '}') ? "stray '}'" : "stray ']'";
      goto top;
    }
    r->cur++;
    break;
I think that means the code considers [1,2} a valid array and {"foo": 42] a valid object (maybe it is even happy with [1,2,"foo":42}).
Those, to me, seem a more likely attack vector. The example code, for example, calls atoi on something parsed by the first piece of code.
⇒ I only would use this for parsing json config files.
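For what it's worth, rejecting mismatched closers doesn't take much code either. A rough sketch (none of these names are from sj.h) that remembers which bracket opened each level:

  #include <stdbool.h>

  /* Remember which bracket opened each level so '}' can't close '['.
     The fixed-size stack also caps nesting depth, which avoids the
     depth-counter overflow discussed elsewhere in this thread. */
  #define MAX_DEPTH 256

  typedef struct {
    char open[MAX_DEPTH];   /* '{' or '[' for each open level */
    int  depth;
  } bracket_stack;

  static bool bracket_push(bracket_stack *s, char c) {
    if (s->depth >= MAX_DEPTH) { return false; }   /* too deep: reject */
    s->open[s->depth++] = c;
    return true;
  }

  static bool bracket_pop(bracket_stack *s, char c) {
    if (s->depth == 0) { return false; }           /* stray closer */
    char o = s->open[--s->depth];
    return (o == '{' && c == '}') || (o == '[' && c == ']');
  }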
Being tiny is one thing, but the json grammar isn’t that complex. They could easily do a better job at this without adding zillions of lines of code.
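As a rough sketch of what a stricter check could look like, following the RFC 8259 grammar (number = [ "-" ] int [ frac ] [ exp ]); this function is illustrative and not part of sj.h:

  #include <stdbool.h>

  /* Validate the bytes in [s, end) as a single JSON number. */
  static bool json_number_valid(const char *s, const char *end) {
    if (s == end) { return false; }
    if (*s == '-') { s++; }                        /* optional sign */
    if (s == end) { return false; }
    if (*s == '0') {                               /* int part: "0" or 1-9 then digits */
      s++;
    } else if (*s >= '1' && *s <= '9') {
      while (s != end && *s >= '0' && *s <= '9') { s++; }
    } else {
      return false;
    }
    if (s != end && *s == '.') {                   /* optional fraction */
      s++;
      if (s == end || *s < '0' || *s > '9') { return false; }
      while (s != end && *s >= '0' && *s <= '9') { s++; }
    }
    if (s != end && (*s == 'e' || *s == 'E')) {    /* optional exponent */
      s++;
      if (s != end && (*s == '+' || *s == '-')) { s++; }
      if (s == end || *s < '0' || *s > '9') { return false; }
      while (s != end && *s >= '0' && *s <= '9') { s++; }
    }
    return s == end;                               /* no trailing junk */
  }

This rejects the “00.-E.e-8..7-E7E12” example above while accepting anything the grammar allows.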
Crashing without a proper error message, leaving the user wondering what happened, is table stakes in C projects, of course. How do you intend to determine the cause of your crashes and write a meaningful error message for the user in the case of over-long input, if you don't check for overflow?
If you are nesting 2 billion times in a row (at minimum that means repeating { 2 billion times, followed by a value, before } another 2 billion times), you have messed up.
You have 4 GB of "padding"... at minimum.
Your file is going to be petabytes in size for this to make any sense.
You are using a terrible format for whatever you are doing.
You are going to need a completely custom parser, because nothing will fit in memory. I don't care how much RAM you have.
Simply accessing an element means traversing a nested object 2 billion levels deep, which in probably any parser in the world is going to take somewhere between minutes and weeks per access.
All that is going to happen in this program is a crash.
I appreciate that people want to have some pointless if(depth > 0) check everywhere, but if your depth is anywhere north of a million in any real-world program, something went wrong a long, long time ago, never mind waiting until it hits 2 billion.
The author has kindly provided you with simple, readable, and free code. If you find it incomplete or unsafe, you can always modify it and contribute your changes if you wish to improve it, in accordance with the licence; and thank him while you're at it.
Could just change the input len to an int instead of size_t. Not technically the correct type, but it would make it clear to the user that the input can't be greater than 2^31 in length.
How is ssize_t any better? It's not part of standard C and is only guaranteed to be capable of holding values between -1 and SSIZE_MAX (minimum 32767, no relation to SIZE_MAX).
This is rather lenient. There's nothing wrong with that (although perhaps it should be noted for people who will use it without looking at the code), but it's the main reason this can be so small. Using their demo in the readme:
It's astonishing how involved a fucking modern JSON library becomes.
The once "very simple" C++ single-header JSON library by nlohmann is now
* 13 years old
* is still actively merging PRs (last one 5 hours ago)
* has 122 __million__ unit tests
Despite all this, it's self-admittedly still not the fastest possible way to parse JSON in C++. For that you might want to look into simdjson.
Don't start your own JSON parser library. Just don't. Yes you can whiteboard one that's 90% good enough in 45 minutes but that last 10% takes ten thousand man hours.
I did write one, but I needed to: this is in a crash reporter, so the already-written data must be recoverable after a crash (to be able to recover partially written files), and the encoder needs to be async-safe.
So I'm in the process of replacing it with a BONJSON codec, which has the same capabilities, is still async-safe and crash resilient, and is 35x faster with less code.
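For readers unfamiliar with the constraint: "async-safe and crash resilient" here roughly means the encoder avoids stdio and malloc and writes each element out as it is produced, so a crash mid-document still leaves a usable prefix on disk. A toy sketch of the idea (not KSCrash's or BONJSON's actual code):

  #include <string.h>
  #include <unistd.h>

  /* Emit a string with write(2) only; short writes are ignored here for
     brevity, a real encoder would loop until everything is written. */
  static void emit(int fd, const char *s) {
    (void)write(fd, s, strlen(s));
  }

  /* Each element goes to the fd immediately, so whatever made it to disk
     before a crash is still a recoverable JSON prefix. */
  static void emit_key_value(int fd, const char *key, const char *value) {
    emit(fd, "\"");
    emit(fd, key);      /* assumes the key needs no escaping */
    emit(fd, "\":\"");
    emit(fd, value);    /* assumes the value needs no escaping */
    emit(fd, "\"");
  }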
Yeah, but as long as I'm not releasing in public, I don't need to support 20 different ways of parsing.
That's the thing with reinventing wheels, a wheel that fits every possible vehicle and runs well in any possible terrain is very difficult to build. But when you know exactly what you need it's a different story.
I am very surprised to hear the unit testing statistic. What kind of unholy edge cases would JSON parsing require to make it necessary to cover 122 million variations?
Many of the problems disappear when performance is not critical, because that opens up the option of writing a correct parser in one of the many much nicer, much safer, and simpler languages than C/C++.
The project advertises zero allocations with minimal state. I don’t think that comparison is fair, or our problems are very different: even a single string (the most used type) needs an allocation.
It doesn't seem to have much in the way of validation, e.g., it will indiscriminately let you use either ']' or '}' to terminate an object or array. Also, it's more lenient than RFC or json.org JSON in allowing '\v' for whitespace. I'd treat it more as a "data extractor for known-correct JSON". But even then, rolling your own string or number parser could get annoying, unless the producer agrees on a subset of JSON syntax.
You know what would really be useful is a conformance test based on a particular real implementation.
What I mean by this is a subset (superset?) that exactly matches the parsing behavior of a specific target parsing library. Why is this useful? To avoid the class of vulnerabilities that rely on the same JSON being handled differently by two different parsers (you can exploit this to get around an authorization layer, for example).
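A classic member of that vulnerability class is duplicate keys, whose handling RFC 8259 leaves unspecified: one parser keeps the first value, another keeps the last, and the two ends of the pipeline disagree about the same request. Purely illustrative payload:

  /* If the authorization layer's parser keeps the first "role" and the
     backend's parser keeps the last one, this request is checked as
     "guest" but executed as "admin". */
  static const char *payload =
    "{ \"user\": \"alice\", \"role\": \"guest\", \"role\": \"admin\" }";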
I really enjoy these simple libraries, even though they are too flawed to be used for anything serious. There's great freedom in just drilling down to the basics, ignoring all the complexities, and just writing code that'll probably work most of the time.
This is quite neat. I wrote a similar library for no-alloc JSON parsing, but never had use for it. This does actual parsing, though; my approach is to just navigate through a JSON tree.
https://github.com/rxi/lume
[1] https://www.gnu.org/philosophy/free-sw.html [2] https://opensource.org/osd
https://www.gnu.org/licenses/license-list.html
https://github.com/rxi/sj.h/blob/eb725e0858877e86932128836c1...
https://github.com/rxi/sj.h/blob/eb725e0858877e86932128836c1...
https://github.com/rxi/sj.h/blob/eb725e0858877e86932128836c1...
Certain inputs can therefore trigger UB.
If there is a conscious intent of disregarding safety as you say, the Readme should have a prominent warning about that.
Where? Single-header is just a way to package software; it has no relation to features, security, or anything like that...
- overestimating the gravity of UB and its security implications
- underestimating the value of a 150-line JSON parser
- or overestimating the feasibility of having both a short and a high-quality parser.
It sometimes happens that fixing a bug is quicker than defending low quality. Not everything is a tradeoff.
Sometimes, it's just not the responsibility of the library. Trying to handle every possible error is a quick way to complexity.
[0]: https://43081j.com/2025/09/bloat-of-edge-case-libraries
[1]: https://news.ycombinator.com/item?id=45319399
It's the wrong attitude for a JSON parser written in C, unless you like to get owned.
(TIP: choose the latter)
UB is bad.
Limit your JSON input to 1 GB. I will have more problems in other portions of the stack if I start to receive a 2 GB JSON file over the web.
And if I still want to make it work for > 2 GB, I would change all the ints in the source to 64 bits. It will still break if the input is > 2^64.
What I won't ever do in my code is check for int overflow.
Amen. Just build with -fno-strict-overflow, my hot take is that should be the default on Linux anyway.
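For anyone wondering what the flag changes: by default signed overflow is UB, so the compiler may assume it never happens and delete guards that depend on it; with -fno-strict-overflow (or -fwrapv) the overflow wraps and a check like the hypothetical one below behaves as written:

  /* Under strict-overflow rules the compiler may assume `depth + 1 < depth`
     can never be true and drop this branch entirely; with
     -fno-strict-overflow the overflow wraps and the guard works. */
  static int next_depth(int depth) {
    if (depth + 1 < depth) {
      return -1;                 /* would overflow: report an error instead */
    }
    return depth + 1;
  }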
- a JSON file with values nested more than 2 billion levels deep
- a file with more than 2 billion lines
- a line with more than 2 billion characters
Maybe more importantly, I won’t trust the rest of the code if the author doesn’t seem to have the finite range of integer types in mind.
I don’t know what else you call a library that just extracts data.
They're either written with a different use case in mind, or a complex mess of abstractions; often both.
It's not a very difficult problem to solve if you only write exactly what you need for your specific use case.
The once "very simple" C++ single-header JSON library by nlohmann is now
* 13 years old
* is still actively merging PRs (last one 5 hours ago)
* has 122 __million__ unit tests
Despite all this, it's self-admittedly still not the fastest possible way to parse JSON in C++. For that you might want to look into simdjson.
Don't start your own JSON parser library. Just don't. Yes you can whiteboard one that's 90% good enough in 45 minutes but that last 10% takes ten thousand man hours.
https://github.com/kstenerud/KSCrash/blob/master/Sources/KSC...
And yeah, writing a JSON codec sucks.
https://github.com/kstenerud/ksbonjson/blob/main/library/src...
https://github.com/kstenerud/ksbonjson/blob/main/library/src...
https://seriot.ch/projects/parsing_json.html
So in this case you're wrong.
General purpose is a different can of worms compared to solving a specific case.
Sexprs sitting over here, hoping for some love.
https://github.com/nst/JSONTestSuite
I'm still impressed and might use it, but just noting this.
https://github.com/lelanthran/libxcgi/blob/master/library/sr...
https://github.com/lelanthran/libxcgi/blob/master/library/sr...