JSON has become today's machine-readable output format on Unix

Oddly enough, BitTorrent's bencode format is a significant subset of JSON but is much easier to parse as a binary format. Bencode only supports integers, byte strings, lists, and dictionaries.

I wrote a more detailed comparison on: https://www.nayuki.io/page/bittorrent-bencode-format-tools
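For anyone who hasn't seen the format, here is a minimal sketch of a decoder covering the four types mentioned above. The function name `bdecode` is my own invention for illustration, and it skips the strictness checks a real parser would need (leading zeros, sorted dictionary keys, truncated strings).

```python
def bdecode(data: bytes, pos: int = 0):
    """Decode one bencoded value starting at data[pos].

    Returns (value, next_pos). Illustrative only: no canonical-form checks.
    """
    lead = data[pos:pos + 1]
    if lead == b"i":                       # integer: i<digits>e
        end = data.index(b"e", pos)
        return int(data[pos + 1:end]), end + 1
    if lead == b"l":                       # list: l<items>e
        items, pos = [], pos + 1
        while data[pos:pos + 1] != b"e":
            item, pos = bdecode(data, pos)
            items.append(item)
        return items, pos + 1
    if lead == b"d":                       # dictionary: d<key><value>...e
        result, pos = {}, pos + 1
        while data[pos:pos + 1] != b"e":
            key, pos = bdecode(data, pos)
            value, pos = bdecode(data, pos)
            result[key] = value
        return result, pos + 1
    if lead.isdigit():                     # byte string: <length>:<bytes>
        colon = data.index(b":", pos)
        length = int(data[pos:colon])
        start = colon + 1
        return data[start:start + length], start + length
    raise ValueError(f"unexpected byte at offset {pos}")


value, end = bdecode(b"d3:bar4:spam3:fooi42ee")
print(value)  # {b'bar': b'spam', b'foo': 42}
```

Because every string carries its length up front, the decoder never has to scan for escapes or closing quotes, which is most of what makes it simpler than a JSON parser.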

1vuio0pswjnm7 · a year ago

"BitTorrent became popular around year 2005, whereas JSON became popular around year 2010. The usage of JSON in the real world has greatly eclipsed BitTorrent or bencode, so there is a natural bias to view bencode through the lens of JSON even though JSON was adopted later (though not necessarily invented later)."

Netstrings was proposed in 1997.

https://cr.yp.to/proto/netstrings.txt

Whether bencoding's remarkable similarity to netstrings is purely coincidence is left as a question for the reader.

Perhaps there is a "natural bias" to view netstrings through the lens of bencoding even though bencoding came much later.

"Oddly enough, BitTorrent's bencode format is a significant subset of JSON but is much easier to parse as a binary format."

Curious what makes it "odd".

JSON assumed that memory is an unlimited resource.^1 It is like having to read an entire file into memory before processing it. Hence we see revisions such as "line-delimited JSON". Netstrings is even more memory-efficient than line-oriented processing; there is no need to read in an entire line.

1. This makes sense if one agrees that JSON was designed for graphical web browsers, programs notorious for excessive memory usage.
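To illustrate the memory point, here is a rough sketch of copying a netstring payload in bounded chunks. The name `copy_netstring`, the 10-digit length limit, and the default chunk size are my own assumptions, and it expects a blocking binary stream such as a file or `io.BytesIO`.

```python
import io


def copy_netstring(src, dst, chunk_size=4096):
    """Copy one netstring payload (<len>:<bytes>,) from src to dst.

    Only chunk_size bytes are held in memory at a time.
    Returns the payload length, or None at a clean end of input.
    """
    digits = b""
    while True:
        ch = src.read(1)
        if ch == b":":
            break
        if ch == b"" and digits == b"":
            return None                    # clean EOF between netstrings
        # Arbitrary cap on the length prefix to reject absurd sizes early.
        if not ch.isdigit() or len(digits) >= 10:
            raise ValueError("malformed netstring length")
        digits += ch
    if digits == b"":
        raise ValueError("missing netstring length")
    remaining = length = int(digits)
    while remaining:
        chunk = src.read(min(chunk_size, remaining))
        if not chunk:
            raise ValueError("truncated netstring payload")
        dst.write(chunk)
        remaining -= len(chunk)
    if src.read(1) != b",":
        raise ValueError("missing trailing comma")
    return length


src = io.BytesIO(b"5:hello,0:,")
out = io.BytesIO()
copy_netstring(src, out, chunk_size=2)
print(out.getvalue())  # b'hello'
```

The length prefix is what allows this: the reader knows how many bytes are coming before it sees any of them, so it never needs to buffer up to a delimiter.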

stock_toaster · a year ago

tnetstrings[1] was a later refinement of netstrings.

I liked it, but alas it never seemed to really take off.

[1]: https://tnetstrings.info/

amelius · a year ago

So if you want strings, you need to guess what encoding was used or store the encoding in another field? I don't think that makes it a much nicer format. I do like the ability to store byte strings directly.

skerit · a year ago

Oh, I didn't know about Bencode. It looks interesting. Thank you for sharing!

sriram_malhar · a year ago

I really like bencode. The only thing I miss is floats.

victorstanciu · a year ago

You can use two integers, one that represents the entire number including decimals, and one that represents the precision, to know how many decimals are there. For example, you'd represent "123.45" as 12345 and 2. That's often how monetary amounts are stored in databases, to avoid a lot of common floating-point arithmetic pitfalls.
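A quick sketch of that idea, with helper names (`to_scaled`, `from_scaled`) that I made up for the example. It assumes plain decimal strings without exponents, and the bencoded form at the end is just one possible layout (a two-item list).

```python
from decimal import Decimal


def to_scaled(text):
    """Split a decimal string into (unscaled integer, number of decimals)."""
    whole, _, frac = text.partition(".")
    return int(whole + frac), len(frac)


def from_scaled(unscaled, scale):
    """Rebuild the exact decimal value from the two integers."""
    return Decimal(unscaled).scaleb(-scale)


unscaled, scale = to_scaled("123.45")
print(unscaled, scale)               # 12345 2
print(from_scaled(unscaled, scale))  # 123.45

# One possible bencoding of the pair, as a list of two integers.
print(f"li{unscaled}ei{scale}ee")    # li12345ei2ee
```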


Oh dear - bitten by his "scraping" protection.

His site no longer allows me to view it with Firefox 115 ESR, which is deemed "too old" despite still being a supported release.

nayuki · a year ago

Actual text:

> You're using a suspiciously old browser

> You're probably reading this page because you've attempted to access some part of my blog (Wandering Thoughts) or CSpace, the wiki thing it's part of. Unfortunately you're using a browser version that my anti-crawler precautions consider suspicious, most often because it's too old (most often this applies to versions of Chrome). Unfortunately, as of early 2025 there's a plague of high volume crawlers (apparently in part to gather data for LLM training) that use a variety of old browser user agents, especially Chrome user agents. To reduce the load on Wandering Thoughts I'm experimenting with (attempting to) block all of them, and you've run into this.

> If this is in error and you're using a current version of your browser of choice, you can contact me at my current place at the university (you should be able to work out the email address from that). If possible, please let me know what browser you're using and so on, ideally with its exact User-Agent string. Chris Siebenmann, 2025-02-17

thatcks · a year ago

There was a little glitch in the scraping protection where an errant regular expression briefly blocked all Firefox versions. Which is especially bad because I (the blog author) exclusively use Firefox, so I was blocking myself. The management apologizes for the problem (and generally allows much older Firefox versions than Chrome versions, as people seem to still use them on various platforms).

dfawcus · a year ago

Hmm - it still blocks me when my browser reports its native string of:

    Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0

However, if I override it to the following, the site lets me through:

    Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0

beej71 · a year ago

I hit reload on FF and it worked...? I don't know why I got flagged once and not subsequent times.

DanielHB · a year ago

FYI if anyone is looking into data formats for high performance stuff I recommend looking into CBOR:

https://en.wikipedia.org/wiki/CBOR

It is very popular in embedded systems and IoT. It is basically binary JSON, but supports numbers as keys, variable-sized ints/floats, and UTF-8 strings. CBOR binary blobs can be converted to JSON keeping similar semantics (number keys will be converted to strings) and parsed with `jq` for human-readability/debugging.

It can also be read in "pull parsing" mode so you don't need to keep the whole message in memory. This is the opposite of "DOM parsing" although I don't know if these terms are applicable in non XML contexts. When you only have 256KB of RAM this is quite important.
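To make "variable-sized ints" concrete, here is a small sketch of how unsigned integers are encoded, as I understand RFC 8949 (major type 0). The function name is hypothetical, and a real project would use a CBOR library rather than hand-rolling this.

```python
import struct


def cbor_encode_uint(n):
    """Encode a non-negative integer as a CBOR unsigned int (major type 0)."""
    if n < 0:
        raise ValueError("only unsigned integers are handled here")
    if n < 24:
        return bytes([n])                      # value fits in the initial byte
    if n < 1 << 8:
        return bytes([24, n])                  # 1 extra byte
    if n < 1 << 16:
        return b"\x19" + struct.pack(">H", n)  # 2 extra bytes, big-endian
    if n < 1 << 32:
        return b"\x1a" + struct.pack(">I", n)  # 4 extra bytes
    if n < 1 << 64:
        return b"\x1b" + struct.pack(">Q", n)  # 8 extra bytes
    raise ValueError("too large for a plain CBOR unsigned int")


for n in (10, 100, 1000, 1000000):
    print(n, cbor_encode_uint(n).hex())
# 10 0a
# 100 1864
# 1000 1903e8
# 1000000 1a000f4240
```

Small values cost a single byte, which matters a lot when messages are mostly small counters and enum-like keys.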

braggerxyz · a year ago

I agree, and I feel like the reason for this is the mere existence of 'jq'. Without 'jq', working with JSON in a Unix shell would be a lot more uncomfortable, but not impossible.

pyuser583 · a year ago

Also Python. Python handles json really well, at least compared to say bash.

I want to say server side JavaScript plays a role in this too, but I’m not a JS developer.

stemlord · a year ago

Though jq syntax leaves a lot to be desired

jauntywundrkind · a year ago

The syntax felt ok to me; selectors felt natural and pipes felt very conventionally shell-like. But man, the vocabulary, the variety of different operators you'll need to use in this circumstance or that–brutal.

There's some decent cookbooks/recipes but they're still 1/5th as big as they could be.

s_dev · a year ago

What do you think the trade-off is here between the syntax and what jq was designed for?

A bash script might need to execute it and you want something without lots of funny characters or whitespace as it's going straight in to the terminal.

That necessary terseness makes it the opposite of readable.

enriquto · a year ago

sort of agree... but only because you can gron it to remove the madness and then grep/cut/sed/awk the output like a human being.
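For those who haven't used it, the idea is roughly the following. This is my own approximation of the flattening, not gron's actual code, and details such as key ordering and quoting may differ from the real tool.

```python
import json


def flatten(value, path="json"):
    """Yield one assignment-style line per node, so each line is greppable."""
    if isinstance(value, dict):
        yield f"{path} = {{}};"
        for key, child in value.items():
            # Keys that are not plain identifiers get bracket notation.
            step = f".{key}" if key.isidentifier() else f"[{json.dumps(key)}]"
            yield from flatten(child, path + step)
    elif isinstance(value, list):
        yield f"{path} = [];"
        for index, child in enumerate(value):
            yield from flatten(child, f"{path}[{index}]")
    else:
        yield f"{path} = {json.dumps(value)};"


doc = json.loads('{"user": {"name": "ann", "tags": ["a", "b"]}}')
print("\n".join(flatten(doc)))
# json = {};
# json.user = {};
# json.user.name = "ann";
# json.user.tags = [];
# json.user.tags[0] = "a";
# json.user.tags[1] = "b";
```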

JSON is just a modern flavor of XML, and in a few years we'll likely mock both of them as if they were the same silly thing. They are functionally equivalent: not human-writable, cumbersome, crufty, with unclear semantics, with unclear semantic usage, and all around ugly.

maccard · a year ago

I unfortunately write a bit of both xml and json as part of my day to day. JSON is significantly easier to read and write as a human. So many xml files use a combination of the file structure, the nodes and the attributes to encode their data - meaning to parse it you need to know the specifics of how it was written. JSON is much simpler and 95% of the time can be thought of as map<string, JsonObject> and it just works.

YAML goes too far on the brevity side - I find the 2 space indent, use of "-" as a list delimiter, white space sensitivity and confusing behaviour with quoted vs unquoted strings incredibly hard to work with.

> 95% of the time can be thought of as map<string, JsonObject>

But for that case you don't need json. A dockerfile-like text file with lines of the form

    STRING other stuff

is trivial to parse in any language and without requiring any library. And it's actually human-editable.
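Something like this sketch is all it takes. The function name and the `#` comment convention are my assumptions (borrowed from Dockerfiles), not part of any standard.

```python
def parse_directives(text):
    """Parse 'STRING other stuff' lines into (keyword, rest) pairs."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):   # skip blanks and comments
            continue
        parts = line.split(None, 1)            # split on the first whitespace run
        rest = parts[1] if len(parts) > 1 else ""
        pairs.append((parts[0], rest))
    return pairs


config = """
# build settings
FROM debian:12
RUN apt-get update
"""
print(parse_directives(config))
# [('FROM', 'debian:12'), ('RUN', 'apt-get update')]
```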

Using json for such trivial stuff is like using a linear algebra library to compute b/a (the quotient of two numbers) by calling linalg.solve([[a]],[b]). Of course it will work. Of course it is more general. But it feels silly.

Ferret7446 · a year ago

XML, per its name, is a markup language. The fact that it was abused as a general purpose serialization format was abhorrent, as is equating it with JSON. JSON is conceptually closer to Lisp.

rffn · a year ago

The article suggests that Gnu Awk might soon improve its understanding of JSON.

Can somebody please shed some light on this? Will gawk get JSON support? Or is it already there and I just need to get a recent version?

shakna · a year ago

GNU awk already does. [0] Sorta. The "non-essential" stuff, like xml, json, redis gets put into gawkextlib. Usually packaged for your platform.

[0] https://www.gnu.org/software/gawk/manual/html_node/gawkextli...

Thanks. I wasn't aware of this library.

IcyWindows · a year ago

This is part of the reason I love working with powershell. I like having things already in a json-like format by default.

kstenerud · a year ago

JSON solves enough problems and is a simple enough format to become ubiquitous. My only beef is with the serialization/deserialization costs.

That's why I've made a 1:1 binary format

https://github.com/kstenerud/bonjson