Jq is rounding 64-bit unsigned integers (2017)

    >> x = '{"id":675127116845989888,"id_str":"675127116845989888"}'
    <- "{\"id\":675127116845989888,\"id_str\":\"675127116845989888\"}"
    >> JSON.parse(x)
    <- Object { id: 675127116845989900, id_str: "675127116845989888" }

For whatever it's worth, on a somewhat-current Linux Mint, with the test from https://github.com/stedolan/jq/issues/1387:

System jq:

    $ jq --version
    jq-1.6
    $ echo '{"number":288230376151711744}' | jq '.number'
    288230376151711740

Fresh compile from source according to the build instructions at https://github.com/stedolan/jq:

    $ ./configure --with-oniguruma=builtin && make -j8
    $ ./jq --version
    jq-1.6-137-gd18b2d0-dirty
    $ echo '{"number":288230376151711744}' | ./jq '.number'
    288230376151711744

Alternatively:

    $ ./configure --with-oniguruma=builtin --enable-decnum=no && make -j8
    $ echo '{"number":288230376151711744}' | ./jq '.number'
    288230376151711740

So the basic bug is fixed: jq has included a bignum library for more than 2 years. I don't know if Mint (and thus presumably Ubuntu, and thus possibly Debian) ships an older version of jq or sets nonstandard, user-unfriendly flags on purpose, but I'm somewhat underwhelmed in either case.

I really like the idea of just treating numbers as a higher level data type instead of ints, floats, signed, unsigned, or whatever. I wish most high level languages kept floats as an implementation detail, and defaulted to decimals.

Though I don't see a way around it with a serialization format like JSON that is meant to work across languages. I've done limited work with static languages, but from what I remember it would be a nightmare to have a variable that could be one of many types. I'm thinking overloading, or interfaces, or something. Could someone familiar with like Go or C#, or whatever explain how they would handle that?

Edit:

Writing that reminded me of something. If you are using a 64-bit unsigned integer, and you need to convert it to JSON for public use, please do not just give an object with a high and low value in hex. If you decide to do this anyway, but also ship an official Python SDK, just do the conversion in the SDK. I'm looking at you, F5.

I should have never needed to write this function to read memory usage stats, but it was kind of fun figuring it out.

    def ulong64_to_int(ulong64):
        high = ulong64.get('high')
        low = ulong64.get('low')
        return int('{0:032b}{1:032b}'.format(high & 0xffffffff, low & 0xffffffff), 2)

mixedCase · 5 years ago

> Could someone familiar with like Go or C#, or whatever explain how they would handle that?

Visitor pattern. A really shitty, but workable, implementation of tagged sum types.

zosima · 5 years ago

Why not just?

    def ulong64_to_int(ulong64):
        return (int(ulong64['high']) << 32) + int(ulong64['low'])
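For what it's worth, the two versions should agree as long as `high` and `low` are already non-negative ints that fit in 32 bits. A quick check, using made-up sample dicts and the hypothetical names `ulong64_format` and `ulong64_shift` to tell the two apart (the real F5 payload may look different, e.g. hex strings that would need `int(x, 16)` first):

```python
def ulong64_format(ulong64):
    # Original approach: concatenate two zero-padded 32-bit binary strings.
    high = ulong64.get('high')
    low = ulong64.get('low')
    return int('{0:032b}{1:032b}'.format(high & 0xffffffff, low & 0xffffffff), 2)


def ulong64_shift(ulong64):
    # Shift approach: the high word moves up 32 bits, the low word fills the bottom.
    return (int(ulong64['high']) << 32) + int(ulong64['low'])


# Hypothetical sample values, not taken from any real API response.
samples = [
    {'high': 0, 'low': 0},
    {'high': 1, 'low': 2},
    {'high': 0xffffffff, 'low': 0xffffffff},
]
results = [(ulong64_format(s), ulong64_shift(s)) for s in samples]
for a, b in results:
    print(a, b, a == b)
```

One difference worth knowing: the format version masks each word to 32 bits, while the shift version does not, so they would only diverge on out-of-range input.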

dec0dedab0de · 5 years ago

Honestly, because I always forget that bit shifting exists in Python. I did originally have it as a lambda, but broke out the variables when I made it a normal function while explaining how it worked to someone who needed it.

Though the point I was trying to make is that if you're going to be sharing data, you should serialize it in a way that works for multiple languages, probably a string in this case. Or at the very least, if you're going to provide a client library for a language, you should make that library present the data in a way that makes sense for that language.
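A minimal sketch of the "just use a string" idea, assuming a record with a single 64-bit ID. The function names are made up for illustration, and the `id`/`id_str` pair simply copies the field naming from the Firefox console example quoted in this thread:

```python
import json


def encode_record(record_id):
    # Emit the ID twice: once as a JSON number, once as a string.
    return json.dumps({"id": record_id, "id_str": str(record_id)})


def decode_record_id(text):
    # Prefer the string field; it survives parsers that store numbers as doubles.
    return int(json.loads(text)["id_str"])


payload = encode_record(2**64 - 1)
print(payload)
print(decode_record_id(payload))
```

Any consumer can read `id_str` without loss, whatever its native number type is, and clients that do have big integers can convert it themselves.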

ggm · 5 years ago

"it's not a bug" is a really bizarre response.

"we know" is kind of ok, but the status of bug-hood is defined between coders and users, not solely by coders I think, and this breaks the POLA severely: People who depend on JQ don't expect this.

theamk · 5 years ago

In this particular case, though, the spec explicitly talks about that [0]:

> Since software that implements IEEE 754-2008 binary64 (double precision) numbers [IEEE754] is generally available and widely used, good interoperability can be achieved by implementations that expect no more precision or range than these provide...

[0] https://datatracker.ietf.org/doc/html/rfc7159#section-6

Which means that if you are putting 64-bit integers into JSON and require every bit to be used, you are not actually creating JSON that is compatible with all consumers. For example, such JSON is not compatible with browsers. Here is what my Firefox's JS console says:

I'd say that jq acting the same way as browsers is pretty reasonable, no?
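The same effect can be reproduced outside a browser. A rough sketch in Python, where `parse_int=float` stands in for a consumer that keeps every number as binary64; the ID 9007199254740993 (2^53 + 1) is my own pick, being the smallest positive integer a double cannot hold, and is not from the thread:

```python
import json

doc = '{"id": 9007199254740993, "id_str": "9007199254740993"}'

# Python's json module keeps integers exact by default.
exact = json.loads(doc)

# parse_int=float imitates a consumer that stores every number as binary64.
as_double = json.loads(doc, parse_int=float)

print(exact["id"])
print(int(as_double["id"]))
print(exact["id"] == int(as_double["id"]))
```

The string field comes through untouched in both cases, which is the interoperability point the RFC is making.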

tom_mellior · 5 years ago

Since jq does something completely different from a browser, it would be reasonable for it to try harder in some respects. A tool that is supposed to pass certain data through unchanged... should not change that data. Even if we can expect that data to be rounded by the eventual consumer at some later time.

Yea. I think so. But, I think people who are stuck in 64 bits forget about 128 bit digit strings being common now.

I am more wrong than right: if you can point to a written spec saying "be not astonished" then POLA doesn't apply. And you did.

benibela · 5 years ago

That is the problem with JSON. XML worked much better for integers. And if it is typed with XML schema or XPath, XML has an xs:unsignedLong integer type.

I wrote my own command line XPath JSON-query tool and used bigdecimals for JSON numbers. If you do not do much math, bigdecimals are probably even faster than using floats, and much easier to implement. Converting a string to a double float is extraordinarily complex. I think it is one of the most difficult tasks in computing.

Unfortunately, now the W3C made a new XPath standard that requires JSON parsers to use double for numbers, so I changed my tool to use doubles as well. Now I am struggling with the string<->float conversion. The conversion in the standard library does not work properly. I just looked at another conversion library. 4000 lines of code, and after a 2 hour investigation, it turns out, it also does not work properly.

codetrotter · 5 years ago

Ran across this today and spent a good 20 to 25 minutes trying to debug why my own code was failing when it was jq that was rounding an id in the JSON response.

Submitting this to warn others who use jq about this :/

Thanks for submitting this interesting issue. Out of interest, what system are you on, and did jq come from a package manager? If you build it from source, it should work: https://news.ycombinator.com/item?id=27362060

macOS Big Sur on a MacBook Pro M1. Installed via the Homebrew package manager.

himinlomax · 5 years ago

I've always (20+ years) known that Javascript numbers were floats, I've thus always assumed JSON numbers to be as well. Is this not the spec? I just store big integers in JSON strings.

stkdump · 5 years ago

JSON numbers are arbitrary precision, because the spec doesn't limit the precision in any way.
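Whether a given parser honors that is up to the implementation. As one illustration, Python's standard `json` module can be told to keep every digit; the document below is a made-up example, reusing the 288230376151711744 value from the jq test earlier in the thread:

```python
import json
from decimal import Decimal

doc = '{"big": 288230376151711744, "precise": 0.12345678901234567890123456789}'

# Integers are already arbitrary precision in Python; parse_float=Decimal
# extends that to numbers with a fractional part.
data = json.loads(doc, parse_float=Decimal)

print(data["big"])
print(data["precise"])
```

Without `parse_float=Decimal`, the second value would be stored as a double and lose its trailing digits.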

neolog · 5 years ago

Those numbers are so big, I'm curious what your use case is for them.

swebs · 5 years ago

The comment states that they're IDs.

NackerHughes · 5 years ago

Storing Twitter IDs, of course!

arpa · 5 years ago

partitioning using prefixes?

x3ro · 5 years ago

Also note that there is another response with a bit more detail in a different (not closed) ticket from 2018:

https://github.com/stedolan/jq/issues/1741#issuecomment-4306...

thehappypm · 5 years ago

Even processors don't always actually have the capability of doing math on 64 bit numbers, in general, because they're nonsensically large.

colejohnson66 · 5 years ago

That’s… not true at all. You’re telling me that x86-64 and AArch64 don’t work on 64 bit words natively (despite being 64 bit architectures)?

In fact, x86-64 has 256 bit data paths internally in some places.

Check this out:

https://superuser.com/questions/168114/how-much-memory-can-a...

There’s no reason to actually support 64 bit lines.