hinkley · 7 months ago
JSON encoding is a huge impediment to interprocess communication in NodeJS.

Sooner or later it seems like everyone gets the idea of reducing event loop stalls in their NodeJS code by trying to offload it to another thread, only to discover they’ve tripled the CPU load in the main thread.

I’ve seen people stringify arrays one entry at a time. Sounds like maybe they are doing that internally now.
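A minimal sketch of that per-entry approach (illustrative names, not V8's internals): a generator that emits an array's JSON in chunks, so a caller can yield to the event loop between chunks instead of paying for one giant stringify.

```javascript
// Emit an array's JSON in chunks of `chunkSize` entries. Joining all
// chunks reproduces JSON.stringify(arr) exactly; a caller can await
// setImmediate() between chunks to let other event-loop tasks run.
function* stringifyArrayInChunks(arr, chunkSize = 1000) {
  yield '[';
  for (let i = 0; i < arr.length; i += chunkSize) {
    const chunk = arr
      .slice(i, i + chunkSize)
      .map((item) => JSON.stringify(item))
      .join(',');
    yield (i === 0 ? '' : ',') + chunk;
  }
  yield ']';
}

const data = Array.from({ length: 2500 }, (_, i) => ({ i }));
const json = [...stringifyArrayInChunks(data)].join('');
```

This trades one long stall for many short ones; the total CPU cost is the same or slightly worse, which is the trap described above.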

If anything I would encourage the V8 team to go farther with this. Can you avoid bailing out for subsets of data? What about the CString issue? Does this bring faststr back from the dead?

jcdavis · 7 months ago
Based off of my first ever forays into node performance analysis last year, JSON.stringify was one of the biggest impediments to just about everything around performant node services: the fact that everyone uses stringify for dict keys, and the fact that apollo/express just serializes the entire response into a string instead of incrementally streaming it back (I think there are some possible workarounds for this, but they seemed very hacky).

As someone who has come from a JVM/go background, I was kinda shocked how amateur hour it felt tbh.

MehdiHK · 7 months ago
> JSON.stringify was one of the biggest impediments to just about everything around performant node services

That's what I experienced too. But I think the deeper problem is Node's cooperative multitasking model. A preemptive multitasking model (like Go's) wouldn't block the whole event loop (other concurrent tasks) while serializing a large response (often the case with GraphQL, but possible with any other API too). Yeah, it does kinda feel like amateur hour.

hinkley · 7 months ago
> Based off of my first ever forays into node performance analysis last year, JSON.stringify was one of the biggest impediments to just about everything around performant node services

Just so. It is, or at least can be, the plurality of the sequential part of any Amdahl's Law calculation for Nodejs.

I'm curious if any of the 'side effect free' commentary in this post is about moving parts of the JSON calculation off of the event loop. That would certainly be very interesting if true.

However for concurrency reasons I suspect it could never be fully off. The best you could likely do is have multiple threads converting the object while the event loop remains blocked. Not entirely unlike concurrent marking in the JVM.

dmit · 7 months ago
Node is the biggest impediment to performant Node services. The entire value proposition is "What if you could hire people who write code in the most popular programming language in the world?" Well, guess what


nijave · 7 months ago
Same problem in Python. It'd be nice to have good/efficient IPC primitives with higher level APIs on top for common patterns
tgv · 7 months ago
> If anything I would encourage the V8 team to go farther with this.

That feels the wrong way to go. I would encourage the people that have this problem to look elsewhere. Node/V8 isn't well suited to backend or the heavier computational problems. Javascript is shaped by web usage, and it will stay like that for some time. You can't expect the V8 team to bail them out.

The Typescript team switched to Go because it's similar enough to TS/JS to do part of the translation automatically. I'm no AI enthusiast, but they are quite good at doing idiomatic translations too.

com2kid · 7 months ago
> Node/V8 isn't well suited to backend

Node was literally designed to be good for one thing - backend web service development.

It is exceptionally good at it. The runtime overhead is tiny compared to the JVM, the async model is simple as hell to wrap your head around and has a fraction of the complexity of what other languages are doing in this space, and Node running on a potato of a CPU can handle thousands of requests per second w/o breaking a sweat using the most naively written code.

Also the compactness of the language is incredible, you can get a full ExpressJS service up and running, including auth, in less than a dozen lines of code. The amount of magic that happens is almost zero, especially compared to other languages and frameworks. I know some people like their magic BS (and some of the stuff FastAPI does is nifty), but Express is "what you see is what you get" by default.

> The Typescript team switched to Go, because it's similar enough to TS/JS to do part of the translation automatically.

The TS team switched to Go because JS is horrible at anything that isn't strings or doubles. The lack of an int type hinders the language, so runtimes do a lot of work to try to determine when a number can be treated like an int.
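A small sketch of where the missing int type shows through: every Number is an IEEE double, so integer precision runs out at 2^53, and only bitwise operators force int32 behavior.

```javascript
// JS has no integer type: every Number is a 64-bit float, so engines
// must infer internally when a value can safely be treated as an int.
const big = 2 ** 53;            // one past Number.MAX_SAFE_INTEGER
const lossy = big + 1 === big;  // true: integer precision is exhausted
const truncated = 5.9 | 0;      // 5: bitwise ops coerce to int32
```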

JS's type system is both absurdly flexible and also limiting. Because JS basically allows you to do anything with types, Typescript ends up being one of the most powerful type systems that has seen mass adoption. (Yes other languages have more powerful type systems, but none of them have the wide spread adoption TS does).

If I need to model a problem domain, TS is an excellent tool for doing so. If I need to respond to thousands of small requests, Node is an excellent tool for doing so. If I need to do some actual computation on those incoming requests, eh, maybe pick another tech stack.

But for the majority of service endpoints that consist of "get message from user, query DB, reformat DB response, send to user"? Node is incredible at solving that problem.

brundolf · 7 months ago
Yeah. I think I've only ever found one situation where offloading work to a worker saved more time than was lost through serializing/deserializing. Doing heavy work often means working with a huge set of data- which means the cost of passing that data via messages scales with the benefits of parallelizing the work.
hinkley · 7 months ago
I think the clues are all there in the MDN docs for web workers. Having a worker act as a forward proxy for services: you send it a URL, it decides if it needs to make a network request, it cooks down the response for you and sends you the condensed result.

Most tasks take more memory in the middle than at the beginning and end. And if you're sharing memory between processes that can only communicate by setting bytes, then the memory at the beginning and end represents the communication overhead. The latency.

But this is also why things like p-limit work - they pause an array of arbitrary tasks during the induction phase, before the data expands into a complex state that has to be retained in memory concurrently with all of its peers. By partially linearizing, you put a clamp on peak memory usage that Promise.all(arr.map(...)) does not; it's not just a thundering-herd fix.

userbinator · 7 months ago
You mean:

> JSON encoding is a huge impediment to communication

I wonder how much computational overhead JSON'ing adds to communications at a global scale, in contrast to just sending the bytes directly in a fixed format or something far more efficient to parse like ASN.1.

hinkley · 7 months ago
No. Because painful code never gets optimized as much as less painful code. People convince themselves to look elsewhere, and an incomplete picture leads to local optima.
cogman10 · 7 months ago
It's a major problem for JVM performance as well. Json encoding is simply fundamentally an expensive thing to do.

One thing that improves performance for the JVM that'd be nice to see in node realm is that JSON serialization libraries can stream out the serialization. One of the major costs of JSON is the memory footprint. Strings take up a LOT more space in memory than a regular object does.

Since the JVM typically only uses JSON as a communication protocol, streaming it out makes a lot of sense. The IO (usually) takes long enough to give a CPU reprieve while simultaneously saving memory.

zamalek · 7 months ago
> Sooner or later is seems like everyone gets the idea of reducing event loop stalls in their NodeJS code by trying to offload it to another thread, only to discover they’ve tripled the CPU load in the main thread.

Why not use structuredClone to communicate with the worker? So long as your object upholds all the rules you can pass it into postMessage directly.
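A sketch of the structuredClone path: it deep-copies "cloneable" values without a JSON round-trip, and handles types JSON can't, like Date and typed arrays (functions and DOM nodes still throw).

```javascript
// structuredClone deep-copies plain data, Dates, Maps, and typed
// arrays; postMessage applies the same structured-clone algorithm
// when the value crosses into a worker.
const original = { id: 1, when: new Date(0), bytes: new Uint8Array([1, 2, 3]) };
const copy = structuredClone(original);
// `copy` is a fully independent object graph.
```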

hinkley · 7 months ago
Structured clone only gets you a new object in this heap, not a new object in another isolate. You’re still doing stringify/parse under the hood every time you postMessage.
deadbabe · 7 months ago
When it comes time to do serious work in Node, that's when you start using TypedArrays and SharedArrayBuffers and work with straight binary data. stringifying is mostly for toy apps and smaller projects.
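A sketch of that style (the worker side is omitted here): a SharedArrayBuffer is raw memory that typed-array views read and write in place, so nothing is serialized at all.

```javascript
// Typed-array views over a SharedArrayBuffer let threads exchange
// binary data with zero serialization. Passing `sab` to a worker via
// postMessage shares the memory rather than copying it.
const sab = new SharedArrayBuffer(32);
const floats = new Float64Array(sab); // bytes 0..31 viewed as 4 doubles
const ints = new Int32Array(sab);     // same bytes viewed as 8 int32s
floats[0] = 3.5;                      // exact in a double
Atomics.store(ints, 4, 42);           // atomic write into bytes 16..19
```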
hinkley · 7 months ago
If I took all the coworkers I’ve had in 30 years of coding whom I could really trust with the sort of bit fiddling array buffers, I could populate maybe two companies and everyone else would be fucked.

TypedArray is a toy. Very few domains actually work well with this sort of data. Statistics, yes. Games do it all the time, but games are also full of glitches exploited by speed runners, because they play fast and loose to maintain the illusion that they're doing much more per second than they should be able to.

DataView is a bit better. I am endlessly amazed at how many times I managed to read people talking about TypedArrays and SharedArrayBuffers before I discovered that DataView exists and has existed basically forever. Somebody should have mentioned it a lot, lot sooner.
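For comparison, a small DataView sketch: unlike a TypedArray, it reads and writes mixed-width fields at arbitrary byte offsets, with explicit endianness per access.

```javascript
// Pack a 16-bit tag followed by a 32-bit float into one buffer,
// little-endian, then read both back.
const buf = new ArrayBuffer(8);
const view = new DataView(buf);
view.setUint16(0, 0xBEEF, /* littleEndian */ true);
view.setFloat32(2, 1.5, true);          // 1.5 is exact in a float32
const tag = view.getUint16(0, true);    // 0xBEEF
const value = view.getFloat32(2, true); // 1.5
```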

dwattttt · 7 months ago
Now to just write the processing code in something that compiles to WebAssembly, and you can start copying and sending ArrayBuffers to your workers!

Or I guess you can do it without the WebAssembly step.

hinkley · 7 months ago
A JSON.toBuffer would be another good addition to V8. There are a couple code paths that look like they might do this, but from all accounts it goes Object->String->Buffer, and for speed you want to skip the intermediate.
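The current route looks like this sketch: the full intermediate string is materialized before a single byte exists. (The `JSON.toBuffer` name above is hypothetical; it does not exist today.)

```javascript
// Object -> String -> Buffer: the whole JSON string is allocated
// first, then copied again while encoding to UTF-8 bytes.
const obj = { id: 7, name: 'widget' };
const str = JSON.stringify(obj);      // intermediate string
const buf = Buffer.from(str, 'utf8'); // second full pass for the bytes
// A hypothetical JSON.toBuffer(obj) would emit UTF-8 directly.
```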
jonas21 · 7 months ago
The part that was most surprising to me was how much the performance of serializing floating-point numbers has improved, even just in the past decade [1].

[1] https://github.com/jk-jeon/dragonbox?tab=readme-ov-file#perf...

jameshart · 7 months ago
Roundtripping IEEE floating point values via conversion to decimal UTF-8 strings and back is a ridiculously fragile process, too, not just slow.

The difference between which values are precisely representable in binary and which are precisely representable in decimal means small errors can creep in.

jk-jeon · 7 months ago
A way to achieve perfect round-tripping was proposed back in 1990 by Steele and White (and they were likely not the first to come up with a similar idea). I guess their proposal wasn't extremely popular until at least the 2000s, compared to more classical `printf`-like rounding methods, but it seems many languages and platforms these days do provide such round-tripping formatting algorithms as the default option. So nowadays round-tripping isn't that hard, unless people do something sophisticated without really understanding what they're doing.
gugagore · 7 months ago
You don't have to precisely represent the float in decimal. You just have to have each float have a unique decimal representation, which you can guarantee if you include enough digits: 9 for 32-bit floats, and 17 for 64-bit floats.

https://randomascii.wordpress.com/2012/02/11/they-sure-look-...
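In JS terms, 17 significant decimal digits are enough to round-trip any finite 64-bit float exactly:

```javascript
// 17 significant digits uniquely identify any finite double; fewer
// digits can collapse two distinct floats onto the same decimal.
const x = 0.1 + 0.2; // not exactly 0.3 in binary
const restored = Number(x.toPrecision(17));
```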

kccqzy · 7 months ago
Most languages in use (such as Python) have solved this problem ages ago. Take any floating point value other than NaN, convert it to string and convert the string back. It will compare exactly equal. Not only that, they are able to produce the shortest string representation.
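JavaScript's Number-to-string conversion is specified the same way (the shortest decimal string that converts back to exactly the same double), which a quick check confirms:

```javascript
// ToString on a Number emits the shortest round-tripping decimal
// string, so parse(stringify(x)) recovers x exactly (NaN aside).
const samples = [0.1, 1 / 3, Number.MAX_VALUE, 5e-324];
const roundTrips = samples.every((v) => Number(String(v)) === v);
```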
ot · 7 months ago
The SWAR escaping algorithm [1] is very similar to the one I implemented in Folly JSON a few years ago [2]. The latter works on 8 byte words instead of 4 bytes, and it also returns the position of the first byte that needs escaping, so that the fast path does not add noticeable overhead on escape-heavy strings.

[1] https://source.chromium.org/chromium/_/chromium/v8/v8/+/5cbc...

[2] https://github.com/facebook/folly/commit/2f0cabfb48b8a8df84f...

hairtuq · 7 months ago
If you want to optimize it a little more, you can combine isLess(s, 0x20) and isChar('"') into isLess(s ^ (kOnes * 0x02), 0x21) (this works since '"' is 0x22).
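An exhaustive per-byte check of that identity (the SWAR version applies the same comparison to every lane of a word at once):

```javascript
// (b ^ 0x02) < 0x21 holds exactly when b < 0x20 (control char) or
// b === 0x22 ('"'): XOR with 0x02 maps 0x22 onto 0x20 and keeps every
// value below 0x20 below 0x20.
let identityHolds = true;
for (let b = 0; b < 256; b++) {
  const slow = b < 0x20 || b === 0x22;
  const fast = (b ^ 0x02) < 0x21;
  if (slow !== fast) identityHolds = false;
}
```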
monster_truck · 7 months ago
I don't think v8 gets enough praise. It is fucking insane how fast javascript can be these days
andyferris · 7 months ago
Yeah, it is quite impressive!

It's a real example of "you can solve just about anything with a billion dollars" though :)

I'd prefer JavaScript kept evolving (think "strict", but "stricter", "stricter still", ...) to a simpler and easier to compile/JIT language.

fngjdflmdflg · 7 months ago
I want JS with sound types. It's interesting how sound types can't be added to JS because runtime checks would be too expensive, but then so much of what makes JS slow is having to check types all the time anyway, and the only way to speed it up is to retroactively infer the types. I want types plus a "use typechecked" that tells the VM I already did some agreed upon level of compile-time checks and now it only needs to do true runtime checks that can't be done at compile time.
Cthulhu_ · 7 months ago
> I'd prefer JavaScript kept evolving (think "strict", but "stricter", "stricter still", ...) to a simpler and easier to compile/JIT language.

This is/was asm.js, a limited subset of JS without all the dynamic behaviour, which allowed the engine to skip a lot of checks and assumptions. It was deprecated in favor of WASM, basically communicating that if you need the strictness or performance, use a different language.

As for JS strictness, eslint / biome with all the rules engaged will also make it strict.

ayaros · 7 months ago
Yes, this is what I want too. Give me "stricter" mode.
ihuman · 7 months ago
Like asm.js was, before webassembly replaced it?
shivawu · 7 months ago
On the other hand, I consider v8 the most extreme optimized runtime in a weird way, in that there’re like 100 people on the planet understand how it works, while the rest of us be like “why my JS not fast”
yieldcrv · 7 months ago
and then there are the people saying “why JS” before going on an archaic rant and then leaving the interview concluding their rejection was age discrimination
Tiberium · 7 months ago
V8 is extremely good, but (maybe due to JS itself?) it still falls short of LuaJIT and even JVM performance. Although at least for JVM it takes way longer to warm up than the other two.
mhh__ · 7 months ago
It's JS; V8 is, afaict, much more advanced than LuaJIT and the JVM.

Although java also has the advantage of not having to be ~ real time (i.e. has a compiler)

Cthulhu_ · 7 months ago
> maybe due to JS itself?

Nail on the head; a lot of JS overhead is due to its dynamic nature. asm.js disallowed some of this dynamic behaviour (like changing the shape of objects, iirc), meaning they could skip a lot of these checks.

MrBuddyCasino · 7 months ago
„even“

Hombre, you’re about the best there is.

xxs · 7 months ago
that 'even' made me chuckle - exactly the same 'reaction'
gampleman · 7 months ago
> No replacer or space arguments: Providing a replacer function or a space/gap argument for pretty-printing are features handled exclusively by the general-purpose path. The fast path is designed for compact, non-transformed serialization.

Do we get this even if you call `JSON.stringify(data, null, 0)`? Or do the arguments literally have to be undefined?

tln · 7 months ago
I don't think so. I just got ~2.4 for a 512KB json using JSON.stringify(data, null, 0) and ~1.4 for JSON.stringify(data). JSON.stringify(data, undefined, undefined) was the same as JSON.stringify(data). Other combos were slower

https://microsoftedge.github.io/Demos/json-dummy-data/512KB.... Chrome 138.0.7204.184
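A sketch for reproducing this yourself (timings are engine- and data-dependent; what's guaranteed is that the two calls produce identical output, so any difference is purely the internal path taken):

```javascript
// Compare stringify with and without an explicit gap argument.
// JSON.stringify(data, null, 0) yields byte-identical output to
// JSON.stringify(data) but may be routed through the slower path.
const data = Array.from({ length: 10_000 }, (_, i) => ({ i, s: 'x'.repeat(8) }));

function timeMs(fn, iters = 20) {
  const t0 = process.hrtime.bigint();
  for (let i = 0; i < iters; i++) fn();
  return Number(process.hrtime.bigint() - t0) / 1e6;
}

const plain = timeMs(() => JSON.stringify(data));
const withGap = timeMs(() => JSON.stringify(data, null, 0));
```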

MutedEstate45 · 7 months ago
I really like seeing the segmented buffer approach. It's basically the rope data structure trick I used to hand-roll in userland with libraries like fast-json-stringify, now native and way cleaner. Have you run into the bailout conditions much? Any replacer, space, or custom .toJSON() kicks you back to the slow path?
notpushkin · 7 months ago
> No indexed properties on objects: The fast path is optimized for objects with regular, string-based keys. If an object contains array-like indexed properties (e.g., '0', '1', ...), it will be handled by the slower, more general serializer.

Any idea why?

inbx0 · 7 months ago
My guess would be because they affect property ordering, complicating the stringification.

The default object property iteration rules in JS define that numeric properties are traversed first in their numeric order, and only then others in the order they were added to the object. Since the numbers need to be in their numeric, not lexical, order, the engine would also need to parse them to ints before sorting.

    > JSON.stringify({b: null, 10: null, 1: null, a: null})
    '{"1":null,"10":null,"b":null,"a":null}'

Timwi · 7 months ago
I wonder that too. Are they saying that objects with integer-looking keys are serialized as JSON arrays?? Surely not...?