This looks highly reminiscent (though not exactly the same, pedants) of why people used to get excited about using SAX instead of DOM for XML parsing.
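For anyone who missed that era: DOM materializes the whole document as an in-memory tree, while SAX streams events through callbacks, so memory use stays flat. A minimal stdlib sketch (the file name is a placeholder):

import xml.sax

class TagCounter(xml.sax.ContentHandler):
    # Handles elements as they stream past; the tree is never built.
    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        self.count += 1

handler = TagCounter()
xml.sax.parse("big.xml", handler)  # vs. xml.dom.minidom.parse("big.xml"),
print(handler.count)               # which allocates every node up front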
1. Inefficient parser implementation. It's just... very easy to allocate way too much memory if you don't think about large-scale documents, and very difficult to measure. A common problem with many (but not all) JSON parsers. (See the tracemalloc sketch after this list for one way to measure it.)
2. CPython's in-memory representation is large compared to compiled languages. So e.g. a 4-digit integer is 5-6 bytes in JSON, 8 in Rust if you use i64, and 28ish in CPython. An empty dictionary is 64 bytes. (The sys.getsizeof snippet below shows where those numbers come from.)
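One way to actually see point 1: the stdlib's tracemalloc reports the allocation peak during a parse, which is often a large multiple of the file size (the file name below is a placeholder):

import json
import tracemalloc

tracemalloc.start()
with open("big.json") as f:  # placeholder path
    data = json.load(f)  # parse the whole document in one go
current, peak = tracemalloc.get_traced_memory()
print(f"peak during parse: {peak / 2**20:.1f} MiB")
tracemalloc.stop()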
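And the per-object overhead from point 2 is easy to check directly (numbers from a recent 64-bit CPython; they vary a bit across versions):

>>> import sys
>>> sys.getsizeof(1234)  # cf. 5-6 bytes as JSON text, 8 as a Rust i64
28
>>> sys.getsizeof({})  # before you've stored anything in it
64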
>>> from dataclasses import dataclass
>>> @dataclass
... class C: pass
...
>>> C().x = 1
>>> @dataclass(slots=True)
... class D: pass
...
>>> D().x = 1
Traceback (most recent call last):
  File "<python-input-4>", line 1, in <module>
    D().x = 1
    ^^^^^
AttributeError: 'D' object has no attribute 'x' and no __dict__ for setting new attributes
Most of the time this is not a thing you actually need to do.

The linked-from-original-article ijson article was the inspiration for the talk: https://pythonspeed.com/articles/json-memory-streaming/
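For anyone who skips the link: ijson parses incrementally and hands you one item at a time, so peak memory stays flat regardless of file size. A minimal sketch (the path and the handler are made up):

import ijson

with open("big.json", "rb") as f:  # placeholder path
    # "item" matches each element of a top-level JSON array;
    # only one element's objects are alive at a time.
    for record in ijson.items(f, "item"):
        handle(record)  # hypothetical per-record handler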
Poireau does this, IIRC, by putting the allocations it samples at distinctive memory addresses, so free() can recognize a sampled pointer on sight.
Sciagraph (https://sciagraph.com), a profiler I created, uses allocation size instead. If an allocation is chosen for sampling, the profiler makes sure its size is at least 16KiB; free() then assumes that any allocation of 16KiB or larger was sampled. That assumption can produce false positives, but it means free() needs nothing beyond a malloc_usable_size() call, which matters when you're calling free() on lots and lots of small allocations. A previous iteration used alignment as a heuristic, so that's another option.
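A toy version of that size heuristic, in Python purely for illustration (the real logic lives in a malloc/free interposer; only the 16KiB threshold comes from the description above, everything else is invented for the sketch):

SAMPLED_MIN = 16 * 1024  # blocks at least this big are assumed sampled

_usable_size = {}  # stand-in for malloc_usable_size()
sampled_live_bytes = 0

def traced_malloc(size, should_sample):
    global sampled_live_bytes
    if should_sample:
        size = max(size, SAMPLED_MIN)  # round up so free() can spot it later
        sampled_live_bytes += size
    block = bytearray(size)  # stand-in for the real allocation
    _usable_size[id(block)] = size
    return block

def traced_free(block):
    global sampled_live_bytes
    size = _usable_size.pop(id(block))
    if size >= SAMPLED_MIN:  # cheap check; occasionally a false positive
        sampled_live_bytes -= size
    # the real free() would happen here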
Our JSON parser, jiter (https://github.com/pydantic/jiter), already supports iterative parsing, so it's "just" a matter of solving the lifetimes in pydantic-core to validate as we parse.
This should make pydantic around 3x faster at parsing JSON and significantly reduce the memory overhead.
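To illustrate the validate-as-you-parse idea (this sketch uses ijson and today's pydantic API purely as an analogy; it is not how jiter or pydantic-core will actually wire it up):

import ijson
from pydantic import BaseModel

class User(BaseModel):  # hypothetical model
    id: int
    name: str

with open("users.json", "rb") as f:  # placeholder path
    for item in ijson.items(f, "item"):
        user = User.model_validate(item)  # validate one element at a time
        ...  # only one record's objects are alive at once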