A Python dict that can report which keys you did not use

Only tangentially related but I am really excited about PEP 764¹ (inline typed dictionaries). If it gets accepted, we can finally replace entire hierarchies of dataclasses with simple nested dictionary types and call it a day.

I am currently teaching (typed) Python to a team of Windows sysadmins and it's been incredibly difficult to explain when to use a dataclass, a NamedTuple, a Pydantic model, or a dictionary.

¹) https://peps.python.org/pep-0764/

xg15 · 5 months ago

To be honest, that proposal sounds like it would make the problem even worse, by blurring the line between dicts and dataclasses even more.

codethief · 5 months ago

How does creating anonymous TypedDicts (and allowing them to be nested on the fly) blur the line "even more" when those features are not supported by dataclasses?

I mean I agree w.r.t. the blurriness in general but this PEP is not going to change anything about that, in neither direction.

mvieira38 · 5 months ago

When, if ever, do you use TypedDicts?

tiltowait · 5 months ago

I use them for API responses/requests where dataclasses/pydantic don't add much value and introduce extra function calls and overhead. It's most common when part of the response from one API gets shuttled off to another. There's often no value in initializing a model object, but it's still handy to have some form of type-checking as you construct the next API call.
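
A rough sketch of that pattern, for illustration only. The `UserResponse` and `InviteRequest` shapes and the `build_invite` helper are made up here and not taken from any real API:

```python
from typing import TypedDict


class UserResponse(TypedDict):
    # Hypothetical shape of a response from the first API.
    id: int
    email: str
    display_name: str


class InviteRequest(TypedDict):
    # Hypothetical shape of the payload the second API expects.
    user_id: int
    email: str


def build_invite(user: UserResponse) -> InviteRequest:
    # No model object is constructed: this is a plain dict at runtime,
    # but a type checker can still flag a missing or misspelled key.
    return {"user_id": user["id"], "email": user["email"]}


raw: UserResponse = {"id": 7, "email": "a@example.com", "display_name": "A"}
invite = build_invite(raw)
```

At runtime `invite` is an ordinary `dict`, so it can be handed straight to whatever serializes the next request.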

JohnKemeny · 5 months ago

Do you seriously have difficulties explaining when to use a class and when to use a dictionary?!

codethief · 5 months ago

You can create dictionaries on the fly. But dataclass objects require defining that dataclass first. The type safety (and LSP support) story for accessing individual dataclass fields is better than for accessing dict items (sometimes even when they are TypedDicts), but for iterating over all fields it's worse. Dataclasses are nominal types and can contain additional logic, TypedDicts are structural ones, overall simpler, can be more convenient and lead to looser coupling. Dataclasses use decorator magic (generated methods) while TypedDicts are just plain dicts at runtime. Etc.
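
To make a few of those differences tangible, here is a small sketch. `PointDC` and `PointTD` are hypothetical names used only for this comparison:

```python
from dataclasses import dataclass, fields
from typing import TypedDict


@dataclass
class PointDC:
    x: int
    y: int


class PointTD(TypedDict):
    x: int
    y: int


# A dataclass has to be defined and then instantiated by name (nominal typing).
p1 = PointDC(x=1, y=2)

# Any dict literal with the right keys satisfies the TypedDict (structural typing).
p2: PointTD = {"x": 1, "y": 2}

# Iterating over all fields: direct for the dict, indirect for the dataclass.
td_items = list(p2.items())
dc_items = [(f.name, getattr(p1, f.name)) for f in fields(p1)]
```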

Let me make this more concrete: Those sysadmins frequently need to process and pass around complex (as in heavily nested) structured data. The data often comes in the form of singleton objects, i.e. they are built in a single place, then used in another place and then thrown away (or merged into some other structure). In other words, any class hierarchy you build represents boilerplate code you'll only ever use once and which will be annoying to maintain as you refactor your code. Do you pick dataclasses or TypedDicts (or something else) for your map data structures?

In TypeScript you would just use `const data = <heavily nested object> as const` and be done with it.

quietbritishjim · 5 months ago

The line is seriously blurred.

Does this handle nested dicts (in pickles in sql, which I had to write code to survey one time)?

A queue-based traversal has flatter memory utilization for deeply nested dicts than a recursive traversal in Python without TCO.

Given a visitor pattern traversal, a visit() function can receive the node path as a list of path components, and update a Counter() with a (full,path,tuple) or "delimiter\.escaped.path" key.
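
A minimal sketch of that idea, assuming only nested plain dicts need to be walked. The `count_paths` name is made up, and the counter update stands in for a real visit() call:

```python
from collections import Counter, deque
from typing import Any


def count_paths(root: dict[str, Any]) -> Counter:
    """Breadth-first walk over nested dicts, counting each full key path."""
    seen: Counter = Counter()
    queue = deque([((), root)])  # each entry is (path so far, node)
    while queue:
        path, node = queue.popleft()
        for key, value in node.items():
            child_path = path + (key,)
            seen[child_path] += 1  # stands in for visit(child_path)
            if isinstance(value, dict):
                queue.append((child_path, value))
    return seen


paths = count_paths({"a": {"b": 1, "c": {"d": 2}}, "e": 3})
```

Because pending nodes live in the deque rather than on the call stack, nesting depth does not run into the recursion limit.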

Python collections.UserDict implements the methods necessary to proxy the dict Mapping/MutableMapping interface to self.data. For dicts with many keys, it would probably be faster to hook methods that mutate the UserDict.data dict like __setitem__, get, setdefault, update() and maybe __init__() in order to track which keys have changed instead of copying keys() into a set to do an unordered difference with a list.
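
A sketch of what hooking the mutating methods could look like. `ChangeTrackingDict` and its `changed` attribute are hypothetical names. With UserDict, update(), setdefault(), pop() and __init__() all funnel through __setitem__ or __delitem__, so hooking those two is enough here:

```python
from collections import UserDict


class ChangeTrackingDict(UserDict):
    """Hypothetical name: records which keys were written or deleted."""

    def __init__(self, *args, **kwargs):
        # Must exist before UserDict.__init__ runs, since it calls update().
        self.changed = set()
        super().__init__(*args, **kwargs)
        self.changed.clear()  # ignore the initial population

    def __setitem__(self, key, value):
        self.changed.add(key)
        super().__setitem__(key, value)

    def __delitem__(self, key):
        super().__delitem__(key)
        self.changed.add(key)


d = ChangeTrackingDict({"a": 1, "b": 2})
d["a"] = 10
d.update(c=3)
d.setdefault("e", 5)
d.pop("b")
```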

React requires setState() for all mutations of this.state because there's no way to hook dunder methods in JS: setState() updates this.state and then notifies listeners, or calls a list of functions to run when anything in this.state changes or when a value associated with certain keys or nested keys in this.state changes.

FWIU ipyflow exposes the subscriber refcount/reflist but RxPy specifically does not: ipyflow/core/test/test_refcount.py: https://github.com/ipyflow/ipyflow/blob/master/core/test/tes...

Anyways,

For test assertions, unittest.mock MagicMock can track call_count and call_args_list on methods that read from a dict, like __getitem__ and get(). There's also mock_calls, which keeps an ordered list of the args passed: https://docs.python.org/3/library/unittest.mock.html
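
As a sketch of that approach (the `config` dict and the `spy` name are made up for the example), reads can be delegated to a real dict through side_effect so the mock records them and still returns real values:

```python
from unittest.mock import MagicMock, call

config = {"host": "localhost", "port": 5432}

# MagicMock lets you configure magic methods such as __getitem__.
spy = MagicMock()
spy.__getitem__.side_effect = config.__getitem__
spy.get.side_effect = config.get

host = spy["host"]
port = spy.get("port")
missing = spy.get("missing", None)

getitem_calls = spy.__getitem__.call_args_list
get_calls = spy.get.call_args_list
```

This records calls on a stand-in object, so it fits test assertions better than production code.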

boothby · 5 months ago

Just a heads up, this fails to track usage of get and setdefault. The ability to iterate over dicts makes the whole question rather murky.
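
A small demonstration, assuming the article's class is roughly a dict subclass that overrides only __getitem__. `TrackingDict` below is a stand-in written for this comment, not the article's actual code:

```python
class TrackingDict(dict):
    """Stand-in for the assumed approach: override __getitem__ only."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.accessed = set()

    def __getitem__(self, key):
        self.accessed.add(key)
        return super().__getitem__(key)

    def unused_keys(self):
        return set(self) - self.accessed


d = TrackingDict(a=1, b=2, c=3, d=4)
d["a"]                # tracked
d.get("b")            # not tracked: dict.get does not call the override
d.setdefault("c", 0)  # not tracked either
list(d.values())      # iteration reads every value without being tracked
unused = d.unused_keys()
```

On CPython, `unused` still contains "b", "c" and "d" even though all of them were read.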

I didn't know about the setdefault method, and wouldn't have guessed it lets you read a value. Interesting, thanks.

Another way to get data out would be to use the new | operator (i.e. x = {} | y essentially copies dictionary y to x) or the update method or ** unpacking operator (e.g. x = {**y}). But maybe those come under the umbrella of iterating as you mentioned.

notatallshaw · 5 months ago

setdefault was a go-to method before defaultdict was added to the collections module in Python 2.5, which replaced the biggest use case.
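
For anyone who hasn't seen it, that use case looked roughly like this (the word list is just sample data):

```python
from collections import defaultdict

words = ["apple", "avocado", "banana"]

# Before defaultdict: setdefault returns the existing list, or installs
# the given default and returns that, so it reads as well as writes.
by_letter_old = {}
for w in words:
    by_letter_old.setdefault(w[0], []).append(w)

# With defaultdict: the missing-key case is handled by the factory.
by_letter_new = defaultdict(list)
for w in words:
    by_letter_new[w[0]].append(w)
```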

rjmill · 5 months ago

Indeed. Inheriting from 'collections.UserDict' instead of 'dict' will make TFA's code work as intended for most of those edge cases.

UserDict will route '.get', '.setdefault', and even iteration via '.items()' through the '__getitem__' method.
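
A quick sketch to show the routing. `TrackingUserDict` is a hypothetical name, and this only covers the methods mentioned above:

```python
from collections import UserDict


class TrackingUserDict(UserDict):
    """Hypothetical name: same tracking idea, built on UserDict instead of dict."""

    def __init__(self, *args, **kwargs):
        # Set before UserDict.__init__ so the attribute exists during setup.
        self.accessed = set()
        super().__init__(*args, **kwargs)

    def __getitem__(self, key):
        self.accessed.add(key)
        return super().__getitem__(key)


d = TrackingUserDict(a=1, b=2, c=3, d=4)
d.get("a")
d.setdefault("b", 0)
after_methods = set(d.accessed)  # snapshot before iterating
dict(d.items())                  # items() looks up each value via __getitem__
after_items = set(d.accessed)
```

Here `after_methods` holds "a" and "b", and after iterating over `.items()` every key has been recorded.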

edited to remove "(maybe all?) edge cases". As soon as I posted, I thought of several less common/obvious edge cases.

hackish · 5 months ago

Along with those and iteration, it also would need to handle del/pop/popitem/update/copy/or/ror/... some of which might necessitate a decision on whether comparisons/repr also count as access.

IshKebab · 5 months ago

I think if you feel like you need this then it's a bit of a red flag and you should be using Pydantic or `dataclass` instead, then your IDE can statically tell you which fields you don't access (among many other benefits). Dicts are mainly for when you don't know the keys up front.

mb7733 · 5 months ago

Static analysis could only tell you which fields are never used, across all usage of the class. Not on a given instance.

taeric · 5 months ago

Counterpoint, something like this for dataclasses would also be very useful.

That is, it isn't just knowing whether or not the data is ever used. It is useful to know if it was used in this specific run. And oftentimes, seeing what parts of the data were not used is a good clue as to what went wrong. At the least, you can use it to rule out what code was not hit.

jraph · 5 months ago

I did exactly the same thing in our Confluence to XWiki migrator to easily and automatically report which macro parameters we don't handle when converting Confluence macros to equivalent macros in XWiki.

This can be used to evaluate the migration quality and spot what can be improved.

https://github.com/xwiki-contrib/confluence/blob/7a95bf96787...

golly_ned · 5 months ago

I have a similar use case and this idea also occurred to me.

However, the dict in this case would also include dataclasses, and I'd be interested in finding exactly which attributes within those dataclasses were accessed. I'd also want to be able to mark all attributes in a dataclass as accessed if the parent dataclass is accessed. Since those dataclasses are config objects, they should be able to do the same for their own children, so that the topmost dictionary has a tree of all accessed keys.

I couldn't figure out how to do that, but I'd welcome ideas.
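
One possible direction, as a minimal sketch only: a recursive proxy that wraps dicts and dataclass instances and logs the full path of every key or attribute read. `Tracked`, `Db` and the sample config are hypothetical. It does not implement the "mark all children as accessed" behavior, and it ignores iteration, get() and similar methods:

```python
from dataclasses import dataclass, is_dataclass


class Tracked:
    """Hypothetical recursive proxy: records the path of each key/attribute read."""

    def __init__(self, target, path=(), log=None):
        self._target = target
        self._path = path
        self._log = set() if log is None else log  # children share one log

    def _wrap(self, key, value):
        path = self._path + (key,)
        self._log.add(path)
        is_dc_instance = is_dataclass(value) and not isinstance(value, type)
        if isinstance(value, dict) or is_dc_instance:
            return Tracked(value, path, self._log)
        return value

    def __getitem__(self, key):
        return self._wrap(key, self._target[key])

    def __getattr__(self, name):
        # Only reached when normal lookup fails, i.e. for the target's attributes.
        return self._wrap(name, getattr(self._target, name))

    def accessed_paths(self):
        return set(self._log)


@dataclass
class Db:
    host: str
    port: int


config = Tracked({"db": Db("localhost", 5432), "debug": True})
host = config["db"].host
```

After this, `config.accessed_paths()` contains `("db",)` and `("db", "host")`, which gives the tree of touched keys as a set of paths.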

ok123456 · 5 months ago

If you're inheriting from dict to extend its behavior, there are a lot of side effects with that, and it's recommended to use https://docs.python.org/3/library/collections.html#collectio... instead.

From right above where you linked to:

> The need for this class has been partially supplanted by the ability to subclass directly from dict; however, this class can be easier to work with because the underlying dictionary is accessible as an attribute.

Sounds like (unless you need the dict as a separate data member) this class is a historical artefact. Unless there's some other issue you know of not mentioned in the documentation?

dict doesn't follow the usual object protocol, and whether its built-in methods call your overridden methods is runtime dependent. It's only guaranteed that non-overridden methods are resolved least surprisingly.

mont_tag · 5 months ago

No, that is not the recommendation. People routinely and reliably inherit from dict.

The UserDict class is mostly defunct and is only still in the standard library because there were a few existing uses that were hard to replace (such as avoiding base class conflicts in multiple inheritance).

9dev · 5 months ago

Ah, Python. The language where nobody agrees on the right way to do things, and just does their own instead. Five ways to describe an object of a certain shape? Six package managers, with incompatible but overlapping ways to publish packages, but half of them without a simple way to update dependencies? Asynchronous versions of everything? Metaprogramming that makes Ruby blush? Yes! All of it! Lovely.

smcin · 5 months ago

UserDict is not formally deprecated but it will be someday, so code that relies on it is not future-proof.

westurner · 5 months ago

simon04 · 5 months ago

Very useful. For configparser.ConfigParser I've found https://stackoverflow.com/a/57307141