As noted in the PEP, data classes are a less fully-featured stdlib implementation of what attrs already provides. Unless you’re constrained to the stdlib (as those who write CPython itself of course are), you should consider taking a look at attrs first.
This is spot on. The design of attrs reminds me a little of the syntax of a declarative ORM, for example. I'm sure it can do very powerful things that I've not had occasion to use, but it is heavy. The @dataclass format is very clean and seems more like the syntactic sugar that I expect from Python.
One of the prime uses of a dataclass is to be a mutable namedtuple. And the syntax can be almost identical:
Part = make_dataclass('Part', ['part_num', 'description', 'quantity'])
(from Raymond Hettinger's twitter)
This has the added benefit of not requiring type hinting, if you don't want to bother with such things.
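A runnable sketch of that one-liner in use (the part values here are made up for illustration):

```python
from dataclasses import make_dataclass

# Create a record type without writing a class body or type hints
Part = make_dataclass('Part', ['part_num', 'description', 'quantity'])

p = Part('BR-549', 'sprocket', 12)
p.quantity = 11  # unlike a namedtuple, fields are mutable
```

Fields declared this way default to `typing.Any`, so no annotations are needed at the call site.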
attrs also has a feature that dataclasses don't currently [0]: an easy way to use __slots__ [1].
It cuts down on the per-instance memory overhead, for cases where you're creating a ton of these objects. It can be useful even when not memory-constrained, because it will throw AttributeError, rather than succeeding silently, if you make a typo when assigning to an object attribute.
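What __slots__ buys you, shown here with plain Python rather than attrs (attrs' slots=True generates the equivalent of this; the class and field names are illustrative):

```python
class Point:
    # No per-instance __dict__; only these attributes can ever exist
    __slots__ = ('x', 'y')

    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(1, 2)
p.x = 5          # assigning to a declared slot still works
try:
    p.z = 3      # typo / undeclared attribute: AttributeError, not silence
except AttributeError:
    print('no attribute z')
```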
Raymond Hettinger had a pretty good presentation on Data Classes and how they relate to things like named tuples and a few recipes/patterns. It was linked on Reddit[0] but it looks like the video has been removed from YouTube. His slides are online[1], though.
I love using attrs, like the idea of bringing something similar to the standard library, but strongly disagree with the dataclasses API. It treats untyped Python as a second class citizen.
This is what I'd prefer
from dataclasses import dataclass, field

@dataclass
class MyClass:
    x = field()
but it produces an error because fields need to be declared with a type annotation. This is the GvR recommended way to get around it:
@dataclass
class MyClass:
    x: object
You could use the typing.Any type instead of object, but then you need to import a whole typing library to use untyped dataclasses. I highly prefer the former code block.
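For reference, the typing.Any spelling mentioned above:

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class MyClass:
    x: Any  # an annotation is required, but Any imposes no actual type
```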
Yeah, it seems strange to force people to use type hints when it has had such a mixed reception. I really tried to use type hints with a new project a few months ago, but ended up stripping it all out again because it's just so damn ugly. I wish it were possible to fully define type hints in a separate file for linters, and not mix it in with production code. It's kind of possible to do it, but not fully [1], and mixing type hints inline and in separate files is in my opinion even worse than one or the other.
It's great that we have simple/clean declarations for NamedTuples and (Data)classes now. But I wonder why they chose two different styles for creating them. This for NamedTuples:
from typing import NamedTuple

class Foo(NamedTuple):
    bar: str
    baz: int
The short answer is that the only way to do what dataclasses do as a base class is via python metaclasses, and you can only have one metaclass. So this way, you can dataclassify something that inherits from a metaclass.
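A sketch of that point: because @dataclass is just a decorator, it composes with whatever metaclass the class already uses, whereas a base-class approach would need its own metaclass and collide with it. (The class names here are illustrative.)

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass

# ABC brings in the ABCMeta metaclass; @dataclass still works because it
# never needs to control class creation itself.
@dataclass
class Shape(ABC):
    name: str

    @abstractmethod
    def area(self):
        ...

@dataclass
class Square(Shape):
    side: float = 0.0

    def area(self):
        return self.side * self.side
```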
I'm happy to see data classes. I think something like this exists in 3.6:
class Person(typing.NamedTuple):
    name: str
    age: int
But I don't think it supports __post_init__; however, constructors have no business doing parsing like this anyway, so unless I'm missing something, deriving from `typing.NamedTuple` seems strictly better than `@dataclass`, insofar as it seems less likely to be abused.
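For comparison, this is the __post_init__ hook that @dataclass provides and typing.NamedTuple lacks (the validation logic here is made up):

```python
from dataclasses import dataclass

@dataclass
class Person:
    name: str
    age: int

    def __post_init__(self):
        # Runs after the generated __init__; a common (if debatable)
        # use is validating or normalizing fields.
        if self.age < 0:
            raise ValueError('age must be non-negative')
```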
Coming from C++ it feels really weird that you can simply assign instance.new_name = value from anywhere without properly declaring it beforehand. You also never really know what you get or if somebody modified your instance members from the outside.
I can only imagine how weird it must seem that you can override methods of instance objects and even classes, or even replace a whole class of an instance with another.
>>> class Foo:
... def bar(self):
... print('foo')
...
>>> class Baz:
... def bar(self):
... print('baz')
...
>>> f = Foo()
>>> f
<__main__.Foo object at 0x7fa311e7a278>
>>> f.bar()
foo
>>> f.__class__ = Baz
>>> f
<__main__.Baz object at 0x7fa311e7a278>
>>> f.bar()
baz
Does that work even if the types had fields? What about if the fields had a different total size? What if Baz had no parameterless constructor (i.e. only had a constructor that guaranteed arg > 0, for example)?
Is this like an unsafe pointer cast where “you are responsible, and it will likely blow up spectacularly if you don’t know what you are doing” or is it something safer that will magically work e.g with types of different size?
JS & PHP let you do this as well. One advantage is that you don't have to adhere to a rigid class structure and be forced to refactor or create a new class every time you need to add a new property or method. And sometimes you want a property/method for just that particular instance, and not all members.
> One advantage is that you don't have to adhere to a rigid class structure and be forced to refactor or create a new class every time you need to add a new property or method.
I wouldn't qualify this as an advantage; it encourages bad code and it precludes a lot of good tooling (including tooling which would automate the sort of refactoring you'd like to avoid).
In practice this doesn't happen maliciously, and it can be very handy when you need to attach a little extra data for the ride. If you need extra assurance, there are techniques to make the instance "very" read-only.
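One such technique, from the PEP under discussion: frozen=True makes any assignment raise (the class name and fields here are illustrative):

```python
from dataclasses import dataclass, FrozenInstanceError

@dataclass(frozen=True)
class Config:
    host: str
    port: int

c = Config('localhost', 8080)
try:
    c.port = 9090  # frozen dataclass: the generated __setattr__ raises
except FrozenInstanceError:
    print('instances are read-only')
```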
If you run a linter, the cases where you are doing this outside of __init__ will usually be pointed out. You can silence the warning/error on a case by case basis if you really need to do it.
It’s not too bad I think, it’s just an evolution really. You can probably grok the basics of the type annotations in a short sit down. I can’t even remember when decorators were introduced but that even more greatly changed how python was written. I’ve been using python since 1.6 and I always thought the amount of repetition was ridiculous. I bet I’m not the only one that has written a “dsl” of what attrs and this pep does 1000 times using the facilities python had at the time: metaclasses, then decorators. Of course all these implementations were rushed, half assed and barely production quality. Despite any warts attrs is a pleasure to use. Type annotations boost IntelliJ/pycharm already quite clever assistance. One lingering thing is attrs named_attrs that while syntactically the best approach in my mind doesn’t work well with IntelliJ. So hopefully this will address it.
It's relatively recent. IMHO Python 3.5 to 3.7 feel like the language is going in a different direction than it did before -- type hints and the handling of asynchrony in particular.
After seeing the huge improvements that JavaScript has gone through over the years I'm all for language updates. Same with Java and C++ (although not as much for Java and I don't know C++ but I always hear C++11 is "new").
Python has grown a lot since then. Back then it was this "better scripting language" that every Linux user kinda knew. Now it's being used much more widely and that just wouldn't cut it any more.
http://www.attrs.org/en/stable/
0: https://www.python.org/dev/peps/pep-0557/#support-for-automa...
1: http://www.attrs.org/en/stable/examples.html#slots
One thing the stdlib implementation has going for it: better naming. attr.ib() is not exactly crystal-clear.
[0] https://www.reddit.com/r/Python/comments/7tnbny/raymond_hett...
[1] https://twitter.com/i/web/status/959358630377091072
There's a big thread discussing the issue on python-dev somewhere. Also some discussion in https://github.com/ericvsmith/dataclasses/issues/2#issuecomm...
Anyway, it's not a huge issue—attrs is great and there's no reason not to use it instead for untyped Python.
[1] https://stackoverflow.com/questions/47350570/is-it-possible-...
As with most things, there are trade-offs.
C++11, though, is quite a bit different. Probably the biggest change is that raw pointers should, for the most part, not be used anymore.