I inherited a Python-based build system which had been written without any unit tests because... it's a build system, right? Right?? Anyhow, it was the kind of thing where you'd build for 20 minutes (Android) and then hit some trivial error in the Python code at the end of the build. You'd fix that, run again, and waste another 20 minutes on the next thing.
I couldn't add unit tests all at once because the design had no way of accommodating them. I needed something to stop the enormous waste of time.
So I added type hints. The IDE should show me when illogical things were being done with parameters to methods/functions. It was fairly quick compared to a total refactoring but not effort free. I barely noticed any effect. Didn't catch a single error. Eventually I created a kind of dummy version of Android that "built" in a few seconds and I tested against that first. That allowed me to speed up changes and get some refactoring done to make a few critical unit tests and the whole thing started to get under control.
This anecdote has almost no meaning - you cannot conclude that type hints have no benefit because of one case - I just think that tests are almost always more important and hinting and the whole rigmarole of strong typing are much less of a panacea than tests are.
Again, purely anecdotally, I've done similar things in both Typescript and Python, and I usually catch at least one silly type error in Typescript, but I rarely do in Python. That could be because the Python developers I was following were just so good, but I suspect it's more about the quality of the typecheckers. Typescript feels much better at finding errors in normal, idiomatic Javascript, whereas I feel that with Mypy, if I want the best results, I need to write code in a way that plays to its strengths.
I've been told that Pyright is better but I've not tried it out properly. But yeah, your experience largely matches with mine, for Python at least.
The issue that I have with Python type hints is that they don't go nearly far enough in describing the data being manipulated. Specifically, I'm thinking of stuff like the dimensionality and cardinality of Numpy arrays or Pandas frames. Usually that's the stuff I have the most questions about when I look at Python code, and the type system as it's being used now offers no help there.
You very quickly get undecidable type checking with such a powerful type system. Then you need a way to handle that; usually the solution is to help the type checker along by providing a proof that your code inhabits the claimed type. Then you need a way to have the proof live together with the code, a proof language, and preferably a whole library of proofs people can build on.
If this sounds fun then you can go play with e.g. Idris or F*.
I'm not sure how a python annotation/type system could possibly do that? If numpy/pandas had different types for different cardinalities it would work today.
You just need those libraries to embrace it really, then you could theoretically have type constructors that provide well-typed NxM matrix types or whatever, allowing you to enforce that [[1,2],[3,4]] is an instance of matrix_t(2, 2).
I don't see how python could possibly make such inferences for arbitrary libraries.
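A minimal sketch of what a library-embraced shape type could look like. Every name here (`Matrix`, `Rows`, `Cols`) is hypothetical, and today's checkers cannot verify the shape arguments statically, so the dimensions are only enforced at construction time:

```python
from typing import Generic, TypeVar

Rows = TypeVar("Rows", bound=int)
Cols = TypeVar("Cols", bound=int)


class Matrix(Generic[Rows, Cols]):
    """Hypothetical shape-parameterized matrix: the dimensions are part of
    the type's stated intent, and are validated when the value is built."""

    def __init__(self, rows: int, cols: int, data: list[list[float]]) -> None:
        if len(data) != rows or any(len(row) != cols for row in data):
            raise ValueError(f"expected a {rows}x{cols} matrix")
        self.shape = (rows, cols)
        self.data = data


m = Matrix(2, 2, [[1, 2], [3, 4]])  # accepted: a 2x2 matrix
```

A `matrix_t(2, 2)`-style constructor would be sugar over exactly this kind of parameterized class.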
PEP 646, Variadic Generics (https://peps.python.org/pep-0646/), was made for this specific use case, but mypy is still working on implementing it. And even with it, several more PEPs are expected to be needed before operations on variadic types are powerful enough to handle common array operations. numpy/tensorflow/etc. do broadcasting a lot, and that would probably need a type-level Broadcast operator just to encode. I also expect the type definitions for numpy will get fairly complex, similar to template-heavy C++ code, once they add shape types.
I like typing that with strings like '(batch,r,a,s,channel,t)'. The tooling doesn't do anything special with it, but it makes the code understandable at a glance. Adopting libraries like einops and core routines like einsum in lieu of equivalent alternatives encourages the propagation of names (rather than rolling axes or whatever) anywhere it matters. Having a coding convention about the standard order of axes helps a bit as people get more familiar with that aspect of the codebase too, only deviating where necessary.
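One cheap way to make those axis strings part of the signature rather than a comment is `typing.Annotated`, which both checkers and the runtime ignore beyond carrying the string along. The names below are invented for illustration:

```python
from typing import Annotated, Any, get_type_hints

# The string is pure documentation: tooling does nothing special with it,
# but it travels with the annotation and stays introspectable.
BatchImage = Annotated[Any, "axes: (batch, height, width, channel)"]


def normalize(x: BatchImage) -> BatchImage:
    return x


hints = get_type_hints(normalize, include_extras=True)
axis_doc = hints["x"].__metadata__[0]  # "axes: (batch, height, width, channel)"
```

This keeps the axis convention next to the code it describes without committing to any particular shape-checking tool.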
I like this comment because so far I don't know what you're talking about. That's the hallmark of something in this domain worth looking up but I figured you might be willing to share more on the matter. :)
Python can actually do (some) dependent types with generics but it's not pretty.
The only real use-case that is both possible and worthwhile I've found is being able to say a value is a T if there's a default and an Optional[T] otherwise.
The trouble is that's not how any of the ML or data science Python code is written at the moment. Such practices could help although I think more elegant solutions should be explored.
Dataclasses is notable because it's the only example (I'm aware of) of type hints affecting runtime behavior as part of the stdlib.
Compiler instructions: mypyc was (one of?) the first to do this, but Cython actually supports this natively now, and is much more active than mypyc is last I checked.
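The dataclasses case in brief: the field hints are read at class-creation time to generate `__init__`, `__repr__`, and `__eq__`:

```python
from dataclasses import dataclass


@dataclass
class Point:
    # These annotations are not mere documentation here: the decorator
    # inspects them to synthesize the constructor and comparison methods.
    x: int
    y: int


p = Point(1, 2)  # __init__ generated from the hints
```

Note the values are still not type-checked at runtime; the hints drive code generation, not enforcement.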
> To add overloaded implementations to the function, use the register() attribute of the generic function, which can be used as a decorator. For functions annotated with types, the decorator will infer the type of the first argument automatically:
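The quoted `register()` behavior is runnable as-is; the annotation on the first parameter selects the dispatch type:

```python
from functools import singledispatch


@singledispatch
def describe(value) -> str:
    return "something else"


@describe.register
def _(value: int) -> str:
    # No explicit type passed to register(): it is read from the hint.
    return "an int"
```

So `functools.singledispatch` is a second stdlib spot where a hint changes runtime behavior.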
The idea with type hints in Python though is that they’re meant to be checked using some static analysis tool like mypy/pyright/etc. The runtime behavior for the most part remains unchanged in the sense that the Python interpreter won’t enforce the types in cases such as the one you’ve provided.
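A two-line illustration of that non-enforcement:

```python
def double(n: int) -> int:
    return n * 2


# A static checker flags this call, but CPython happily runs it:
result = double("ab")  # "abab" at runtime, despite the int annotation
```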
I'm a heavy user of type hints and enable pyright and mypy's strict modes whenever possible. However, you can't always be strict: if you use almost any package in the data science/ML ecosystem, you're unlikely to get good type inference and checking[1]. In those cases, it can still be useful to type some parameters and return values to benefit from _some_ checking, even if you don't have 100% coverage.
Type hints also bring improved completion, which is nice too.
[1] For example, huggingface's transformers library decided to drop support for full type checking because it was unsustainable but decided to keep the types for documentation[2]. There are stubs for pandas, but they're not enough because pandas has a tendency to change return types based on the input, and that breaks quickly.
[2] https://github.com/huggingface/transformers/pull/18485
This sort of thing is why I gave up Python. I could see having strong typing. Or optional strong typing. But unchecked type hints are just silly.
The way everybody else seems to be going is strong typing at function interfaces, with automatic inference of as much else as can be done easily. C++ (since "auto"), Go, Rust, etc.
> The way everybody else seems to be going is strong typing at function interfaces, with automatic inference of as much else as can be done easily
Both mypy and pyright will do that. If your function return type is annotated, they will infer the type of the receiving variable. If you have two branches where a variable can receive two types, pyright will infer the union type. Similar for None.
Example:
    a = input()
    if a.isdigit():
        x = int(a)
    else:
        x = a
    reveal_type(a)
    reveal_type(x)
Pyright output, stripped of configuration noise:
typetest.py:6:13 - information: Type of "a" is "str"
typetest.py:7:13 - information: Type of "x" is "int | str"
Mypy doesn't allow this. It infers `x` as `int` from the first assignment and rejects the second one.
The only times I need to annotate local variables are (1) the function isn't typed, so it gets inferred as Any (2) I'm initialising an empty collection, so its type might get inferred as e.g. `list[Unknown]` (pyright; mypy can infer the element type).
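Case (2) in a nutshell (hypothetical variable names):

```python
# Unannotated, `names = []` gets inferred as list[Unknown] by pyright and
# element checking degrades. One annotation at the initialisation point
# restores full inference for every later use:
names: list[str] = []
names.append("ada")
# names.append(42)  # now flagged by the checker; runtime would still allow it
```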
Is there something inference-wise that you miss in Python compared to C++ or Go?
PS: The larger problem to me is the inconsistency between pyright and mypy, the leading type-checkers. Sometimes issues are raised between the projects and they work to reach agreement, but I believe the two issues highlighted above (unions and collections) are design choices, unlikely to change.
This still makes me seethe. We have pip, poetry, conda, and more. The Python folks knew that multiple incompatible systems would arise from a grammar spec without a behavior spec. And here we are: Python doesn't do anything useful with the types, but third parties are left to their own devices.
> Now, take that POC and make it production ready, by using mypy and pydantic.
Then watch it explode in production because your "type system" is incomplete and unsound.
In my opinion an unsound static type-system is worse than no static type-system at all. In both cases you need to check everything manually. But without such pseudo type-checking you at least don't get lulled into a false sense of security.
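A hypothetical example of the failure mode meant here: the annotation claims `int`, the checker is satisfied, and the lie only surfaces on production input (`next_age` and its payload are invented for illustration):

```python
import json


def next_age(raw: str) -> int:
    data = json.loads(raw)
    age: int = data["age"]  # trusted, never verified: json.loads gives us Any
    return age + 1


next_age('{"age": 30}')      # fine: 31
# next_age('{"age": "30"}')  # passes the type checker, TypeError at runtime
```

The annotation on `age` is an unchecked assertion, which is precisely the false sense of security being described.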
Not to start a religious war, but I think Ruby screwed up on gradual typing, making it too complex and too many steps. Attempting to maintain perfect Microsoft-legacy-style compatibility, rather than making a hard change with a truly new major version instead of arbitrary marketing increments, is the greater failure.
Crystal is compiled with static typing but looks like Ruby. The type specification it uses emulates gradual typing of dynamic languages.
How does Ruby’s system look? Python’s type hints are fully optional and opt-in, typed and non-typed code works the same (even though type checkers may complain about the latter).
I believe the Ruby designers have refused to add syntax for type hints, so they need to either be in comments on separate lines from the code itself or even in a separate file. They are therefore less ergonomic to use - but the increased separation from the code means that they are mostly used purely as (machine-verifiable) comments. On the other hand Python's type hints tend to be deeply intertwined with the code, and are even required to access new language features such as dataclasses.
The two languages take such different approaches because their designers have different feelings about static typing. Guido and the Steering Council seem to want Python to be as statically-typed as possible, whereas Matz thinks "static type declaration is redundant" [0].
[0] https://evrone.com/yukihiro-matsumoto-interview
Some time ago I made a dependency injector[0] in Python using type hints. I have always enjoyed playing with type systems and I wanted to explore Python's.
I remember that it felt rough. I had issues especially with functions using variadic types in generics. I also remember having issues with overloading a function: sometimes it would go for the more generic overload instead of the more specific one when inferring types.
I managed to solve all of that. Unfortunately, that happened some time ago and I don't remember the specifics, only that it was a fun project to develop. I use it frequently in other projects.
[0]: https://gitlab.com/applipy/applipy_inject
If you'll accept my apology in advance: why does this typing stuff seem so ugly to me - distracting, complex, hard to act on?
I'm not against it; as in TS (non-enforced type checking), it is a lovely addition to Python, but I'm really struggling to read and write this syntax.
Not sure if it is Python's nature, but for most of us the path through C, C++, ..., TS has been a journey, an evolution - and the root, C, is some kind of cult we're attached to. Is that what's preventing me from loving this thing? (Thanks for the effort.)
Additions to a language always confront syntax expectations. Some people can see through syntax to semantic intent; alas, I am not one of them, and the utility of a syntactic form goes (to me at least) beyond expressiveness to comprehension: if your syntax confuses, how can anyone comprehend?
Runtime behaviour determination: the stdlib [dataclasses](https://docs.python.org/3/library/dataclasses.html#module-da...)
https://fastapi.tiangolo.com/release-notes/#0950
FWIW, `typing.NamedTuple` did this in Python 3.5, three years before dataclasses was introduced in 3.7.
> To add overloaded implementations to the function, use the register() attribute of the generic function, which can be used as a decorator. For functions annotated with types, the decorator will infer the type of the first argument automatically:
That appears to be the only other case.
A mechanism like Haskell's type application seems like it could solve at least most, maybe all of those problems.
0: https://hackage.haskell.org/package/Frames
Mock up something fast, no type hints.
Now, take that POC and make it production ready, by using mypy and pydantic.
Do not know?
Haskell dies in syntax.