Interestingly, most of these advantages are not specific to static typing, but derive from having a language that talks about types – even a dynamic one. For example, most of them apply to Julia as well, a dynamic language that has type declarations, which:
- Lets you create typed collections so that if you insert the wrong kind of value, you get an error immediately, albeit only when code runs, not before;
- Has abstract types that serve the same function as Haskell's type classes, and allow you to inherit a huge amount of functionality for a very brief type definition;
- Serves as documentation of APIs;
- Allows performance similar to C/C++.
Another advantage that wasn't listed is that in languages that don't allow type annotations, you end up writing a lot of boilerplate type-checking code in libraries to give better error messages.
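In Python, for instance, that boilerplate typically takes the form of explicit isinstance checks at API boundaries. A hypothetical sketch (the function and its signature are invented for illustration):

```python
def scale_vector(vector, factor):
    """Multiply every element of `vector` by `factor`.

    With no type annotations to check ahead of time, the library has to
    validate its inputs by hand to produce a readable error message,
    rather than letting a bad value fail somewhere deep inside.
    """
    if not isinstance(vector, list):
        raise TypeError(f"vector must be a list, got {type(vector).__name__}")
    if not isinstance(factor, (int, float)):
        raise TypeError(f"factor must be a number, got {type(factor).__name__}")
    return [x * factor for x in vector]
```

Multiply that pattern across every public function of a library and it adds up to a lot of code that a type annotation would have expressed in a few characters.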
I didn't claim it was. That's one of the tradeoffs you make in exchange for a simpler, more forgiving programming model (and the ability to do serious work in a REPL or notebook). However, given that Julia programs can be mostly type inferred before execution, it is possible to check for this kind of error before running a program. There have been some projects to do exactly that [1] [2], and I suspect that once Julia reaches 1.0, we'll focus more effort on static analysis tools.
The point is that you don't need a static language to get many of the benefits of types; you just need a language that allows you to express type-based properties. One way to do that is to have a set of formal rules for deriving the type of every expression in a program, but that's not actually necessary.
These are exactly some of the advantages Perl6 has over Perl5. In Perl6 you do have types, but they're optional, so you can be as precisely or as loosely typed as you like.
In D, for instance, you do have auto, yet it's still a static type; it's simply inferred. An auto function with branches that return real and BigInt will trigger a compiler error. You need to convert the real to BigInt before the code will compile.
In my opinion there is a huge difference between "I get an error from my compiler" and "my user hits a hard-to-reproduce error in a month, in the middle of a business-critical task".
It's frustrating that this static vs. dynamic 'battle' still goes on. The longer everyone thinks this is actually the problem, the longer we have to wait for innovations to happen. Look at the web in 2016: the technology is in complete disarray. Look at the game industry in 2016, where C++ is thrown around as the cause and solution to all life's problems; a language created 33 years ago, with no intention of being used as it is today.
The advantage of static typing is obvious to me; the more my computer understands of my code, the less work I have to do. I can offload my memory and instead think about things that are more interesting, rather than looking up Python obscurities on Stack Overflow.
But static typing is not perfect, and there is still more the computer could be doing if it had more understanding. That is where we should be focusing, that is where the real problem is.
> Look at the web in 2016, the technology is a complete disarray
So that's pretty much been that way since the Java plugin for Netscape Navigator. That started the plugin battle which morphed into the mess we're in today.
> Look at the game industry in 2016, where C++ is thrown around as the cause and solution to all life's problems.
Software written in C/C++ often has Lua/Python/Perl plugins to add scriptability.
> The advantage of static typing is obvious to me; the more my computer understands of my code, the less work I have to do.
Nope. Computers don't understand code; they just process it and point out where your errors are. There's very little understanding. Type checking is really just rattling off a checklist to validate that the type being passed around is the type that's expected.
Regarding your last sentence, part of Alan Kay's motivation for OOP with Smalltalk was that he felt type systems were limiting because they never anticipate all the possible types a program will need.
With static typing the computer doesn't "understand" your code. It just automatically checks the correctness proof you've presented to it. And the correctness is limited only to certain properties of the program that a given type system supports.
> And the correctness is limited only to certain properties of the program that a given type system supports.
In case there are readers not familiar with the current state of things, you can pick solid languages today whose type systems support a LOT. It is completely fair (and good) to point out that this isn't a panacea, but it can significantly help.
We can argue about the definition of 'understand' all day, but it's a simple fact that the computer has more of an understanding of the structure of your program with a type system, and thus can help out more.
>The advantage of static typing is obvious to me; the more my computer understands of my code, the less work I have to do.
There's nothing about dynamic typing which precludes this.
There are also things your computer doesn't need to understand, but which the rigidities of an excessively strict type system will demand it be told anyway.
It's important to remember that the terms "statically typed" and "dynamically typed" cover a huge range of different language features. To have a good discussion about the tradeoffs, it's better to talk about individual features rather than "statically typed" or "dynamically typed".
For example,
> Like puzzle pieces with shapes that we can observe fit together, we can think of types as specifying a grammar for programs that ‘make sense’.
All programming languages have a grammar for programs that make sense (it's specified by the parser). Algebraic data types give you a grammar for values that make sense. Separately, a type checker assigns types to expressions and checks that they're consistent. You can have algebraic data types without a type checker (e.g. Racket's 2htdp/abstraction), and you can have a type checker without algebraic data types.
You can also have constraints on values that are too complex to be easily checked at compile time (e.g. clojure.spec) or that can be checked at compile time but only incompletely (e.g. Erlang's -type and Dialyzer).
Anyway, I don't mean to be critical of the article: it does a good job covering the high-level tradeoffs. I just think people are too quick to generalize their experience with particular languages to entire classes of language features.
I've recently joined a team writing primarily in Clojure, whose proponents often tout repl-driven programming as a unique benefit to the language.
In a statically typed language (the stronger the better), aided by a good IDE, I don't need to be constantly executing my code against data during development; my editor is constantly validating my code, and when it stops complaining, my code will work. And months later when I or someone else uses that code in another part of the system, they won't need to execute that code to see how it behaves, as the types themselves provide documentation and as-you-code feedback.
Let's say, "there is high probability that my code will work correctly right away". (This is especially true for Haskell.)
[Edit] To be fair, this is not only, and even perhaps not so much, due to the static typing per se but also due to the mental discipline the particular programming language may require from the programmer even to write code that can be successfully compiled.
I would say 'a statically typed language aided by a type inference engine'. My biggest bugbears with statically typed languages go away when decent type inference comes into play.
You could argue that with PowerShell you have an entire .NET REPL. And I have actually used it that way, particularly when I run into some function that has bad MSDN documentation.
I started programming with PHP and JavaScript (both dynamically typed) then I started writing games with ActionScript 2 (dynamically typed) then I switched to ActionScript 3 (statically typed), then I got into Java (statically typed) and later C/C++ (statically typed) - So I spent a lot of time with both.
For the past few years I've been coding almost exclusively in dynamically typed languages - I did some Python (dynamically typed) but mostly a lot of JavaScript/Node.js (dynamically typed). I understand all the pros and cons, but for me personally, I am much more productive with dynamically typed languages than statically typed ones.
It was a long time ago, but I still remember clearly when I switched from AS2 to AS3 - I was writing games for Flash at the time; I did feel an improvement and I really liked the additional structure which types brought to my code. There was a certain satisfaction that came with defining fixed classes and interfaces and making use of polymorphism and various formal 'design patterns'. It gave me extra 'confidence' in my code.
In retrospect, after having spent years praising statically-typed languages, and then later switching back to dynamically typed languages, I think a lot of the benefits that I felt during my static typing phase came down to one simple fact:
"Statically typed languages force you to think more before you do things" - This was really valuable early in my career when I had a tendency to rush things. However, now that I more fully appreciate how complex programming is (and how easy it is to break stuff), I am always very careful (regardless of the language).
Static typing for me has become a tedious process through which I no longer derive much value - Though it was really useful at a specific point in my career.
That said, I think there is some stuff (anything to do with low-level hardware/systems and optimizations) where statically typed languages cannot be avoided.
Also I wouldn't say that people who like statically typed languages are inexperienced - I know some very experienced engineers who are just addicted to that extra feeling of 'confidence' and structure which statically typed languages give you.
Forgive me, because I only have what you have said to work with. It sounds like this was a lot of smaller projects with smaller teams? If so: static typing increases in value as project size and head count grow.
The more pieces of the project that you didn't write or otherwise have low knowledge of their inner workings, the more dangerous modifying code becomes. It's very useful to have something telling you that everything looks OK. That something can be testing, but why rely on writing good tests when we have formal proof systems available?
"This is sort of a subtle point: when programming in a static language, you always have a choice about what information you encode in the types, and how you encode things. ... In a static language, you do always have the option of building a less typeful API, where less is enforced by the types, but it is often tempting to spend more time encoding things statically (and then proving things to the typechecker) than would be saved by avoidance of potential future bugs. With experience, you develop a good sense for what is worth tracking statically and what to keep dynamic, but newcomers to static languages can make bad tradeoffs here, which in turn contributes to needless complexity in the language's library ecosystem."
This is a very important point that I've rarely seen talked about explicitly though I think all programmers develop a tacit sense of it.
It's an important point, but it can also be fairly well generalized to a lot of the "softer" points of program design. For instance, when to introduce an abstraction layer, and where the boundaries of the abstraction are. You could find-and-replace instances of "static typing" with "abstraction layer" into the quoted paragraph and it still makes perfect sense.
I think the concept of refactoring in dynamically typed vs statically typed languages is a double-edged sword comparison.
On one hand, refactoring code written in a statically typed language is less error-prone; on the other hand, such refactorings tend to affect more code than refactorings of dynamically typed code.
If your business requirements change often and refactorings are common, it can be a pain to keep having to rethink your code structure.
Dynamically-typed code is often easier to extend and modify. I find that with statically typed code, if you start messing with a small part of your code, sometimes you have to rethink your entire class hierarchy. With dynamic languages, your code can handle quite a few changes before it gets to a point where you need to rethink the overall structure.
The author asserts that static typing allows the compiler to answer this question, but this only allows the compiler to spot type errors in advance. There are many other kinds of errors that are completely invisible to the compiler.
In a dynamically typed language, if I don't spot the error from reading the code, I must wait until runtime/testing to discover the error. This is also true for a statically typed language for every kind of error except type errors. Personally, type errors haven't been the kind of errors that haunt my dreams. I guess that's why I'm not enthusiastic about static typing.
You are right - in a tautological way - that type systems only catch type errors. However, in modern languages (including Haskell and Scala, as well as newer, more experimental languages like Idris), those type errors can be extremely powerful.
Many people assume that 'types' are simply primitives like Int and String, and that a type checker just makes sure you don't pass an Int to a function expecting String. However, it is possible to express far more powerful statements about your data using a good type system.
For example, you can express the idea of non-emptiness of a container, as mentioned in the article. Then you know that, say, taking the max element of a non-empty container is guaranteed to give you an element, whereas with a possibly-empty container you might not have any element at all, causing a null, or exception, or at least requiring an Optional type.
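The same idea can be mimicked even in a dynamic language, with the guarantee established once at construction rather than checked by a compiler. A hypothetical Python sketch (the `NonEmpty` class is invented for illustration):

```python
class NonEmpty:
    """A sequence guaranteed to hold at least one element.

    The guarantee is established once, in the constructor: you cannot
    create a NonEmpty without a first element. After that, max() can
    never fail, so callers need no empty-case handling.
    """
    def __init__(self, first, *rest):
        self._items = [first, *rest]  # at least one element by construction

    def max(self):
        return max(self._items)  # safe: the list is never empty
```

In a statically typed language the checker would additionally stop you from ever passing a possibly-empty collection where a `NonEmpty` is required; here only the construction side is guarded, and only at runtime.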
You can express safety properties such as a sanitized string vs. unsanitized. You can have a Sanitized type that can only be created by calling a sanitize function - which carefully escapes/handles any invalid characters - and then functions that might, say, pass a value into an SQL instruction can be typed to only take Sanitized strings. Now the representation in memory of Strings and Sanitized strings is identical, but by using different types and a certain set of allowed functions on those types, you can encode the invariant that a string cannot be inserted into an SQL query until it has been sanitized. Now your type checker can catch SQL injection vulnerabilities for you. How's that for a type error?
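A minimal Python sketch of that newtype pattern (the `Sanitized` wrapper, `sanitize`, and `run_query` are hypothetical names; a static checker such as mypy would enforce the annotations before runtime, while at runtime the wrapper is just a thin class):

```python
class Sanitized:
    """A string that has passed through sanitize(); nothing else should create one."""
    def __init__(self, value: str):
        self.value = value

def sanitize(raw: str) -> Sanitized:
    # Toy escaping for illustration only: double up single quotes.
    # Real code should use parameterized queries, not manual escaping.
    return Sanitized(raw.replace("'", "''"))

def run_query(fragment: Sanitized) -> str:
    # Accepting only Sanitized encodes the invariant in the signature:
    # an unsanitized str here is a type error, not a runtime surprise.
    return f"SELECT * FROM users WHERE name = '{fragment.value}'"
```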
Yes, this is the point that's often missing in discussions about types. You can (and you have to work to) encode properties as types to get more value out of them. It's not about avoiding mixing ints and strings.
First, when most people talk about static typing, they're talking about the near-useless version -- just types like Int and String. I think we agree there, so I won't mention it further.
Second, a dynamically typed language like Python has more typing information than some folks first assume. Python's AttributeError is quite similar to a TypeError. In fact, with old-style classes (v2.1 and earlier), many errors that are now TypeErrors were AttributeErrors. Calling len() on an inappropriate object would raise "AttributeError: no __len__". In many cases where folks talk about wanting a static type system, they really just want interfaces.
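That "they really just want interfaces" point can be made concrete with `typing.Protocol`, which lets a static checker verify structural interfaces in Python without any inheritance (the `describe` function here is invented for the example):

```python
from typing import Protocol

class Sized(Protocol):
    def __len__(self) -> int: ...

def describe(obj: Sized) -> str:
    # Any object with __len__ satisfies the interface structurally;
    # mypy would flag a call like describe(42) before the code runs,
    # where plain Python would raise at runtime instead.
    return f"{type(obj).__name__} of length {len(obj)}"
```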
The Sanitized string example is a good counter-point because the interface needs to be near-identical to a regular string. I'm not certain a more complex memory representation (caused by defining a different class) would cause noticeable inefficiency. We're probably not doing vectorized operations on strings.
This brings me to my third point, that Python 3 has a similar split between two types: bytes and str. The memory representation is slightly different, bytes vs unicode, but the interfaces are nearly identical. Two differences would be decode vs encode and that getting an element from bytes (annoyingly) gives an int. The distinction between the two types is enforced mostly inside builtin functions, implemented in C. This was a big deal, causing backwards incompatibility, many flamewars, and we're still resolving it, though I think it's clear to most people now that Python 3 is the future.
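The two differences mentioned are easy to demonstrate directly:

```python
text = "héllo"
data = text.encode("utf-8")          # str -> bytes via encode
assert data.decode("utf-8") == text  # bytes -> str via decode

# Indexing a str gives a 1-character str, but indexing bytes gives an int:
assert "abc"[0] == "a"
assert b"abc"[0] == 97  # the byte value of 'a'
```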
Is it possible that the Python 2/3 split could have been avoided if we had a static type system? Perhaps, if we had multiple dispatch, the function signatures could have remained the same, avoiding backwards incompatibility... I'm just speculating here. My guess is no, getting rigorous about unicode would cause incompatibility regardless of the type system. I'll get back to the main topic now.
> Now your type checker can catch SQL injection vulnerabilities for you.
This sounds useful, but a good interface solves the problem just as well. I'm a Pythonista (if you haven't noticed), so my example is PEP 249 that specifies a DB API for all database wrapper implementers to follow. It states that it's the wrapper dev's responsibility to implement a sanitizing string interpolation for the cursor's execute method.
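With sqlite3 from the standard library (a PEP 249 implementation), that looks like passing parameters separately so the driver handles quoting:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users (name) VALUES (?)", ("O'Brien",))

# The ? placeholder keeps the quote in O'Brien from breaking the query
# (or being exploited) -- no manual escaping in application code.
rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", ("O'Brien",)
).fetchall()
```

The discipline here is conventional (nothing stops you from building the SQL with string concatenation), which is exactly the tradeoff being discussed: the typed version makes the unsafe path a compile error, the interface version makes the safe path easy.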
My conclusion is that designing a good interface is important whether you have dynamic or static typing. Static typing errs on the side of safety, dynamic typing errs on the side of flexibility. Both can mimic the other. Arguing that one is better is like saying linear regression is better/worse than k-nearest-neighbors.
> Personally, type errors haven't been the kind of errors that haunt my dreams.
The point of strongly typed systems is that you can represent your constraints as types. This takes extra thinking and work, but gives you almost unlimited expressive power (ref: Agda).
Simple example: meters and feet as different numerical types. When you multiply them, you get a silly unit (foot-meters) that doesn't fit with whatever you wanted (meters^2), and thus fails compilation.
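Even without compile-time checking, the idea can be approximated at runtime. A toy Python sketch of the simpler addition case (full unit algebra like foot-meters vs. meters^2 needs more machinery): mixing units fails immediately with a TypeError rather than producing a silently wrong number, whereas a statically typed language would reject the same mistake before the program ever ran.

```python
from dataclasses import dataclass

@dataclass
class Meters:
    value: float
    def __add__(self, other):
        # Refuse to combine with anything that isn't also Meters.
        if not isinstance(other, Meters):
            raise TypeError("cannot add Meters to " + type(other).__name__)
        return Meters(self.value + other.value)

@dataclass
class Feet:
    value: float
```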
I haven't met anyone other than Haskellians that would create separate types for meters and feet.
I also wonder when you would make that distinction in the lifecycle of your application. I suspect not until you first encounter the bug of accidentally mixing units. If so, we'd be solving the problem at the same time, just using different techniques.
"Type errors" go way further than "Damn, I passed a string in where I expected an integer". Types are a way of expressing aspects of your code. You can avoid race conditions, you can ensure a program's state is always as expected, and you can avoid design errors by encoding the contracts of your design into your types.
I think if you're used to a language like Java or C++ you may not see what types can really buy you, but that's because most languages have bad type systems.
Static typing is one technique for designing safe, easy to use interfaces. Depending on the language, there may be other tools that are just as effective.
The biggest issue with dynamically typed languages for me is refactoring code. I feel very confident when refactoring a static code-base. With a dynamic code-base it's much more risky - to the point where I avoid it.
Of course having great test coverage helps alleviate this, but it's very rare where a large project has 100% test coverage.
[1] https://github.com/astrieanna/TypeCheck.jl
[2] https://github.com/tonyhffong/Lint.jl
> Look at the game industry in 2016, where C++ is thrown around as the cause and solution to all life's problems.
You know that they update the language regularly, and that there's a games SIG? https://groups.google.com/a/isocpp.org/forum/#!forum/sg14
It could really use profiles. Strict mode or something, where the code should only use N "good practice" features.
> My editor is constantly validating my code, and when it stops complaining, my code will work.
You mean your code will compile and run. Whether it behaves as desired is completely unknown without testing.
> The more pieces of the project that you didn't write or otherwise have low knowledge of their inner workings, the more dangerous modifying code becomes.
Can't help thinking that that's probably something like 90% of code out there...