Joe was such a creative thinker, and he was never (that I could tell) embarrassed by the prospect of an idea of his proving to be a poor one, so it was always interesting to hear him talk.
Genuinely curious, thinking outside the box (writing a new programming language on top of Prolog?), and he treated people with utmost respect. I was a nobody lucky enough to escort him around Chicago one day when he was attending a conference, and we spent a couple of hours talking about art, Erlang, Riak, and man I wish I could remember what else.
Really? And every developer will create custom, broken, non-standard "namespace" system: admin_get_user vs. readers_get_user, or maybe admin.get_user vs. readers.get_user, or maybe get_user_admin, get_user_readers, etc. Surely, this will stir a lot of creativity, but I am not sure we need that.
I've thought a lot about storing program structures in distributed hash tables and came to the conclusion that the only viable languages that can be safely stored are purely functional. If you consider OOP languages, there are many stateful dependencies, for example a method may rely on a constructor initializing a private variable in some specific way, so the smallest unit of modularity cannot be a method. Similarly an entire class could perhaps rely on methods being called in a certain order. Even though classes are designed to be self contained, stateful behavior really mucks up the ability to separate things into constituent parts.
Purely functional on the other hand has none of these problems. This is the approach taken by the Unison Language people, which I think makes the right design decisions.
It's stateful without transactions in the simplest implementation, but there's no reason you couldn't create a procedural language that maintains state and enforces transactional separation.
In either language class, you need to manage transactions, usually implicitly, if you want to do any meaningful work. This is a gap that either language class can easily solve and in many cases, there are working implementations of these ideas that do just that.
Purely functional introduces lots of issues of its own. (Look at all the monad insanity Haskell has to do to get the equivalent of a print statement).
It's no more insane than async programming. In fact, it's less insane: it's what you'd get if you let async and await be first-class citizens in your language instead of awkward special cases.
1. You don't need a monad to do io in Haskell, you can do that directly. 2. Monad is a tool that offers a "grip" over io, i.e. your code still yields easily to formal proofs despite using io, for example. 3. The sanity and insanity of controlled and uncontrolled io is imho the exact opposite of what you seem to imply.
I suppose it's only natural, having solved distributed systems with Erlang, Joe would move on to tackling the hardest problem in computer science - Naming Things.
There’s a spectrum of what different programming languages call “modules” and what they use them for, with languages like ML having “true” modules in the conceptual sense, where a module is a unit of abstraction, the way that classes and interfaces in many languages are units of abstraction, though when you dig into it, I believe modules are more fundamental and more powerful… then on the other hand you have the situation of, “Programs are collections of text files, people don’t want to put all their code in one file, let’s call a file a module.”
Point being, some ways that programming languages use the concept of a module are deep and let you do things you couldn’t otherwise do, some are more dispensable.
Then there’s Smalltalk where I believe programs aren’t collections of text files, you actually browse your code in a sort of code browser and it’s stored as runtime objects in a virtual machine image, basically!
Then there’s the matter of how code is released and distributed… packaging and libraries. (In practice, the words “module” and “package” are used in overlapping ways by different languages.) Are the single-function modules/packages of NPM a good thing? In practice it doesn’t seem that way.
It’s sort of analogous to trying to “giggify” all the jobs. Every individual function outsourced.
Wikipedia is probably the most prominent example of a flat namespace that works at scale. But those names are pretty long, particularly when they need to disambiguate. Also, it's an encyclopedia where article editors are forced to collaborate, and it's leveraging large vocabularies that already exist. (One for each language.)
For programming languages, even with a flat module namespace, you get a land rush where good names get taken early by packages that might end up unpopular or abandoned.
Leveraging DNS seems like the answer. Java did it badly, but Go's approach seems fine, perhaps because it leverages GitHub's namespace too (for most modules).
> Leveraging DNS seems like the answer. Java did it badly, but Go's approach seems fine, perhaps because it leverages GitHub's namespace too (for most modules).
On the contrary, Go's approach will lead to lots of problems while Java's is actually simpler and safer. The fact that you depend on the current status of DNS every time you build your code, for each and every one of your dependencies, including transitive ones, is completely nuts. Even Google realized this, and they solved it Google style: they added another automated system on top to try to add some stability (the Go proxy). And then they had to add holes to that system, because it turns out not all dependencies are public and so they can't solve this from on high (GOPRIVATE).
And still, if one of your dependencies decides to switch hosting provider for their source code, or loses their domain name, you have to make (small) changes in every code file that referenced that dependency.
Maven's solution is much simpler for everyone involved: DNS is only involved in registering a new module, it only serves as a form of authentication. After the initial registration, the module name is allocated to your Maven Central account, and it won't be revoked if you later lose that domain. If someone gets access to your domain, they don't also automatically become able to push malware to people who used your module for years, neither retroactively (which Go also handles) nor when they next upgrade (which Go will happily allow).
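To make the distinction concrete, here is a minimal sketch of "DNS only at registration time" as described above. It is a toy model, not Maven Central's actual API: the `Registry` class, the `domain_check` callback, and the account names are all hypothetical, and the reversed-domain mapping is only assumed to mirror the `com.example` style of group IDs.

```python
class Registry:
    """Toy registry: domain control is checked once, when a name is claimed."""

    def __init__(self, domain_check):
        # domain_check(account, domain) -> bool is a hypothetical stand-in
        # for whatever proof of domain control a real registry would ask for.
        self._domain_check = domain_check
        self._owners = {}  # group id -> account that registered it

    def register(self, account, group_id):
        if group_id in self._owners:
            raise PermissionError(f"{group_id} is already allocated")
        # "com.example" -> "example.com" (assumed reversed-domain convention)
        domain = ".".join(reversed(group_id.split(".")))
        if not self._domain_check(account, domain):
            raise PermissionError(f"{account} does not control {domain}")
        self._owners[group_id] = account

    def can_publish(self, account, group_id):
        # Deliberately no DNS lookup here: ownership was fixed at registration.
        return self._owners.get(group_id) == account


# Who currently controls which domain (this can change over time).
controlled = {("alice", "example.com")}
registry = Registry(lambda account, domain: (account, domain) in controlled)
registry.register("alice", "com.example")

# The domain changes hands later; the allocation in the registry does not.
controlled.clear()
controlled.add(("mallory", "example.com"))
```

After the domain changes hands, `registry.can_publish("alice", "com.example")` is still true and the same call for `"mallory"` is false, which is the property being claimed for the registration-only design.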
Been trying to make this point several times now, they always doubt that this problem has been solved, and by Maven of all places. The NIH syndrome is for some reason rampant among package managers/registries.
I'm not sure Go works that way. Are you confusing 'go get' (downloading code) with compiling code?
Maven and Gradle are part of the reason I don't use Java anymore. Java seems to have gone through multiple unfortunate build systems without settling on a good one.
I'd argue that "mental anguish" is probably less of a problem on the latter design, since the additional parenthetical material is used purely for disambiguation (and so is optional), not for categorization. So, on the latter design we can have:
/Francis Bacon
/Francis Bacon (artist)
without having to answer anguish-inducing questions that would be raised by
/Francis Bacon
/artist/Francis Bacon
e.g. questions such as:
1. "Maybe we should put Lord Bacon under a category also, maybe `philosopher/Francis Bacon`"
2. "But he's also a statesman ... what about `statesman/Francis Bacon`? Is he more of a philosopher or a statesman"?
3. What about ordinary people (who happen to be involved in historical events) that have Wikipedia entries, like George Floyd? Should he be assigned something like `person/George Floyd`? If so, should the two Francis Bacons be assigned `person/philosopher/Francis Bacon` and `person/artist/Francis Bacon` instead?
In programming, it’s useful to have namespaces for overall organization and conflict resolution. But it’s a trade off: you are nudged into a hierarchical ontology, with all the implied issues.
Wikipedia titles are free text (more or less). So they can afford not to introduce hierarchical naming and still have nice, easily addressable names without conflicts.
This avoids all sorts of problems. Most things simply can’t be categorized in a strict, hierarchical manner.
Practically speaking, when you want to link to the latter in a markdown that already utilises parens, you encounter more friction - cutting and pasting a url with parens from the address bar needs manual correction if used to create a link in (say) reddit's markdown syntax.
I realise this is somewhat tangential to your point, but shows how easy it is for innocent looking choices to end up creating annoyances.
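For what it's worth, the manual correction is usually just percent-encoding the parentheses. A minimal sketch, assuming the target markdown dialect ends a link destination at the first unescaped `)`; the helper names are made up for illustration:

```python
def escape_parens(url: str) -> str:
    """Percent-encode parentheses so they can't terminate a markdown link."""
    return url.replace("(", "%28").replace(")", "%29")


def markdown_link(text: str, url: str) -> str:
    """Build a [text](url) link with the URL made safe for the (...) part."""
    return f"[{text}]({escape_parens(url)})"


link = markdown_link(
    "Francis Bacon",
    "https://en.wikipedia.org/wiki/Francis_Bacon_(artist)",
)
print(link)
# [Francis Bacon](https://en.wikipedia.org/wiki/Francis_Bacon_%28artist%29)
```

Some markdown dialects accept a backslash before the closing paren instead, but that varies by implementation, so percent-encoding is the more portable workaround.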
I have to listen to Joe's talk on this, but from the OP this is more accurately "Why do we need Erlang modules at all?" or, generously, "Modules in FP languages". A key motivating pattern (fib/3) is a pattern in FP.
More directly in terms of Joe's brainstorming:
- I don't see how the versioning matter is simplified by a flat space of functions. Before you had Nm modules to track and now you have Nf functions to track, with Nf >> Nm. Aggregating functions in libraries/modules to version is actually helping with the versioning effort, not hindering it. More generally, the versioning of multi-component systems is a complex affair that can only be addressed by constraints - general engineering systems have standards + catalogs as the means of addressing this general engineering issue.
- Broadly I disagree with conflating modules and libraries. They are distinct conceptually. Modules could have state (and meta-data state), conceptually. Modules potentially could also have active elements internally. Modules can have life-cycles. To sum: modules conceptually are not just collections of (related) functions.
So the general question is 'can we live with just libraries of functions?'
I think PLT excitement here is not 'a k/v bag of richly annotated functions' -- the guaranteed end result of that approach is n variants of elaborate 'structure' encoded into the metadata Joe is talking about -- but rather pushing modules to extreme to make the distinction from libraries crystal clear.
You could go the route of having a zoo of many small libraries and just version the library, so mod-x becomes lib_x. That, dependency management, is not a convincing argument for something 'new' called "module". You can do it with libraries as well.
The question (then) remains: are modules really just libraries? Was it always just about coexistence of related functions?
https://news.ycombinator.com/item?id=8572600 - Nov 7, 2014 (76 comments)
https://news.ycombinator.com/item?id=10409507 - Oct 18, 2015 (46 comments)
https://news.ycombinator.com/item?id=20808000 - Aug 27, 2019 (103 comments)
EDIT: I also highly recommend reading the original mailing list thread. A lot of interesting discussion there that I don't recall having read before.
> Each Unison definition is identified by a hash of its syntax tree. Put another way, Unison code is content-addressed.
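A rough illustration of what "identified by a hash of its syntax tree" can mean, sketched in Python rather than Unison. This is not Unison's actual hashing scheme: `definition_hash` is a hypothetical helper that hashes Python's own AST dump, it ignores the function's name but (as a simplification) not its parameter names, and it doesn't handle recursion or references to other definitions.

```python
import ast
import hashlib


def definition_hash(source: str) -> str:
    """Hash a single function definition by the structure of its syntax tree."""
    tree = ast.parse(source)
    if len(tree.body) != 1 or not isinstance(tree.body[0], ast.FunctionDef):
        raise ValueError("expected exactly one function definition")
    func = tree.body[0]
    # Blank out the name so the hash depends only on parameters and body.
    func.name = "_"
    # ast.dump omits line/column attributes by default, and comments never
    # reach the AST, so layout differences do not affect the result.
    canonical = ast.dump(func)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = "def double(x):\n    return x * 2\n"
b = "def twice(x):   # renamed, reformatted\n\n    return x*2\n"
c = "def double(x):\n    return x + x\n"

store = {}   # hash -> source: the content-addressed part
names = {}   # human-readable name -> hash: just a mutable pointer

for name, src in [("double", a), ("twice", b), ("double_v2", c)]:
    h = definition_hash(src)
    store.setdefault(h, src)
    names[name] = h
```

Here `double` and `twice` end up pointing at the same hash, while `double_v2` gets a different one even though it computes the same values. Names become metadata on top of the store, which is why a pure function is a convenient unit to address this way: nothing outside its syntax tree affects what it means.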
What problem are you actually trying to solve?
I think the idea is that if a function always gives the same result with the same inputs, it is considered “pure enough.”
Truly insane.