Why do we need modules at all? (2011)

macintux · 2 years ago

Joe was such a creative thinker, and he was never (that I could tell) embarrassed by the prospect of an idea of his proving to be a poor one, so it was always interesting to hear him talk.

Genuinely curious, thinking outside the box (writing a new programming language on top of Prolog?), and he treated people with utmost respect. I was a nobody lucky enough to escort him around Chicago one day when he was attending a conference, and we spent a couple of hours talking about art, Erlang, Riak, and man I wish I could remember what else.

piokoch · 2 years ago

"all functions have unique distinct names"

Really? And every developer will create custom, broken, non-standard "namespace" system: admin_get_user vs. readers_get_user, or maybe admin.get_user vs. readers.get_user, or maybe get_user_admin, get_user_readers, etc. Surely, this will stir a lot of creativity, but I am not sure we need that.

Jtsummers · 2 years ago

Three past submissions with comments:

https://news.ycombinator.com/item?id=8572600 - Nov 7, 2014 (76 comments)

https://news.ycombinator.com/item?id=10409507 - Oct 18, 2015 (46 comments)

https://news.ycombinator.com/item?id=20808000 - Aug 27, 2019 (103 comments)

EDIT: I also highly recommend reading the original mailing list thread. A lot of interesting discussion there that I don't recall having read before.

calebh · 2 years ago

I've thought a lot about storing program structures in distributed hash tables and came to the conclusion that the only viable languages that can be safely stored are purely functional. If you consider OOP languages, there are many stateful dependencies, for example a method may rely on a constructor initializing a private variable in some specific way, so the smallest unit of modularity cannot be a method. Similarly an entire class could perhaps rely on methods being called in a certain order. Even though classes are designed to be self contained, stateful behavior really mucks up the ability to separate things into constituent parts.

Purely functional on the other hand has none of these problems. This is the approach taken by the Unison Language people, which I think makes the right design decisions.
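
A rough Python sketch of the difference, with made-up names (`Counter`, `increment`) chosen purely for illustration: the method only makes sense together with the constructor that set up its hidden state, while the pure function can be stored and looked up on its own.

```python
class Counter:
    """Stateful: `increment` depends on what `__init__` stored earlier."""

    def __init__(self, start: int) -> None:
        self._value = start  # hidden state the method relies on

    def increment(self) -> int:
        self._value += 1
        return self._value


def increment(value: int) -> int:
    """Pure: everything the function needs arrives as an argument."""
    return value + 1


counter = Counter(10)
first = counter.increment()    # result depends on prior calls
second = counter.increment()   # same call, different result

pure_first = increment(10)     # same input ...
pure_second = increment(10)    # ... same result, every time
```

Here `Counter.increment` cannot be shipped around without `__init__`, so the smallest safe unit is the whole class; the free function has no such baggage.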

carapace · 2 years ago

Check out Unison lang: https://www.unison-lang.org/learn/the-big-idea/

> Each Unison definition is identified by a hash of its syntax tree. Put another way, Unison code is content-addressed.
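
For a rough feel of what "identified by a hash of its syntax tree" means, here is a toy Python sketch. It is not how Unison does it: this version simply hashes Python's own `ast` dump (identifier names included), and `definition_hash` is a made-up helper.

```python
import ast
import hashlib


def definition_hash(source: str) -> str:
    """Hash a definition's syntax tree rather than its raw text."""
    tree = ast.parse(source)
    # ast.dump leaves out comments, whitespace and line numbers,
    # so purely cosmetic edits do not change the hash.
    canonical = ast.dump(tree)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = definition_hash("def add(x, y):\n    return x + y\n")
b = definition_hash("def add(x, y):   # reformatted\n        return x + y\n")
c = definition_hash("def add(x, y):\n    return x - y\n")
```

With this toy scheme, `a` and `b` come out identical because only layout and a comment differ, while `c` gets a different hash because the tree itself changed.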

akira2501 · 2 years ago

It's stateful without transactions in the simplest implementation, but there's no reason you couldn't create a procedural language that maintains state and enforces transactional separation.

In either language class, you need to manage transactions, usually implicitly, if you want to do any meaningful work. This is a gap that either language class can easily solve and in many cases, there are working implementations of these ideas that do just that.

CyberDildonics · 2 years ago

> I've thought a lot about storing program structures in distributed hash tables

What problem are you actually trying to solve?

TylerE · 2 years ago

Purely functional introduces lots of issues on its own. (Look at all the monad insanity Haskell has to do to get the equivalent of a print statement).

docandrew · 2 years ago

I thought the D language had a really nice take on this with its notion of “weak” purity, allowing for straightforward allocations and I/O: https://klickverbot.at/blog/2012/05/purity-in-d/

I think the idea is that if a function always gives the same result with the same inputs, it is considered “pure enough.”

coldtea · 2 years ago

Yes. So insane /s

  myMethod :: IO ()
  myMethod = putStrLn "Hello World!"

tikhonj · 2 years ago

It's no more insane than async programming. In fact, it's less insane: it's what you'd get if you let async and await be first-class citizens in your language instead of awkward special cases.

mjan22640 · 2 years ago

1. You don't need a monad to do IO in Haskell, you can do that directly. 2. Monad is a tool that offers a "grip" over IO, i.e. your code still yields easily to formal proofs despite using IO, for example. 3. The sanity and insanity of controlled and uncontrolled IO is IMHO the exact opposite of what you seem to imply.

the-smug-one · 2 years ago

What's insane about monads :-)?

aranchelk · 2 years ago

main = putStrLn "foo"

Truly insane.

mrkeen · 2 years ago

I looked.

    main = putStrLn "Hello, world"

taberiand · 2 years ago

I suppose it's only natural, having solved distributed systems with Erlang, Joe would move on to tackling the hardest problem in computer science - Naming Things.

dgreensp · 2 years ago

There’s a spectrum of what different programming languages call “modules” and what they use them for, with languages like ML having “true” modules in the conceptual sense, where a module is a unit of abstraction, the way that classes and interfaces in many languages are units of abstraction, though when you dig into it, I believe modules are more fundamental and more powerful… then on the other hand you have the situation of, “Programs are collections of text files, people don’t want to put all their code in one file, let’s call a file a module.”

Point being, some ways that programming languages use the concept of a module are deep and let you do things you couldn’t otherwise do, some are more dispensable.

Then there’s Smalltalk where I believe programs aren’t collections of text files, you actually browse your code in a sort of code browser and it’s stored as runtime objects in a virtual machine image, basically!

Then there’s the matter of how code is released and distributed… packaging and libraries. (In practice, the words “module” and “package” are used in overlapping ways by different languages.) Are the single-function modules/packages of NPM a good thing? In practice it doesn’t seem that way.

It’s sort of analogous to trying to “giggify” all the jobs. Every individual function outsourced.

skybrian · 2 years ago

Wikipedia is probably the most prominent example of a flat namespace that works at scale. But those names are pretty long, particularly when they need to disambiguate. Also, it's an encyclopedia where article editors are forced to collaborate, and it's leveraging large vocabularies that already exist. (One for each language.)

For programming languages, even with a flat module namespace, you get a land rush where good names get taken early by packages that might end up unpopular or abandoned.

Leveraging DNS seems like the answer. Java did it badly, but Go's approach seems fine, perhaps because it leverages GitHub's namespace too (for most modules).

tsimionescu · 2 years ago

> Leveraging DNS seems like the answer. Java did it badly, but Go's approach seems fine, perhaps because it leverages GitHub's namespace too (for most modules).

On the contrary, Go's approach will lead to lots of problems while Java's is actually simpler and safer. The fact that you depend on the current status of DNS every time you build your code, for each and every one of your dependencies, including transitive ones, is completely nuts. Even Google realized this, and they solved it Google style: they added another automated system on top to try to add some stability (the Go proxy). And then they had to add holes to that system, because it turns out not all dependencies are public and so they can't solve this from on high (GOPRIVATE).

And still, if one of your dependencies decides to switch hosting provider for their source code, or loses their domain name, you have to make (small) changes in every code file that referenced that dependency.

Maven's solution is much simpler for everyone involved: DNS is only involved in registering a new module, it only serves as a form of authentication. After the initial registration, the module name is allocated to your Maven Central account, and it won't be revoked if you later lose that domain. If someone gets access to your domain, they don't also automatically become able to push malware to people who used your module for years, neither retroactively (which Go also handles) nor when they next upgrade (which Go will happily allow).
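
To spell out the "DNS only at registration" idea, here is a toy model in Python. Everything in it (`ToyRegistry`, `register`, `can_publish`, the `controls_domain` flag) is hypothetical and is not Maven Central's actual API; it only illustrates that publishing rights follow the account, not whoever controls the domain today.

```python
class ToyRegistry:
    """Hypothetical registry: domain control is checked once, at registration."""

    def __init__(self) -> None:
        self._owners: dict[str, str] = {}  # namespace -> account

    def register(self, account: str, namespace: str, controls_domain: bool) -> None:
        if namespace in self._owners:
            raise ValueError(f"{namespace} is already allocated")
        if not controls_domain:
            raise PermissionError("domain control could not be verified")
        self._owners[namespace] = account

    def can_publish(self, account: str, namespace: str) -> bool:
        # No DNS lookup here: only the account recorded at registration counts.
        return self._owners.get(namespace) == account


registry = ToyRegistry()
registry.register("alice", "com.example", controls_domain=True)

# Suppose the domain later lapses and "mallory" buys it.
alice_ok = registry.can_publish("alice", "com.example")
mallory_ok = registry.can_publish("mallory", "com.example")
```

In this model the new domain owner can neither publish under `com.example` nor re-register it, because the namespace stays bound to the original account.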

MrBuddyCasino · 2 years ago

Been trying to make this point several times now, they always doubt that this problem has been solved, and by Maven of all places. The NIH syndrome is for some reason rampant among package managers/registries.

skybrian · 2 years ago

I'm not sure Go works that way. Are you confusing 'go get' (downloading code) with compiling code?

Maven and Gradle are part of the reason I don't use Java anymore. Java seems to have gone through multiple unfortunate build systems without settling on a good one.

PaulDavisThe1st · 2 years ago

What's the difference between

   /Wheel of Time
   /TV Series/Wheel of Time
   /Video Game/Wheel of Time

and

  /Wheel of Time
  /Wheel of Time (TV Series)
  /Wheel of Time (Video Game)

aragonite · 2 years ago

I'd argue that "mental anguish" is probably less of a problem on the latter design, since the additional parenthetical material is used purely for disambiguation (and so is optional), not for categorization. So, on the latter design we can have:

/Francis Bacon

/Francis Bacon (artist)

without having to answer anguish-inducing questions that would be raised by

/Francis Bacon

/artist/Francis Bacon

e.g. questions such as:

1. "Maybe we should put Lord Bacon under a category also, maybe `philosopher/Francis Bacon`"

2. "But he's also a statesman ... what about `statesman/Francis Bacon`? Is he more of a philosopher or a statesman?"

3. What about ordinary people (who happen to be involved in historical events) that have wikipedia entries, like George Floyd? Should he be assigned something like `person/George Floyd`? If so, should the two Francis Bacons be assigned `person/philosopher/Francis Bacon` and `person/artist/Francis Bacon` instead?

And so on

dgb23 · 2 years ago

I like the second one better.

In programming, it’s useful to have namespaces for overall organization and conflict resolution. But it’s a trade off: you are nudged into a hierarchical ontology, with all the implied issues.

Wikipedia titles are free text (more or less). So they can afford not to introduce hierarchical naming and still have nice, easily addressable names without conflicts.

This avoids all sorts of problems. Most things simply can’t be categorized in a strict, hierarchical manner.

Terr_ · 2 years ago

Yeah, the non-flat hierarchy/tagging may be informal, but it still exists. (And for good reason.)

ttctciyf · 2 years ago

Practically speaking, when you want to link to the latter in a markdown that already utilises parens, you encounter more friction - cutting and pasting a url with parens from the address bar needs manual correction if used to create a link in (say) reddit's markdown syntax.

I realise this is somewhat tangential to your point, but shows how easy it is for innocent looking choices to end up creating annoyances.
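
The manual correction usually amounts to percent-encoding the parens. A minimal Python sketch, where `markdown_link` is a made-up helper and the URL is just an example of a parenthesised title:

```python
def markdown_link(text: str, url: str) -> str:
    """Build a Markdown link, percent-encoding parentheses in the URL."""
    # An unescaped ")" inside the URL can end the link target early
    # in some Markdown parsers, so encode both parens.
    safe_url = url.replace("(", "%28").replace(")", "%29")
    return f"[{text}]({safe_url})"


link = markdown_link(
    "Francis Bacon",
    "https://en.wikipedia.org/wiki/Francis_Bacon_(artist)",
)
# link == "[Francis Bacon](https://en.wikipedia.org/wiki/Francis_Bacon_%28artist%29)"
```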

37638894breeze · 2 years ago

I have to listen to Joe's talk on this, but from the OP this is more accurately "Why do we need Erlang modules at all?" or generously "Modules in FP languages". A key motivating pattern (fib/3) is a pattern in FP.

More directly in terms of Joe's brainstorming:

- I don't see how the versioning matter is simplified by a flat space of functions. Before you had Nm modules to track and now you have Nf functions to track, with Nf >> Nm. Aggregating functions into libraries/modules to version actually helps the versioning effort rather than hindering it. More generally, versioning a multi-component system is a complex affair that can only be addressed by constraints - general engineering systems have standards + catalogs as the means of addressing this general engineering issue.

- Broadly I disagree with conflating modules and libraries. They are distinct conceptually. Modules could have state (and meta-data state), conceptually. Modules potentially could also have active elements internally. Modules can have life-cycles. To sum: modules conceptually are not just collections of (related) functions.

So the general question is 'can we live with just libraries of functions?'

I think PLT excitement here is not 'a k/v bag of richly annotated functions' -- the guaranteed end result of that approach is n variants of elaborate 'structure' encoded into the metadata Joe is talking about -- but rather pushing modules to extreme to make the distinction from libraries crystal clear.
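
To make the "k/v bag of richly annotated functions" concrete, a small Python sketch. `REGISTRY`, `register`, `find` and the `area`/`version` tags are all invented for illustration and are not taken from Joe's proposal.

```python
from typing import Callable

# Hypothetical flat store: one globally unique name per function, plus metadata.
REGISTRY: dict[str, dict] = {}


def register(name: str, func: Callable, **metadata) -> None:
    if name in REGISTRY:
        raise KeyError(f"name already taken: {name}")
    REGISTRY[name] = {"func": func, "meta": metadata}


def find(**wanted) -> list[str]:
    """Return the names whose metadata matches every requested key/value."""
    return sorted(
        name
        for name, entry in REGISTRY.items()
        if all(entry["meta"].get(k) == v for k, v in wanted.items())
    )


register("admin_get_user", lambda uid: f"admin:{uid}", area="admin", version="1.0")
register("readers_get_user", lambda uid: f"reader:{uid}", area="readers", version="1.0")
register("readers_list_users", lambda: [], area="readers", version="1.1")

readers_funcs = find(area="readers")
```

Note how the `area` tag ends up doing the job a module name would have done, which is the kind of structure-in-metadata outcome described above.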

jimbokun · 2 years ago

Could you version based on some metadata tags, instead of modules?

37638894breeze · 2 years ago

You could go the route of having a zoo of many small libraries and just version the library, so mod-x becomes lib_x. That, dependency management, is not a convincing argument for something 'new' called "module". You can do it with libraries as well.

The question (then) remains: are modules really just libraries? Was it always just about coexistence of related functions?

coldtea · 2 years ago

Still a mess for discoverability, does not play well with regular code completion, and so on. So unless you also make a Smalltalk-like IDE...

chriseidhof · 2 years ago

Other people know a lot more about this, but IIRC Unison has some ideas that go in this direction: https://www.unison-lang.org

whalesalad · 2 years ago

This language was on the tip of my tongue, thanks for sharing. I thought it was relevant too. Specifically: https://www.unison-lang.org/learn/the-big-idea/

    > Each Unison definition is identified by a hash of its syntax tree.
    > Put another way, Unison code is content-addressed.