We need modules so that my search results aren't cluttered with contamination from code that is optimised to be found rather than designed to solve my specific problem.
We need then so that we can find all functions that are core to a given purpose, and have been written with consideration of their performance and a unified purpose rather than also finding a grab bag of everybody's crappy utilities that weren't designed to scale for my use case.
We need them so that people don't have to have 80 character long function names prefixed with Hungarian notation for every distinct domain that shares the same words with different meanings.
I agree, but also agree with the author's statement "It's very difficult to decide which module to put an individual function in".
Quite often coders optimise for searchability, so like there will be a constants file, a dataclasses file, a "reader"s file, a "writer"s file etc etc. This is great if you are trying to hunt down a single module or line of code quickly. But it can become absolute misery to actually read the 'flow' of the codebase, because every file has a million dependencies, and the logic jumps in and out of each file for a few lines at a time. I'm a big fan of the "proximity principle" [1] for this reason - don't divide code to optimise 'searchability', put things together that actually depend on each other, as they will also need to be read / modified together.
> It's very difficult to decide which module to put an individual function in
It's difficult because it is a core part of software engineering; part of the fundamental value that software developers are being paid for. Just like a major part of a journalist's job is to first understand a story and then lay it out clearly in text for their readers, a major part of a software developer's job is to first understand their domain and then organize it clearly in code for other software developers (including themselves). So the act of deciding which modules different functions go in is the act of software development. Therefore, these people:
> Quite often coders optimise for searchability, so like there will be a constants file, a dataclasses file, a "reader"s file, a "writer"s file etc etc.
Those people are shirking their duty. I disdain those people. Some of us software developers actually take our jobs seriously.
One thing I experimented with was writing a tag-based filesystem for that sort of thing. Imagine, e.g., using an entity component system and being able to choose a view that does a refactor across all entities or one that hones in on some cohesive slice of functionality.
In practice, it wound up not quite being worth it (the concept requires the same file to "exist" in multiple locations for that idea to work with all your other tools in a way that actually exploits tags, but then when you reference a given file (e.g., to import it) that needs to be some sort of canonical name in the TFS so that on `cd`-esque operations you can reference the "right" one -- doable, but not agnostic of the file format, which is the point where I saw this causing more problems than it was solving).
I still think there's something there though, especially if the editing environment, programming language, and/or representation of the programming language could be brought on board (e.g., for any concrete language with a good LSP, you can re-write important statements dynamically).
Not to pick on Rails, sorting files into "models / views / controllers" seems to be our first instinct. My pantry is organized that way: baking stuff goes here, oils go there, etc.
A directory hierarchy feels more pleasant when it maps to features, instead. Less clutter.
Most programmers do not care about OO design, but "connascence" has some persuasive arguments.
> Knowing the various kinds of connascence gives us a metric for determining the characteristics and severity of the coupling in our systems. The idea is simple: The more remote the connection between two clusters of code, the weaker the connascence between them should be.
> Good design principles encourages us to move from tight coupling to looser coupling where possible. But connascence allows us to be much more specific about what kinds of problems we’re dealing with, which makes it easier to reason about the types of refactorings that can be used to weaken the connascence between components.
We could get that without a hierarchical categorization of code, though?
Makes me wonder what it would look like if you gave "topics" to code as you wrote it. Where would you put some topics? And how many would you have that are part of several topics?
There is a similar question about message board systems.
Instead of posting a topic in a subforum, what if subforums were turned into tags and you just post your topic globally with those tags. Now you can have a unified UI that shows all topics, and people can filter by tag.
I experimented with this with a /topics page that implemented such a UI. What I found was that it becomes one big soup that lacks the visceral structure that I quickly found to be valuable once it was missing.
There is some value to "Okay, I clicked into the WebDesign subforum and I know the norms here and the people who regularly post here. If I post a topic, I know who is likely to reply. I've learned the kind of topics that people like to discuss here which is a little different than this other microclimate in the RubyOnRails subforum. I know the topics that already exist in this subforum and I have a feel for it because it's separate from the top-level firehose of discussion."
I think something similar happens with modules and grouping like-things into the same file. Microclimates and micronorms emerge that are often useful for wrapping your brain around a subsystem, contributing to it, and extending it. Even if the norms and character change between files and modules, it's useful that there are norms and character when it comes to understanding what the local objective is and how it's trying to solve it.
Like a subforum, you also get to break down the project management side of things into manageable chunks without everything always existing at a top organizational level.
I feel like you are arguing more for namespaces than modules.
Having a hierarchical naming system that spans everything makes it largely irrelevant how the functions themselves are physically organized. This also provides a pattern for disambiguating similar products by way of prefixing the real world FQDNs of each enterprise.
As another poster already said, providing namespaces is just one of the functions of modules, the other being encapsulation, i.e. the interface of a module typically exports only a small subset of the internal symbols, the rest being protected from external accesses.
While a function may have local variables that are protected from external accesses, a module can export not only multiple functions, but any other kinds of symbols, e.g. data types or templates, while also being able to keep private any kind of symbol.
In languages like C, which have separate compilation, but without modules, you can partition code in files, then choose for each symbol whether to be public or not, but with modules you can handle groups of related symbols simultaneously, in a simpler way, which also documents the structure of the program.
Moreover, with a well-implemented module system, compilation can be much faster than when using inefficient tricks for specifying the interfaces, like header file textual inclusion.
It is irrelevant until you have 4gb of binaries loaded from 50 repositories and then you are trying to find the definition of some cursed function that isn't defined in the same spot as everything it is related to, and now you have to download/search through all 50 repositories because any one of them could have it. (True story)
The article references the true granularity issue (actually the function names need a version number as well, not sure in my scan of the article if it was mentioned).
Modules being collections of types and functions obviously increases coarseness. I'm not a fan of most import mechanisms because it leaves versioning and namespace versioning (if it has namespaces at all...) out, to be picked up poorly by build systems and dependency graph resolvers and that crap.
How do you imagine importing modules by version in the code? Something like "import requests version 2.0.3"? This sounds awful when you accidentally import the same module in two different versions and chaos ensures.
I miss Joe, he left us too early. He always had wild ideas like that. For a while he had this idea of a git + bittorrent he called it gittorrent, only to find out someone had already used the name. I think it was a bit of an extension of this universal functions idea.
If you expand some of the comments below, he and other members of the community at the time have a nice discussion about hierarchical namespace.
I particularly like his "flat beer and chips" comment:
> I'd like to know if there will be hierarchial modules in Erlang,
because tree of packages is a rather good idea:
No it's not - this has been the subject of long and heated discussion and is
why packages are NOT in Erlang - many people - myself included - dislike
the idea of hierarchical namespaces. The dot in the name has no semantics
it's just a separator. The name could equally well be encoders.mpg.erlyvideo
or mpg.applications.erlvideo.encoder - there is no logical way to organise the
package name and it does not scale -
erlyvideo.mpegts.encoder
erlyvideo.rtp.encoder
But plain module namespace is also ok. It would be impossible for me
to work with 30K LOC with plain function namespace.
The English language has a flat namespace.
I'd like a drink.alcoholic.beer with my food.unhealthy.hamburger and my food.unhealthy.national.french.fries
Software development is continually emotionally stunted by a lack of people with expertise in multiple other fields.
English absolutely has namespaces. Every in-group has shibboleths and/or jargon, words that mark membership in the group that have connotations beyond the many dictionary definitions of that word (in fact I wonder how many words with more than three definitions started out as jargon/slang words that achieved general acceptance).
You cannot correctly parse a sentence without the context in which it was written. It’s a literary device some authors use. By letting the reader assume one interpretation of a prophetic sentence early on, the surprise the reader experiences when they discover a different interpretation at the end intensifies the effect.
> Software development is continually emotionally stunted by a lack of people with expertise in multiple other fields.
I think Joe's point is about the perennial discussion whether hierarchy is better than tags. It's as old as software or as old as people started categorizing things. Some early databases were hierarchical KV stores. Email clients and services go through that too, is it better to group messages by tags or have a single hierarchy of folders?
> English absolutely has namespaces
Sure, we can pick apart the analogy, after all we're not programing in English unless we write LLM prompts (or COBOL /s). Then if English has namespaces what would you pick lager.flat.alcoholic or alcoholic.lager.flat or lager.alcoholic.flat, etc? Is there a top-level "lager" vs "ale" package, with a flat vs carbonated as next level?
yes. and math/logics trained brains confuse hierarchical namespaces with trees. names in nested namespaces should be DAGs, maybe even arbitrary graphs, meaning a chair can be a sit-on thing in several contexts, but not in all (think meeting).
in many contemporary programming languages you can express this, too, by exporting some imported name.
It's arguable that any group's dialect is actually a fork of English specialized for a specific culture, activity, or context. Occasionally, elements of the fork are pulled into upstream English as groups grow in popularity and jargon or shibboleths become more commonly used across dialects.
And privacy in Rust is load-bearing for encapsulating unsafe operations from safe code, so it's not just a nice-to-have, its fundamental to the language.
but we do have alcoholic beer, and non-alcoholic beer, and it is nice to be able to say which one you want. And yes, there is a separator here, too, it is called a space.
This is exactly what Unison (https://www.unison-lang.org/) does. It’s kinda neat. Renaming identifiers is free. Uh… probably something else is neat (I haven’t used Unison irl)
A lot of things are neat because of this. Refactoring becomes trivial and safe. If you do not change the type of the refactored function, you can safely do a batch replace and everywhere the old function was used, the new one will be used after that. If you do change the type, the compiler interface will guide you through an interactive flow where you have to handle the change everywhere the function was being used. You can stop in the middle and continue later... and once you're done you just commit and push... all the while the code continues to work. Even cooler, perhaps: no unit test is re-run if not affected. And given the compiler knows the full AST of everything , it knows exactly when a test must run again.
I tried it out. Fascinating language and a completely different paradigm. The language itself is familiar, but the structure of the program is different - no files – all functions are in a database and their history. I found the language a bit difficult to navigate, but that is probably because of my experience of work with files, and having tools based on files.
This is one of those things where I don’t agree with the argument, but know the person making it knows way more than I do on the subject and has given it way more thought. In these cases it’s usually best to sit back and listen a bit...
I think Hoogle[1] is proof this concept could work. Haskell has modules, of course, but even if it didn't, Hoogle would keep it still pretty usuable.
The import piece here which is mentioned but not very emphasized in TFA is that Hoogle lets you search by meta data instead of just by name. If a function takes the type I have, and transforms it to the type I want, and the docs say it does what I want, I don't really care what module or package it's from. In fact, that's often how I use Hoogle, finding the function I need across all Stack packages.
That said, while I think it could work, I'm not convinced it'd have any benefit over the statys quo in practice.
As with other similar proposals, doesn't this simply move the complexity around without changing anything else? Now instead of looking for the right module or whatnot, you'll be sifting through billions of function definitions, trying to find the very specific one that does what you need, buried between countless almost but not quite similar functions.
We need then so that we can find all functions that are core to a given purpose, and have been written with consideration of their performance and a unified purpose rather than also finding a grab bag of everybody's crappy utilities that weren't designed to scale for my use case.
We need them so that people don't have to have 80 character long function names prefixed with Hungarian notation for every distinct domain that shares the same words with different meanings.
Quite often coders optimise for searchability, so like there will be a constants file, a dataclasses file, a "reader"s file, a "writer"s file etc etc. This is great if you are trying to hunt down a single module or line of code quickly. But it can become absolute misery to actually read the 'flow' of the codebase, because every file has a million dependencies, and the logic jumps in and out of each file for a few lines at a time. I'm a big fan of the "proximity principle" [1] for this reason - don't divide code to optimise 'searchability', put things together that actually depend on each other, as they will also need to be read / modified together.
[1] https://kula.blog/posts/proximity_principle/
It's difficult because it is a core part of software engineering; part of the fundamental value that software developers are being paid for. Just like a major part of a journalist's job is to first understand a story and then lay it out clearly in text for their readers, a major part of a software developer's job is to first understand their domain and then organize it clearly in code for other software developers (including themselves). So the act of deciding which modules different functions go in is the act of software development. Therefore, these people:
> Quite often coders optimise for searchability, so like there will be a constants file, a dataclasses file, a "reader"s file, a "writer"s file etc etc.
Those people are shirking their duty. I disdain those people. Some of us software developers actually take our jobs seriously.
In practice, it wound up not quite being worth it (the concept requires the same file to "exist" in multiple locations for that idea to work with all your other tools in a way that actually exploits tags, but then when you reference a given file (e.g., to import it) that needs to be some sort of canonical name in the TFS so that on `cd`-esque operations you can reference the "right" one -- doable, but not agnostic of the file format, which is the point where I saw this causing more problems than it was solving).
I still think there's something there though, especially if the editing environment, programming language, and/or representation of the programming language could be brought on board (e.g., for any concrete language with a good LSP, you can re-write important statements dynamically).
[1] https://en.wikipedia.org/wiki/Cohesion_(computer_science)
A directory hierarchy feels more pleasant when it maps to features, instead. Less clutter.
Most programmers do not care about OO design, but "connascence" has some persuasive arguments.
https://randycoulman.com/blog/2013/08/27/connascence/
https://practicingruby.com/articles/connascence
https://connascence.io/
> Knowing the various kinds of connascence gives us a metric for determining the characteristics and severity of the coupling in our systems. The idea is simple: The more remote the connection between two clusters of code, the weaker the connascence between them should be.
> Good design principles encourages us to move from tight coupling to looser coupling where possible. But connascence allows us to be much more specific about what kinds of problems we’re dealing with, which makes it easier to reason about the types of refactorings that can be used to weaken the connascence between components.
Makes me wonder what it would look like if you gave "topics" to code as you wrote it. Where would you put some topics? And how many would you have that are part of several topics?
Instead of posting a topic in a subforum, what if subforums were turned into tags and you just post your topic globally with those tags. Now you can have a unified UI that shows all topics, and people can filter by tag.
I experimented with this with a /topics page that implemented such a UI. What I found was that it becomes one big soup that lacks the visceral structure that I quickly found to be valuable once it was missing.
There is some value to "Okay, I clicked into the WebDesign subforum and I know the norms here and the people who regularly post here. If I post a topic, I know who is likely to reply. I've learned the kind of topics that people like to discuss here which is a little different than this other microclimate in the RubyOnRails subforum. I know the topics that already exist in this subforum and I have a feel for it because it's separate from the top-level firehose of discussion."
I think something similar happens with modules and grouping like-things into the same file. Microclimates and micronorms emerge that are often useful for wrapping your brain around a subsystem, contributing to it, and extending it. Even if the norms and character change between files and modules, it's useful that there are norms and character when it comes to understanding what the local objective is and how it's trying to solve it.
Like a subforum, you also get to break down the project management side of things into manageable chunks without everything always existing at a top organizational level.
Having a hierarchical naming system that spans everything makes it largely irrelevant how the functions themselves are physically organized. This also provides a pattern for disambiguating similar products by way of prefixing the real world FQDNs of each enterprise.
While a function may have local variables that are protected from external accesses, a module can export not only multiple functions, but any other kinds of symbols, e.g. data types or templates, while also being able to keep private any kind of symbol.
In languages like C, which have separate compilation, but without modules, you can partition code in files, then choose for each symbol whether to be public or not, but with modules you can handle groups of related symbols simultaneously, in a simpler way, which also documents the structure of the program.
Moreover, with a well-implemented module system, compilation can be much faster than when using inefficient tricks for specifying the interfaces, like header file textual inclusion.
Modules being collections of types and functions obviously increases coarseness. I'm not a fan of most import mechanisms because it leaves versioning and namespace versioning (if it has namespaces at all...) out, to be picked up poorly by build systems and dependency graph resolvers and that crap.
If you expand some of the comments below, he and other members of the community at the time have a nice discussion about hierarchical namespace.
I particularly like his "flat beer and chips" comment:
https://groups.google.com/g/erlang-programming/c/LKLesmrss2k
---
> I'd like to know if there will be hierarchial modules in Erlang, because tree of packages is a rather good idea:
No it's not - this has been the subject of long and heated discussion and is why packages are NOT in Erlang - many people - myself included - dislike the idea of hierarchical namespaces. The dot in the name has no semantics it's just a separator. The name could equally well be encoders.mpg.erlyvideo or mpg.applications.erlvideo.encoder - there is no logical way to organise the package name and it does not scale -
erlyvideo.mpegts.encoder erlyvideo.rtp.encoder
But plain module namespace is also ok. It would be impossible for me to work with 30K LOC with plain function namespace.
The English language has a flat namespace.
I'd like a drink.alcoholic.beer with my food.unhealthy.hamburger and my food.unhealthy.national.french.fries
I have no problem with flat beer and chips.
/Joe
---
English absolutely has namespaces. Every in-group has shibboleths and/or jargon, words that mark membership in the group that have connotations beyond the many dictionary definitions of that word (in fact I wonder how many words with more than three definitions started out as jargon/slang words that achieved general acceptance).
You cannot correctly parse a sentence without the context in which it was written. It’s a literary device some authors use. By letting the reader assume one interpretation of a prophetic sentence early on, the surprise the reader experiences when they discover a different interpretation at the end intensifies the effect.
I think Joe's point is about the perennial discussion whether hierarchy is better than tags. It's as old as software or as old as people started categorizing things. Some early databases were hierarchical KV stores. Email clients and services go through that too, is it better to group messages by tags or have a single hierarchy of folders?
> English absolutely has namespaces
Sure, we can pick apart the analogy, after all we're not programing in English unless we write LLM prompts (or COBOL /s). Then if English has namespaces what would you pick lager.flat.alcoholic or alcoholic.lager.flat or lager.alcoholic.flat, etc? Is there a top-level "lager" vs "ale" package, with a flat vs carbonated as next level?
> As of 2013, this 3-letter verb common in sports, theater & politics has the largest entry in the online OED.
The correct response? What is "run"?
in many contemporary programming languages you can express this, too, by exporting some imported name.
That's not true of all module systems. It's true in Java, but not in Rust, where it establishes a parent-child relationship, and in which context [1]:
> If an item is private, it may be accessed by the current module and its descendants.
[1] https://doc.rust-lang.org/reference/visibility-and-privacy.h...
Deleted Comment
Deleted Comment
This is exactly what Unison (https://www.unison-lang.org/) does. It’s kinda neat. Renaming identifiers is free. Uh… probably something else is neat (I haven’t used Unison irl)
[0] https://www.unison-lang.org
Variables aren't named, they are beta reduced and referred to by abstraction level.
https://text.marvinborner.de/2023-04-06-01.html
The import piece here which is mentioned but not very emphasized in TFA is that Hoogle lets you search by meta data instead of just by name. If a function takes the type I have, and transforms it to the type I want, and the docs say it does what I want, I don't really care what module or package it's from. In fact, that's often how I use Hoogle, finding the function I need across all Stack packages.
That said, while I think it could work, I'm not convinced it'd have any benefit over the statys quo in practice.
[1]: https://hoogle.haskell.org/
They are nodes in a graph, where the other nodes are the input types, output types and other functions.
It makes sense to cluster closely associated notes, hence Modules.