Timeouts on calls are, as the OP mentions, a thing in Erlang. Inter-process and inter-computer calls in QNX can optionally time out, and this includes all system calls that can block. Real-time programs use such features. Probably don't want it on more than that. It's like having exceptions raised in things you thought worked.
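As a concrete illustration of a call that carries its own deadline, here is a minimal sketch using only Python's standard library (the slow_lookup function is invented for the example):

    # Sketch: a call whose timeout is part of the calling convention,
    # standard library only.
    from concurrent.futures import ThreadPoolExecutor, TimeoutError
    import time

    def slow_lookup(key):
        time.sleep(2)          # stands in for a remote or blocking call
        return key.upper()

    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(slow_lookup, "abc")
        try:
            print(future.result(timeout=0.5))   # the call carries a deadline
        except TimeoutError:
            print("call timed out -- handled like any other error")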
- Capabilities
They've been tried at the hardware level, and IBM used them in the System/38, but they never caught on. They're not really compatible with C's flat memory model, which is partly why they fell out of fashion.
Capabilities mean having multiple types of memory. Might come back if partially-shared multiprocessors make a comeback.
- Production-Level Releases
That's kind of vague. Semantic versioning is a related concept. It's more of a tooling thing than a language thing.
- Semi-Dynamic Language
I once proposed this for Python. The idea was that, at some point, the program made a call that told the system "Done initializing". After that point, you couldn't load more code, and some other things that inhibit optimization would be prohibited. At that point, the JIT compiler runs, once. No need for the horrors inside PyPy which deal with cleanup when someone patches one module from another.
Guido didn't like it.
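For what it's worth, the semantics (not the JIT) of such a "done initializing" point are easy to sketch in userland; everything below is hypothetical and only illustrates the freeze rule:

    # Toy sketch of "semi-dynamic" semantics: fully dynamic until
    # done_initializing(), frozen afterwards. Names and mechanism are invented.
    _handlers = {}
    _frozen = False

    def register(name, fn):
        if _frozen:
            raise RuntimeError("program already declared 'done initializing'")
        _handlers[name] = fn

    def done_initializing():
        global _frozen
        _frozen = True
        # A real implementation would kick off one-shot optimization here.

    register("greet", lambda who: f"hello, {who}")
    done_initializing()
    print(_handlers["greet"]("world"))              # dispatch still works
    try:
        register("bye", lambda who: f"bye, {who}")  # rejected after the freeze point
    except RuntimeError as e:
        print(e)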
- Value Database
The OP has a good criticism of why this is a bad idea. It's an old idea, mostly from LISP land, where early systems saved the whole LISP environment state.
Source control? What's that?
- A Truly Relational Language
Well, in Python, almost everything is a key/value store. The NoSQL people were going in that direction. Then people remembered that you want atomic transactions to keep the database from turning to junk, and mostly backed off from NoSQL where the data matters long-term.
- A Language To Encourage Modular Monoliths
Hm. Needs further development. Yes, we still have trouble putting parts together.
There's been real progress. Nobody has to keep rewriting Vol. I of Knuth algorithms in each new project any more. But what's being proposed here?
- Modular Linting
That's mostly a hack for when the original language design was botched.
View this from the point of view of the maintenance programmer - what guarantees apply to this code? What's been prevented from happening? Rust has one linter, and you can add directives in the code which allow exceptions. This allows future maintenance programmers to see what is being allowed.
> But what's being proposed here?
I read it, and to me it seems like they're worried about the wrong things. As I understand it, they're worried about the difficulty and hassle of calling unrelated code in the monolith, and proposing things that would make it easier. But that's wrongheaded. Monoliths don't suffer because it's too hard to reuse functionality. They suffer because it's too easy. Programmers create connections and dependencies that shouldn't exist, and the monolith starts to strangle itself (if you have good tests) or shake itself to pieces (if you don't) because of unnecessary coupling. You need mechanisms that enforce modularity, that force programmers to reuse code at designated, designed module interfaces, not mechanisms that make it easier to call arbitrary code elsewhere in the monolith.
In my opinion, a great deal of the success of microservices is due to the physical impossibility of bypassing a network API and calling the code of another service directly. Because of this, programmers respect the importance of designing and evolving APIs in microservices. Essentially, microservices enforce modularity, forcing programmers to carefully design and evolve the API to their code, and this is such a powerful force for good design that it makes microservices appealing even when their architectural aspects aren't helpful.
A language that made it possible to enforce modularity in a monolith as effectively as it is enforced in microservices would make monoliths a no-brainer when you don't need the architectural aspects of microservices.
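Short of a new language, part of that enforcement can be approximated with a build-time check. A rough sketch in Python, assuming (purely for illustration) a convention that a package's private code lives under <pkg>/_internal/ and may only be imported from inside that package:

    # Toy "module boundary" checker: flag any import of some_pkg._internal.*
    # made from outside some_pkg. The _internal convention is assumed here.
    import ast
    import pathlib
    import sys

    def violations(root):
        root = pathlib.Path(root)
        for path in root.rglob("*.py"):
            owner = path.relative_to(root).parts[0]   # top-level package of this file
            tree = ast.parse(path.read_text(), filename=str(path))
            for node in ast.walk(tree):
                if isinstance(node, ast.Import):
                    targets = [alias.name for alias in node.names]
                elif isinstance(node, ast.ImportFrom) and node.module:
                    targets = [node.module]
                else:
                    continue
                for target in targets:
                    parts = target.split(".")
                    if "_internal" in parts[1:] and parts[0] != owner:
                        yield f"{path}:{node.lineno}: import of {target} crosses the {parts[0]} boundary"

    if __name__ == "__main__":
        problems = list(violations(sys.argv[1] if len(sys.argv) > 1 else "."))
        print("\n".join(problems))
        sys.exit(1 if problems else 0)

A real tool would also need an allow-list for sanctioned interfaces, but the point is that the boundary is checked mechanically rather than by convention.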
> Capabilities ... They're not really compatible with C's flat memory model ... Capabilities mean having multiple types of memory
C is not really dependent on a flat memory model - instead, it models memory allocations as separate "objects" (quite reminiscent of "object orientation in hardware", which is yet another name for capabilities), and a pointer to "object" A cannot be offset to point into some distinct "object" B.
> A Truly Relational Language
This is broadly speaking how PROLOG and other logic-programming languages work. The foundational operation in such languages is a knowledge-base query, and "relations" are the unifying concept as opposed to functions with predefined inputs and outputs.
(This is one of those times where the C memory model as described in the spec is very different from the mental PDP-11 model that C programmers actually use to reason about it.)
> In general, while I can’t control how people react to this list, should this end up on, say, Hacker News, I’m looking more for replies of the form “that’s interesting and it makes me think of this other interesting idea” and less “that’s stupid and could never work because X, Y, and Z so everyone stop talking about new ideas” or “why hasn’t jerf heard of this other obscure language that tried that 30 years ago”. (Because, again, of course I don’t know everything that has been tried.)
- Everything except C now has standard strings, not just arrays of characters. Almost all languages now have some standard way to do key/value sets. What else ought to be standard?
-- Arrays of more than one dimension would be helpful for numerical work. Most languages descended from C lack this. They only have arrays of arrays. Even Rust lacks it. Proposals run into bikeshedding - some people want rectangular slices out of arrays, which means carrying stride info around.
-- Standard types for 2, 3 and 4-element vectors would help in graphics work. There are too many different implementations of those in most languages and too much conversion.
Things to think about:
- Rust's ownership restrictions are harsh. Can we keep the safety and do more?
-- The back-reference problem needs to be solved somehow. Back references can be done with Rc and Weak, but it's clunky.
-- Can what Rust does with Rc, RefCell, and .borrow() be checked at compile time? That allows eliminating the run-time check, and provides assurance that the run-time check won't fail. Something has to look at the entire call tree at compile time, and sometimes it won't be possible to verify this at compile time. But most of the time, it should be.
-- There's a scheme for ownership where there's one owning reference and N using references. The idea is to verify at compile time that the using references cannot outlive the owning one. Then there's no need for reference counts.
-- Can this be extended to the multi-thread case? There have been academic demos of static deadlock detection, but that doesn't seem to have made it into production languages.
-- A common idiom involves things being owned by handles, but also indexed for lookup by various keys. Dropping the handle drops the object and removes it from the indices. Is that a useful general purpose operation? It's one that gets botched rather often. (A small sketch of this idiom follows the list.)
-- Should compilers have SAT-solver level proof systems built in?
-- Do programs really have to be in monospaced fonts? (Mesa on the Alto used the Bravo word processor as its text editor. Nobody does that any more.)
-- There's async, there are threads, and there are "green threads", such as Go's "goroutines". Where's that going?
-- Can we have programs which run partly in a CPU and partly in a GPU, compiled together with the appropriate consistency checks, so the data structures and calls must match to compile?
-- How about "big objects?" These are separately built program components which have internal state and some protection from their callers. Microsoft OLE did that, some .dll files do that, and Intel used to have rings of protection and call gates to help with that, hardware features nobody used. But languages never directly supported such objects.
> Well, in Python, almost everything is a key/value store.
Why would that be anywhere near an adequate substitute? KV stores are not relational; they don't support relational algebra. KV stores in PLs are common as dirt, so if they were relevant to the question of embedding relations in a language, I think the author would have noticed.
If anything, many "modern" low-code SaaS products are much worse in this regard than what Lisp and Smalltalk have been offering for years.
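To make the relational point concrete: a key/value store answers "give me the value for this key", while a relational engine answers queries composed from selection, projection and joins. A small sketch with the Python standard library (schema and data invented):

    # A dict handles lookup-by-key; it has no join/selection/projection. The same
    # data in an in-memory SQLite database supports relational queries directly.
    import sqlite3

    people = {1: "Ada", 2: "Grace"}        # key/value: fine for people[1]

    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE people(id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders(id INTEGER PRIMARY KEY, person_id INTEGER, total REAL);
        INSERT INTO people VALUES (1, 'Ada'), (2, 'Grace');
        INSERT INTO orders VALUES (1, 1, 250.0), (2, 2, 40.0);
    """)
    # "Which people have an order over 100?" -- a join plus a selection:
    rows = db.execute("""
        SELECT p.name, o.total
        FROM people p JOIN orders o ON o.person_id = p.id
        WHERE o.total > 100
    """).fetchall()
    print(rows)    # [('Ada', 250.0)]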
He proposes that there is a need for a way to connect modules, i.e. dependency injection, without the modules having explicit knowledge of each other, with compile-time verification that the modules being connected are compatible, without the interface song and dance.
Many of these things (not only what you describe here but also the linked article) are stuff that I had intended to be available in the built-in command shell (called "Command, Automation, and Query Language", which is meant to describe some of the intentions) of an operating system design, so that they would have support from the operating system.
About capabilities, I think capabilities should be a feature of the operating system, although hardware support would be helpful. I think it could be done with tagged memory, without necessarily needing multiple types of memory, and programming languages such as C could still use them (although some things might not work as expected compared with other computers, e.g. if you copy a reference to a capability into a memory area that is expected to be a number and then perform arithmetic on that number, the program is likely to crash even if the result is never dereferenced).
My idea also involves "proxy capabilities", so that you can effectively make up your own capabilities and other programs can receive them without necessarily knowing where they came from (this allows supporting many things, including, but not limited to, many of the ideas behind Arcan's "divergent desktop").
> The OP has a good criticism of why this is a bad idea. It's an old idea, mostly from LISP land, where early systems saved the whole LISP environment state. Source control? What's that?
Symbolics Genera can save (incremental and complete) images (-> "Worlds"). The image tracks all the sources loaded into it. The sources/files/docs/... of the software is stored on a central (or local) file server.
For example, I can start from an initial world and load it with all the software I want, in the versions I want. Maybe I save a new world from that.
I can also start a pre-loaded world and incrementally update the software: write patches, create new minor/major versions, load patches and updates from the central server, install updates from distributions, ... and maybe save new worlds.
The "System Construction Tool" tracks what code is loaded in what version from where.
> The OP has a good criticism of why this is a bad idea.
They simply assert "twiddling a run-time variable for debugging in your staging environment can propagate straight into a bug on production".
As if it would go straight into production without re-testing.
> Source control? What's that?
"ENVY/Manager augments this model by providing configuration management and version control facilities. All code is stored in a central database rather than in files associated with a particular image. Developers are continuously connected to this database; therefore changes are immediately visible to all developers."
> Capabilities
>
> Capabilities mean having multiple types of memory. Might come back if partially-shared multiprocessors make a comeback.
I found this description amusing because all modern memory safe languages have capabilities, and they all have multiple types of memory: that's what an object is! Memory safety partitions memory into different types, and object references are capabilities!
What languages do next is where they break the security properties of capabilities: they add "ambient authority" and "rights amplification". Quick primer:
Ambient authority is basically the same problem as globally mutable state. Globally mutable state impedes modular reasoning about code, but if that state also carries authority to do something dangerous in the real world, like launch missiles, then it also impedes modular reasoning about security for the exact same reasons.
Rights amplification is the ability to turn a reference to an object with little to no authority, into a reference to an object with more authority. File.Open is the quintessential example, where you can turn an immutable string that conveys no authority, into a file handle to your root file system!
File.Open is also typically accessible to all code, meaning it's also ambient authority. This classical file API is completely bonkers from a security perspective.
So we already have capabilities, what we really need to do is to stop adding all of this insecurity! The developers of the E language actually showed that this could be done by making a capability secure subset of Java called Joe-E, which removed the parts of Java and the standard library that exposed ambient authority or rights amplification patterns. Most Java code could run unmodified.
And as for whether capability security will ever be adopted elsewhere, it already has been! WASM's WASI has capability security in its core design, because capability security is exactly what you need for good isolation and virtualization, which are the core properties WASM needs.
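The ambient-authority-versus-capability distinction is easy to sketch in ordinary code. Python can't actually enforce it (any module can still call open()), so this is a design pattern rather than a security boundary, with invented names:

    import io

    # Ambient authority: any module can reach the whole filesystem through the
    # global open().
    def log_ambient(msg):
        with open("/tmp/app.log", "a") as f:   # nothing stops this code from opening any path
            f.write(msg + "\n")

    # Capability style: the caller hands over exactly one writable object.
    def make_logger(sink):                     # `sink` is the capability
        def log(msg):
            sink.write(msg + "\n")
        return log

    buffer = io.StringIO()
    log = make_logger(buffer)                  # authority over this one buffer only
    log("hello")
    print(buffer.getvalue(), end="")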
I think Squeak had Monticello for source control with their image-based approach 20+ years ago, and there was something else for Smalltalk in the '80s too.
But yeah people like text and hate images, and I believe Pharo switched back to some git integration.
"ENVY/Manager augments this model by providing configuration management and version control facilities. All code is stored in a central database rather than in files associated with a particular image. Developers are continuously connected to this database; therefore changes are immediately visible to all developers."
Smalltalk implementations have had text export/import for ages, and image based source control as you point out, is also quite old, Monticello wasn't the first.
I agree about relational languages. It's absurd when I think that SQL and Datalog came from the same foundations of relational calculus. It's just so much lost expressive power.
I really like what PRQL [1] did, at least it makes table operations easily chainable. Another one that comes to mind is Datomic [2].
[1]: https://prql-lang.org/
[2]: https://docs.datomic.com/peer-tutorial/query-the-data.html
I was struggling with doing interesting things with the semantic web circa 2007 and was thinking "OWL sucks" and looking at Datalog as an alternative. At that time Datalog was an obscure topic and hard to find information about it. 10 years later it was big.
(Funny after years of searching I found somebody who taught me how to do really complex modelling in OWL DL but from reading the literature I'm pretty sure the average PhD or prof in the field has no idea.)
Is there any resource you could recommend for that?
I have spent a lot of time trying to understand how we ended up with SQL. Best I can determine, we got SQL because it isn't relational, it is tablational. Tables are a lot easier than relations to understand for the layman, and they successfully pushed for what they were comfortable with, even if to the chagrin of technical people.
"RM:
What was key to SQL becoming the standard language for relational databases in the mid- 1980s? Was all down to good marketing?
CJD:
In other words, why did SQL became so popular? Especially given all its faults? Well, I think this is rather a sorry story. I said earlier that there has never been a mainstream DBMS product that’s truly relational. So the obvious question is: Why not? And I think a good way for me to answer your questions here is to have a go at answering this latter question in their place, which I’ll do by means of a kind of Q&A dialog. Like this:
Q:
Why has no truly relational DBMS has ever been widely available in the marketplace?
A:
Because SQL gained a stranglehold very early on, and SQL isn’t relational.
Q:
Why does SQL have such a stranglehold?
A:
Because SQL is “the standard language for RDBMSs.”
Q:
Why did the standard endorse SQL as such and not something else-something better?
A:
Because IBM endorsed SQL originally, when it decided to build what became DB2. IBM used to be more of a force in the marketplace than it is today. One effect of that state of affairs was that-in what might be seen as a self-fulfilling prophecy-competitors (most especially Relational Software Inc., which later became Oracle Corp.) simply assumed that SQL was going to become a big deal in the marketplace, and so they jumped on the SQL bandwagon very early on, with the consequence that SQL became a kind of de facto standard anyway.
Q:
Why did DB2 support SQL?
A:
Because (a) IBM Research had running code for an SQL prototype called System R and (b) the people in IBM management who made the decision to use System R as a basis on which to build DB2 didn’t understand that there’s all the difference in the world between a running prototype and an industrial strength product. They also, in my opinion, didn’t understand software (they certainly didn’t understand programming languages). They thought they had a bird in the hand.
Q:
Why did the System R prototype support SQL?
A:
My memory might be deficient here, but it’s my recollection that the System R implementers were interested primarily in showing that a relational-or “relational”-DBMS could achieve reasonable performance (recall that “relational will never perform” was a widely held mantra at the time). They weren’t so interested in the form or quality of the user interface. In fact, some of them, at least, freely admitted that they weren’t language designers as such. I’m pretty sure they weren’t all totally committed to SQL specifically. (On the other hand, it’s true that at least one of the original SQL language designers was a key player in the System R team.)
Q:
Why didn’t “the true relational fan club” in IBM-Ted and yourself in particular-make more fuss about SQL’s deficiencies at the time, when the DB2 decision was made?
A:
We did make some fuss but not enough. The fact is, we were so relieved that IBM had finally agreed to build a relational-or would-be relational-product that we didn’t want to rock the boat too much. At the same time, I have to say too that we didn’t realize how truly awful SQL was or would turn out to be (note that it’s much worse now than it was then, though it was pretty bad right from the outset). But I’m afraid I have to agree, somewhat, with the criticism that’s implicit in the question; that is, I think I have to admit that the present mess is partly my fault."
Discussed in HN (probably posted many times): https://news.ycombinator.com/item?id=39189015
For semi-dynamic language, Julia definitely took the approach of being a dynamic language that can be (and is) JITed to excellent machine code. I personally have some larger projects that do a lot of staged programming and even runtime compilation of user-provided logic using Julia. Obviously the JIT is slower to complete than running a bit of Lua or whatever, but the speed after that is phenomenal and there’s no overhead when you run the same code a second time. It’s pretty great and I’d love to see more of that ability in other languages!
Some of the other points resonate with me. I think sensible dynamic scoping would be an easy way to do dependency injection. Together with something like linear types you could do capabilities pretty smoothly, I think. No real reason why you couldn’t experiment with some persistent storage as one of these dependencies, either. Together with a good JIT story would make for a good, modular environment.
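Dynamic scoping as dependency injection can already be sketched with Python's contextvars; the database variable and the fake value below are invented for illustration:

    # Dependency injection via a dynamically scoped variable: callers bind it for
    # a region of execution, callees look it up without it being threaded through
    # every signature.
    from contextlib import contextmanager
    from contextvars import ContextVar

    database = ContextVar("database")

    @contextmanager
    def provide(var, value):
        token = var.set(value)
        try:
            yield
        finally:
            var.reset(token)

    def create_user(name):
        database.get().append(("INSERT", name))   # resolved dynamically at call time

    fake_db = []
    with provide(database, fake_db):              # a test injects a fake, callers unchanged
        create_user("ada")
    print(fake_db)                                # [('INSERT', 'ada')]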
Oh and Zig is another option for allowing injections that are checked when used at a call site rather than predefined through interfaces.
AFAIK it doesn’t have closures (it’s too C-like) so you need to use methods for all your (implicit) interfaces, but that’s okay…
I think the “exemplars” could be automatically yoinked from documentation and tests and existing usage of the function in the code base. Work needs to be done on the IDE front to make this accessible to the user.
Julia is kind of Dylan's revenge. Even if it doesn't take over the whole world, it is already great if it gets its own corner, and from the looks of it that is going alright.
I also agree that a relational approach to in-memory data is a good, effective thought.
I recently compiled some of my C code together with the SQLite database, and I'm starting to think about how the SQL model could be used as the actual implementation language for my standard code's in-memory operations.
Instead of writing the hundredth loop through objects, I just write a SQL query with joins, treating the internal data representation of the software as an information system rather than as bespoke code.
I was hoping to make it possible to handle batches of data and add parallelism because arrays are useful when you want to parallelise.
I was thinking: wouldn't it be good if you could write your SQL queries in advance of the software, parse them, and compile them to C code (as an unrolled loop of the SQLite VM) so they're performant? (For example, instead of a btree for a regular system operation, you could just use a materialised array, a bit like a filesystem, so you're not re-joining the same data all the time.)
I was thinking of ways of representing actors somehow communicating by tables but I do not have anything concrete for that.
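A small sketch of the "write the query up front, then run it instead of another hand-rolled loop" idea, using an in-memory SQLite database (via Python's stdlib here, schema invented) to stand in for in-memory program state:

    import sqlite3

    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE tasks(id INTEGER PRIMARY KEY, owner TEXT, done INTEGER)")
    db.execute("CREATE INDEX tasks_by_owner ON tasks(owner)")   # the chosen access path
    db.executemany("INSERT INTO tasks(owner, done) VALUES (?, ?)",
                   [("ada", 0), ("ada", 1), ("grace", 0)])

    OPEN_TASKS = "SELECT id FROM tasks WHERE owner = ? AND done = 0"   # written in advance

    def open_tasks(owner):
        return [task_id for (task_id,) in db.execute(OPEN_TASKS, (owner,))]

    print(open_tasks("ada"))    # [1]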
DataDraw is an ultra-fast persistent database for high performance programs written in C. It's so fast that many programs keep all their data in a DataDraw database, even while being manipulated in inner loops of compute intensive applications. Unlike slow SQL databases, DataDraw databases are compiled, and directly link into your C programs. DataDraw databases are resident in memory, making data manipulation even faster than if they were stored in native C data structures (really). Further, they can automatically support infinite undo/redo, greatly simplifying many applications.
For anyone happy enough to consider dealing with the JVM instead of C, and Clojure instead of SQL, I think this CINQ project can deliver on much of what you're looking for here: https://github.com/wotbrew/cinq
> I just write a SQL query with joins, treating the internal data representation of the software as an information system rather than as bespoke code
This sounds very similar to how CINQ's macro-based implementation performs relational optimizations on top of regular looking Clojure code (whilst sticking to using a single language for everything).
You might be interested in looking at the Lima programming language: http://btetrud.com/Lima/Lima-Documentation.html . It has ideas that cover some of these things. For example, it's intended to operate with fully automatic optimization. This assumption allows shedding lots of complexity that arises from needing to do the same logical thing in multiple ways that differ in their physical efficiency characteristics. Instead of having 1000 different tree classes, you have one, and optimisers can then look at your code and decide which available tree structures make the most sense in each place. Related to your async functions idea, it does provide some convenient ways of handling these things. While functions are just normal functions, it has a very easy way to make a block async (using "thread") and provides means of capturing async errors that result from that.
I'm surprised these are called "programming language ideas". They seem to be solvable, at least many of them, with libraries. For example, my Haskell effect system Bluefin can be seen as a capability system for Haskell. My database library Opaleye is basically a relational query language for Haskell. Maybe I'm short-sighted but I haven't seen the need for a whole new language to support any of that functionality. In fact one gets huge benefits from implementing such things in an existing language.
* https://hackage.haskell.org/package/bluefin
* https://hackage.haskell.org/package/opaleye
One advantage (which is touched on in the logging section) is that having it provided by the language makes it clear what the default is, and sets expectations. Essentially, lifting it into the language is a way of coordinating the community.
> Smalltalk and another esoteric programming environment I used for a while called Frontier had an idea of a persistent data store environment. Basically, you could set global.x = 1, shut your program down, and start it up again, and it would still be there.
Frontier! I played with that way back when on the Mac. Fun times.
But as for programming language with integrated database... MUMPS! Basically a whole language and environment (and, in the beginning, operating system) built around a built-in global database. Any variable name prefixed with ^ is global and persistent, with a sparse multi-dimensional array structure to be able to organize and access the variables (e.g. ^PEOPLE(45,"firstname") could be "Matthew" for the first name of person ID 45). Lives on today in a commercial implementation from Intersystems, and a couple Free Software implementations (Reference Standard M, GT.M, and the GT.M fork YottaDB). The seamless global storage is really nice, but the language itself is truly awful.
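A very rough standard-library approximation of that "assign to a global and it persists" feel, with a flat string key standing in for MUMPS subscripts:

    # Rough analogue of a MUMPS-style persistent global: assignments survive
    # across program runs.
    import shelve

    with shelve.open("globals_db") as g:
        g["PEOPLE/45/firstname"] = "Matthew"      # roughly ^PEOPLE(45,"firstname")

    with shelve.open("globals_db") as g:          # a later run sees the same value
        print(g["PEOPLE/45/firstname"])           # -> Matthew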
TADS, an OOP language + VM for interactive fiction, has this "value database" model. Once loaded into memory, the compiled image can be updated with values stored in a separate save file. The compiled image itself could store updated values as well.
In fact, it does this during a "preinit" stage that runs immediately after compilation. Once all preinit code finishes executing, the compiled image is overwritten with the updated state. The language includes a "transient" keyword to permit creating objects that should not be stored.
This same mechanism permits in-memory snapshots, which are used for the game's UNDO feature. No need to rewind or memento-ize operations, just return to a previous state.
It's not a general-purpose mechanism. After all, the language is for building games with multiple player-chosen save files, and to permit restarting the game from a known Turn 0 state.
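The snapshot-style UNDO is easy to sketch in any language with cheap serialization; here is a toy version with Python's pickle (the game state is invented):

    # Snapshot-style UNDO: serialize the whole state before each turn,
    # restore it to undo. No per-operation mementos needed.
    import pickle

    state = {"turn": 0, "inventory": ["lamp"]}
    undo_stack = []

    def do_turn(mutate):
        undo_stack.append(pickle.dumps(state))    # whole-state snapshot
        mutate(state)
        state["turn"] += 1

    def undo():
        global state
        state = pickle.loads(undo_stack.pop())

    do_turn(lambda s: s["inventory"].append("sword"))
    print(state)    # {'turn': 1, 'inventory': ['lamp', 'sword']}
    undo()
    print(state)    # {'turn': 0, 'inventory': ['lamp']}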
The MUMPS database is wild. When I was working in MUMPS, it was so easy and fun to whip up an internal tool to share with my coworkers. You don't have to give any special thought at all to persistence, so you're able to stay in the flow of thinking about your business logic.
But as you said, the language itself is almost unbearable to use.
Image persistence was one of the cool ideas of Smalltalk. And in practice, one of the biggest drawbacks. Cruft and old values accumulated steadily, with very little way to find and eliminate them. Transient execution has some cons. But on the pro side, every run starts from a "clean slate."
Save image... for short-term convenience; build clean every week from archived text files.
----
1984 "Smalltalk-80 The Interactive Programming Environment" page 500
"At the outset of a project involving two or more programmers: Do assign a member of the team to be the version manager. … The responsibilities of the version manager consist of collecting and cataloging code files submitted by all members of the team, periodically building a new system image incorporating all submitted code files, and releasing the image for use by the team. The version manager stores the current release and all code files for that release in a central place, allowing team members read access, and disallowing write access for anyone except the version manager."
This may fall in the "you think you do, but you don't category", but I've always wanted a Smalltalk (or similar, not that picky) with a persistent virtual memory.
That is, the VM is mapped to a backing file, changes persisted automatically, no "saving", limited by drive space (which, nowadays, is a lot). But nowadays we also have vast memory space to act as a page cache and working memory.
My contrived fantasy use case was having a simple array named "mail", which is an array containing all of my email messages (as email objects, of course). Naturally, as you get more mail, the array gets longer. Also, as you delete mail, the array shifts. It's no different, roughly, from the classic mbox format, save that it's not just text, it's objects.
You can see that if you delete an email from a large array (several GBs), there would be a lot of churn. That implies maybe it's not a great idea to use that data structure, but that's not the point. You CAN use that data structure if you like (just like you can use mbox if you like).
Were it to be indexed, that would be done with parallel data structures (trees or hashes or whatever).
But this is all done automagically. Just tweaks to pages in working memory backed by the disk using the virtual memory manager. Lots and lots of potential swapping. C'est la vie, no different from anything else. This is what happens when you map 4TB into a 16GB work space.
The problem with such a system is how fragile it potentially is. Corrupt something and it happily persists that corruption, wrecking the system. You can't reboot to fix it.
Smalltalk suffers from that today. Corrupt the image (oops, did I delete the Object become: method again?), and it's gone for good. This is mitigated by having backup images, and the changelist to try to bring you back to the brink but no further.
I'm guessing a way to do that in this system is to use a copy-on-write facility. Essentially, snapshot the persistent store on each boot (or whatever), and present a list of previous snapshots at startup.
Given the structure of a ST VM, you'd like to think this is not that dreadful to work up. I'd like to think a paper-napkin PoC implementation would be possible, just to see what it's like. One of those things where the performance isn't really that great, but modern systems are so fast that we don't really notice it in human terms.
But I do think it would be interesting.
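A crude sketch of the "memory is just a view of a backing file" idea using mmap (none of the corruption or snapshotting concerns above are addressed; it only shows that mutations persist with no explicit save step):

    import mmap
    import os

    PATH = "image.bin"
    if not os.path.exists(PATH):
        with open(PATH, "wb") as f:
            f.write(b"\x00" * 4096)               # reserve one page of backing store

    with open(PATH, "r+b") as f:
        mem = mmap.mmap(f.fileno(), 0)
        boots = int.from_bytes(mem[0:8], "little")
        mem[0:8] = (boots + 1).to_bytes(8, "little")   # ordinary "memory" writes...
        mem.flush()                                     # ...end up in the file
        mem.close()
        print("this image has been booted", boots + 1, "times")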
Interesting that E is cited under “capabilities”, but not under “loosen up the functions”. E’s eventual-send RPC model is interesting in a number of ways. If the receiver is local then it works a bit like a JavaScript callback in that there’s an event loop driving execution; if it’s remote then E has a clever “promise pipelining” mechanism that can hide latency. However E didn’t do anything memorable (to me at least!) about handling failure, which was the main point of that heading.
For “capabilities” and “A Language To Encourage Modular Monoliths”, I like the idea of a capability-secure module system. Something like ML’s signatures and functors, but modules can’t import, they only get access to the arguments passed into a functor. Everything is dependency injection. The build system determines which modules are compiled with which dependencies (which functors are passed which arguments).
An existing “semi-dynamic language” is CLOS, the Common Lisp object system. Its metaobject protocol is designed so that there are clear points when defining or altering parts of the object system (classes, methods, etc.) at which the result is compiled, so you know when you pay for being dynamic. It’s an interesting pre-Self design that doesn’t rely on JITs.
WRT “value database”, a friend of mine used to work for a company that had a Lisp-ish image-based geospatial language. They were trying to modernise its foundations by porting to the JVM. He had horror stories about their language’s golden image having primitives whose implementation didn’t correspond to the source, because of decades of mutate-in-place development.
The most common example of the “value database” or image-based style of development is in fact your bog standard SQL database: DDL and stored procedures are very much mutate-in-place development. We avoid the downsides by carefully managing migrations, and most people prefer not to put lots of cleverness into the database. The impedance mismatch between database development by mutate-in-place and non-database development by rebuild and restart is a horribly longstanding problem.
As for “a truly relational language”, at least part of what they want is R style data frames.