throwaway09223 · 3 years ago
This article is just confused and wrong. Some examples:

"The Socket API, local IPC, and shared memory pretty much assume two programs"

There's no truth to this whatsoever - it's an utterly absurd statement because:

* The socket API as applied to networking is empirically designed for arbitrary numbers of peers -- there are unlimited real world examples. The socket API as applied to local IPC is of course similarly capable.

* Shared memory - mmap() is about as generic of a memory sharing interface as one could hope for. It certainly works just fine with arbitrary numbers of clients. There are again countless examples of using shared memory with arbitrary participants -- for example all the various local file databases: sqlite, leveldb, berkeley db, etc.

"We should be able to assume that the data we want exists in main memory without having to keep telling the system to load more of it."

Yes, this is exactly what mmap() already does. mmap() is part of POSIX. Use it!

"Kode Vicious, known to mere mortals as George V. Neville-Neil,"

Oh, ok.

geocar · 3 years ago
> * The socket API as applied to networking is empirically designed for arbitrary numbers of peers -- there are unlimited real world examples. The socket API as applied to local IPC is of course similarly capable.

I think "the [BSD] socket API" was designed for a thousand or so peers (FD_SETSIZE). That's why the API evolved. Now (2023) there are a lot of APIs and different choices you can make (blocking, polling readiness, sigio, aio, iocp, iouring), none of which is best at everything, but the non-POSIX apis are much faster than the POSIX api -- especially the ones that are harder to mix with the POSIX api.

> "We should be able to assume that the data we want exists in main memory without having to keep telling the system to load more of it."

> Yes, this is exactly what mmap() already does. mmap() is part of POSIX. Use it!

This is not what mmap() does, but the opposite. mmap() sets up page tables for demand paging. Each of those little page faults triggers a check (the "demand") to see whether the page is "in memory" (what Linux calls the page cache), updates the page tables to point to where it is, and returns (or kicks off a read IO operation to the backing store). These page faults add up. What the author is looking for is mremap() on Linux and mach_vm_remap() on Darwin, and it is definitely not in POSIX.

> * Shared memory - mmap() is about as generic of a memory sharing interface as one could hope for. It certainly works just fine with arbitrary numbers of clients. There are again countless examples of using shared memory with arbitrary participants -- for example all the various local file databases: sqlite, leveldb, berkeley db, etc.

But none of those "local file databases" can handle hundreds of thousands of clients, and aren't great even for the low hundreds(!), that's why most "big" database vendors avoid "just" using mmap(), and instead have to perform complex contortions involving things like mach_vm_remap/mremap/O_DIRECT/sigaction+SIGSEGV. userfaultfd and memfd are other examples of recent evolutions in these APIs.

You are looking for truth? Evolving APIs are evidence that people are unhappy with these interfaces, and these newer APIs are better at some things than the old, demonstrating that the old APIs are not ideal.

So we have evidence our (programming) model is not ideal, are we to be like Copernicus and look for a better model (better APIs)? Or are we to emulate Tolosani and Ingoli?

AlexandrB · 3 years ago
> You are looking for truth? Evolving APIs are evidence that people are unhappy with these interfaces,

That tracks, but this does not necessarily follow:

> demonstrating that the old APIs are not ideal.

I've been around the tech industry long enough to see several cycles of practitioners going back and forth between technologies and approaches. E.g. strong type systems (C++, Java) -> loose type systems (Python, JS) -> strong type systems (Rust, TypeScript). And each time the tide shifts there are always plenty of arguments trying to show why the previous approach was objectively worse and the new one is better.

That's not to say that nothing is moving forward, but the fact that people are unhappy with a technology doesn't mean that technology is objectively inferior to a proposed replacement. Sometimes it's just fashion.

throwaway09223 · 3 years ago
"I think "the [BSD] socket API" was designed for a thousand or so peers"

I think a more accurate framing is that in the 80s POSIX was designed for a thousand or so peers, and later versions have expanded and scaled accordingly. Literally every interface has these types of growth patterns so this shouldn't be a surprise.

But your point, which is the same point I made below, is that POSIX is but one of several interfaces on a modern system: https://news.ycombinator.com/item?id=34905405

As I noted below, we have semi-standardized extensions from POSIX. We don't really need POSIX to codify a new event system because we have libev/libevent. This is another point the parent article gets wrong when it says we need to get POSIX "off our necks." Nowhere does POSIX compliance stand in the way of using libev, or even using mremap(). I don't agree with your conclusion around mremap() above but it doesn't matter because the salient point is that mremap() works absolutely fine within the paradigm offered by mmap() and POSIX. libev works just fine within the paradigm offered by the file/socket api.

"But none of those "local file databases" can handle hundreds of thousands of clients"

This is absolutely not true. Those systems work just fine with even a million reader threads.

Maybe you mean the databases don't perform well under parallel writes, but this has nothing to do with the shared memory interface (how could it?). It's purely due to the design and intent of each system as primarily read-focused systems. mmap will work just as well as any other shared memory interface when it comes to reading and writing. Pick nearly any write-performant database - it's using mmap.

"Evolving APIs are evidence that people are unhappy with these interfaces"

Not quite. The actual truth is that POSIX plays very well with extensions to its interfaces and this is yet another dimension in which the above article is demonstrated to be very, very silly.

dcow · 3 years ago
Of course we should be like Copernicus.

So POSIX as a technology may be lacking but honestly what’s impressive about it to me is that it’s also a standard, not perfect there either, of course.

I think any conversation about replacing POSIX must include an effort to update or replace the standard. Deprecate the parts that no longer work well and add things that other systems have de-facto agreed upon, at least.

cvccvroomvroom · 3 years ago
Kids today: want to come along, throw away everything they don't understand, and rebuild what was already there but worse. Accomplishing negative impact.
jplona · 3 years ago
https://fs.blog/chestertons-fence/

I think this is a human thing, not necessarily just kids today.

loeg · 3 years ago
It's probably worth noting that the author is in his 50s and is fairly familiar with POSIX (he has been involved in the FreeBSD developer community for decades).
DangitBobby · 3 years ago
It's fun to build stuff. It's not nearly as fun to use and maintain what other people have built. Why should all the fun be for the people who just so happen to have been born decades earlier? The point of life is not to maximize efficiency in all respects, and people make emotional choices to achieve personal fulfillment.
simplotek · 3 years ago
> Accomplishing negative impact.

This opinion piece is the epitome of a negative impact work. It's easy to accuse random things of being all wrong, but it's far harder to actually present something you feel is right.

IgorPartola · 3 years ago
Do you really have to call out the entire JavaScript ecosystem like that? :)
rstat1 · 3 years ago
and the old folk that created all these things, don't bother to explain them, then get uppity when someone new comes along and doesn't think their creations are perfect.

2 sides of the same coin.

goodpoint · 3 years ago
That's been the default for the whole software industry for the last 20 years at least.

Reinventing things without learning from the past - and if you point it out people get so defensive.

thefz · 3 years ago
While complaining that Git is haaaard.
loeg · 3 years ago
Yeah, KV is GNN's vehicle for excessively controversial (i.e., what I might call "bad") takes.

I don't think he's wrong that the POSIX synchronous IO model isn't a great fit for modern hardware, though. He doesn't really go into it much but (Windows) NT's async-everything IOCP model really seems to be the winner, as far as generic abstractions that have stood the test of time.

astrange · 3 years ago
Concurrency is hard. If you don't have anything better to do but wait on a single file operation (which, in a CLI tool, you might not), then a synchronous call is just fine. If you do have multiple I/O operations to issue at once, that can still be synchronous with writev().

Most forms of async programming are also unstructured, which is bad for correctness, but also for performance since it can lead to priority inversions.

throwaway09223 · 3 years ago
Yeah, I agree that's an area that needs work. I admit I didn't bother to read every word, but I did search for "async" to see if he mentions aio(7) to complain about it and I saw that he didn't.

Don't get me wrong, there's a lot of room for improvement in POSIX IO interfaces. For example, POSIX doesn't define any modern event systems (epoll, kqueue, etc). But what's the result? We use libevent/libev.

This sort of seems like the result he's asking for in diverging from POSIX, which we ... already have. There are a lot of pretty great non-posix standard interfaces available too!

simplotek · 3 years ago
> I don't think he's wrong that the POSIX synchronous IO model isn't a great fit for modern hardware, though.

Opinions don't really matter. Solutions to real-world problems do.

To me, the fact that no one up to this day has felt that the issues in this area were significant enough to warrant a fix or an alternative tells me that it's noise about nothing.

wruza · 3 years ago
NT's async-everything IOCP model really seems to be the winner

The author talks about plumbing to data processing ratio. Let’s take hundred average programmers, let them plumb IOCP and see how many of them can even get to the “data” part.

jasmer · 3 years ago
I really disagree with the sockets bit. Sockets are designed for networking, with a particular focus on IP. Not IPC.

I for one think that the 'absurd thing' is that IPC is not built into OS as a core feature. That, and process isolation. Both sockets and shared memory are quite problematic and the challenge of 'true IPC' that works nicely with threads etc. is real.

anon291 · 3 years ago
> IPC is not built into OS as a core feature

POSIX and its derivatives build IPC into the OS as a core feature. In particular, POSIX is built around memory mappings and file descriptor inheritance, which means it is extraordinarily easy to make processes communicate.

I honestly have no idea what you mean by this statement.

Unix domain sockets are (1) fast, (2) primitive (just a file), (3) widely available, and (4) can be used between multiple processes extremely easily (use SOCK_DGRAM).

syrrim · 3 years ago
unix domain sockets are designed to be used entirely locally.
anon291 · 3 years ago
Completely agree. In particular, POSIX is built around the inheritance of file descriptors by their children, which means that it is extraordinarily easy to have sockets going between multiple processes. Moreover, it's entirely possible to send file descriptors over other sockets (SCM_RIGHTS). POSIX has robust IPC. I'm currently messing around with IPC on Windows... and wow, at the end of the day, even in 2023, UNIX et al are simply more advanced than Windows. It's unfortunate there's been absolutely zero groundbreaking discoveries or inventions in this field (OS dev), but the idea that we should throw away the state of the art simply because is just silly.

POSIX IPC has withstood the test of time. There is no other system that offers as rich a set of primitives.

jcrites · 3 years ago
Can you be more specific about how POSIX is more advanced than Windows (NT kernel)? Or a primitive it offers that NT doesn’t? Examples?

Windows provides most (all?) of the same mechanisms, and a number that POSIX does not.

https://learn.microsoft.com/en-us/windows/win32/ipc/interpro...

Example Windows concepts that Unix doesn’t have (AFAIK):

Transactions on named pipes (request/response): https://learn.microsoft.com/en-us/windows/win32/ipc/transact...

Parents can control which handles child processes inherit: https://learn.microsoft.com/en-us/windows/win32/ipc/pipe-han...

Remote Procedure Call (RPC) IPC: https://learn.microsoft.com/en-us/windows/win32/ipc/interpro...

ACL-based permissions on objects that kernel APIs operate on, including pipes. (ACLs are a much more sensible way of specifying permissions on resources than owner/group). https://learn.microsoft.com/en-us/windows/desktop/SecAuthZ/a...

Since windowing is built into the NT kernel, it also has support for powerful clipboard operations as one crucial type of IPC in a graphical environment (see 1st link).

Windows I/O Completion Ports (IOCP) also provide a high-performance way to implement kernel-managed asynchronous I/O operations which has no parallel in POSIX: https://learn.microsoft.com/en-us/windows/win32/fileio/i-o-c...

I’ve been developing professionally on Linux for >15 years, and while I do like its simple aesthetic, the consistency and power of NT kernel APIs are something I miss.

pjmlp · 3 years ago
Sure there are, one only needs to actually bother to learn about the history of computing and other platforms, to validate that.
Animats · 3 years ago
There are some legitimate issues here, and some ranting.

First, memory models. The author seems to be arguing for some way to talk about data independent of where it's stored. The general idea is that data is addressed not with some big integer address, but with something that looks more like a pathname. That's been tried, from Burroughs systems to LISP machines to the IBM System 38, but never really caught on.

All of those systems date from the era when disks were orders of magnitude slower than main memory, and loading, or page faulting, took milliseconds. Now that there are non-volatile devices maybe 10x slower than main memory, architectures like that may be worth looking at again. Intel tried with their Optane products, which were discontinued last year. It can certainly be done, but it does not currently sell.

The elephant in the room on this is not POSIX. It's C. C assumes that all data is represented by a unique integer, a "pointer". Trying to use C with a machine that does not support a flat memory model is all uphill.

Second, interprocess communication. Now, this is a Unix/Linux/Posix problem. Unix started out with almost no interprocess communication other than pipes, and has improved only slightly since. System V type IPC came and went. QNX type interprocess calls came and went. Mach type interprocess calls came and went. Now we have Android-type shared memory support.

Multiple threads on multiple CPUS in the same address space work fine, if you work in a language that supports it well. Go and Rust do; most other popular languages are terrible at it. They're either unsafe, or slow at locking, or both.

Partially shared memory multiprocessors are quite buildable but tough to program. The PS3's Cell worked that way. That was so hard to program that their games were a year late. The PS4 went back to a vanilla architecture. Some supercomputers use partially shared memory, but I'm not familiar with that space.

So those are the two big problems. So far, nobody has come up with a solution to them good enough to displace vanilla flat shared memory. Both require drastically different software, so there has to be a big improvement. We might see that from the machine learning community, which runs relatively simple code on huge amounts of data. But what they need looks more like a GPU-type engine with huge numbers of specialized compute units.

Related to this is the DLL problem. Originally, DLLs were just a way of storing shared code. But they turned into a kind of big object, with an API and state of their own. DLLs often ought to be in a different protection domain than the caller, but they rarely are. 32-bit x86 machines had hardware support, "call gates", for that sort of thing, but it was rarely used. Call gates and rings of protection have mostly died out.

That's sort of where we are in architecture. The only mainstream thing that's come along since big flat memory machines is the GPU.

ayende · 3 years ago
Just for refence. C and non flat memory model used to be REALLY common. It was called MS-DOS, and Win16 https://stackoverflow.com/questions/8727122/explain-the-diff...
Animats · 3 years ago
Few if any programs actually used segmented memory in MS-DOS as segmented memory. C compilers of the era combined both fields into 20-bit pointers. Pointer arithmetic was kind of messy but had compiler support.
atrettel · 3 years ago
> Some supercomputers use partially shared memory, but I'm not familiar with that space.

Some supercomputers do have some shared memory architecture, but a whole lot more use Message Passing Interface (MPI) for distributed memory architecture. Shared memory starts to make less sense when your data can be terabytes or larger in size. It is a lot more scalable to just avoid a shared memory architecture and assume a distributed memory one. It becomes easier to program assuming that each process just does not have access to the entire data set and has to send data back and forth between processes (pass messages).

pinewurst · 3 years ago
As there’s nothing new under the sun, those call gates and rings came from Multics.
phkahler · 3 years ago
>> Multiple threads on multiple CPUS in the same address space work fine, if you work in a language that supports it well. Go and Rust do; most other popular languages are terrible at it.

I find OpenMP for C and C++ to be simple and effective. You do have to write functions that are safe to run in parallel, and Rust will help enforce that. But you can write pure function in C++ too and dropping a #pragma to use all your cores is trivial after that.

rcme · 3 years ago
So this author is (rightfully) getting a lot of hate. But, "What if we replaced POSIX?" is an interesting question to me. Most people interact with POSIX through the "good parts." But, once you need to write C, which happens when you need to make low-level calls to the OS, it starts to get a little annoying. The biggest annoyance is related to memory management. A lot of the OS APIs have manual free functions, e.g. you call `create_some_os_struct` and then `free_some_os_struct`. Similarly, you end up writing a lot of your own call/free functions. This is because you need to write C to talk to the OS, but your other code is probably not in C and may not have access to the same libc your C uses. So, you need to provide an escape hatch back into your C code to free any allocated memory.

Another annoyance is that passing data between C and another language is hard. For instance, if you want to pass a Swift string into C, you need to be careful that Swift doesn't free the string while C is using it. The "solution" to this is to have explicit methods in Swift that take a closure and guarantee the data stays alive for the duration of that closure. On the C side, you need to copy the data so that Swift can free the string if you need to keep it longer than that one block. Going from C to Swift is also a pain.

A cool thought is: what if the OS provided better memory management? What if it had a type of higher level primitive so that memory could be retained across languages? For instance, if I pass a string from C to Go, why do I need to copy it on the Go side? Why can I not ask the OS to retain the memory for me? Perhaps we need retain / release instead of malloc and free. Anyway, just a random thought.

tsimionescu · 3 years ago
The problem with this thought is that malloc/free are not OS primitives, they are strictly concepts that make sense to your own program. Languages like Swift and Go never use these calls at all, for example. When Swift "frees" a string that was still being referenced from C, it's very likely not the OS that will mess with it, but other parts of the Swift program.

The way programs actually interact with the OS for memory allocation is using sbrk() or mmap() (or VirtualAlloc() in the case of Windows) to get a larger piece of memory, and then managing themselves at the process level.

And having the OS expose a more advanced memory management subsystem is a no-go in practice because each language has its own notions of what capabilites are needed.

drjasonharrison · 3 years ago
In my opinion, given that the C ABI is pretty much the only cross-language interface for transferring in-memory data between components written in different languages, it is a pretty big blind spot not to include in a new language a C-safe way of transferring memory.

Yes, you can open a socket or file and transfer your data that way, but then someone had to implement the in-memory data protection mechanisms for the language anyway. Just expose them!

anon291 · 3 years ago
So POSIX is both the libraries as well as the general system design. If you want to eschew all the POSIX libraries on most *NIX systems today (at least the open source ones), you can simply ... do that. In particular, the Linux kernel (and the BSDs are similar) make no assumption as to how you're managing user memory. You can call mmap to map pages and allocate memory as you like.

In fact, languages such as Go widely disregard libc (IIUC) and just roll their own thing. They still benefit from the POSIX semantics built in to the kernels that go programs run on.

At the end of the day, the main interface between POSIX kernels and userspace is a 32-bit integer (the file descriptor).

justin66 · 3 years ago
> So this author is (rightfully) getting a lot of hate.

Precisely what about that seems right to you?

rcme · 3 years ago
The letter isn't coherent, and I think people are reacting to that. The thesis of the letter is that "POSIX is the reason your code has annoying low-level plumbing and knobs needing attention," but doesn't explain how removing POSIX will help with that stuff. E.g. "new schedulers have to be built to handle the fact that memory is not all one thing." What does that even mean? Maybe he has a vision in his head, but it's not well articulated.

Also, the letter ends with this:

> If we are to write programs for such machines, it is imperative to get the Posix elephant off our necks and create systems that express in software the richness of modern hardware.

But such a system could still have annoying low-level knobs that need turning.

jsjohns2 · 3 years ago
To those bashing the author as uninformed -- this is George V. Neville-Neil. Member of FreeBSD Core Team who wrote the book on FreeBSD. He might know a thing or two about POSIX! [1]

[1] https://www.amazon.com/Design-Implementation-FreeBSD-Operati...

scottlamb · 3 years ago
It's a bad article because it's too vague and doesn't clearly relate to the questioner's problem, not because the author doesn't have the proper pedigree.

One could certainly write good articles about why the POSIX API is too limiting. For example: the filesystem API is awful in many ways. I'll try to be a bit more specific (despite having only a few minutes to write this):

* AFAICT, it has very few documented guarantees. It doesn't say sector writes are atomic, which would be very useful [1]. (Or even that they are linear as described in that SQLite page, but the SQLite people assume it anyway, and they're cautious folks, so that's saying a lot.) And even the ones its language does seem to guarantee, like fsync guaranteeing all previously written data to that file has reached permanent storage, systems such as Linux [2] and macOS [3] have failed to provide.

* It doesn't provide a good async API. io_uring is my first real hope for this but isn't in POSIX.

* IO operations are typically uninterruptible (NFS with a particular mount option being a rare exception). Among other problems, it means that a process that accesses a bad sector will get stuck until reboot!

* It doesn't have a way to plumb through properties you'd want for a distributed filesystem, such as deadlines and trace ids.

* It provides just numeric error codes, when I'd like to get much richer stuff back. Lots of stuff in distributed filesystem cases. Even in local cases, something like where in particular path traversal failed. I once saw (but can't find in a very quick search) a library that attempted to explain POSIX errors after the fact, by doing a bunch of additional operations to narrow the cause down. Besides being inherently racy, it just shouldn't be necessary. We should get good error messages by default.

[1] https://www.sqlite.org/atomiccommit.html

[2] https://wiki.postgresql.org/wiki/Fsync_Errors

[3] https://developer.apple.com/library/archive/documentation/Sy...

drpixie · 3 years ago
It's a great article, and it raises many major issues with our current model of computing. But it's obviously triggering, and lots of people are rushing to defend their comfort zone.

Think outside the box people ... "files", what a charming but antiquated concept; "processes" and thus "IPC", how quaint!

jeroenhd · 3 years ago
He also worked on VxWorks, an operating system that is decidedly non-POSIX!
foxhill · 3 years ago
- a windows-first programmer who sees interoperability and composition as an encumbrance that satisfies “nerds who like to do weird shit that i don’t understand in bash”

reading from stdin isn’t challenging, nor writing to stdout. if someone can’t imagine why that might be useful, then i’d argue their journey as a software engineer is either at its end, or right at its beginning.

at the cost of potentially sounding inflammatory, “get good”.

jeroenhd · 3 years ago
> reading from stdin isn’t challenging, nor writing to stdout

Every time I dabble in C, I need to look up what method I need to use these days. getline? scanf? Do I need to allocate a buffer? What about freeing it, is it safe to do so from another thread? What about Unicode support, can I just use a char array or do I need a string library for proper support? What's a wchar_t again and why is it listed in this example I found online? How do I use strtok to parse a string again?

Sure, these things become trivial with experience, but they're not easy. Other languages make them easier so we know it can be done, yet the POSIX APIs insist on using the more difficult version of everything for the sake of compatibility and programmer choice.

(Modern) C++ makes the entire process easier but there are still archaic leftovers you need to deal with on *nix if you want to interact with APIs outside what the C++ standard provides. At that point, you're back to POSIX APIs and *nix magic file paths with *nix IOCTLs. Gone are your exceptions, your unique_ptrs, and your std::string; back are errno and pointers.

ddulaney · 3 years ago
> reading from stdin isn’t challenging, nor writing to stdout.

Unless you're Kernighan and Ritchie, who semi-famously wrote a buggy hello world program and used it to educate a generation of C programmers: https://blog.sunfishcode.online/bugs-in-hello-world/

Obviously an exaggeration, but when's the last time you checked the return value of printf? I know I don't. And that's not even a memory safety bug, just basic logic. I hope nobody trusts those guys around malloc and free :)

gjm11 · 3 years ago
The author, George V Neville-Neil, may be right or wrong but definitely isn't a Windows-first programmer with no understanding of Unix-like systems.

He cowrote a book about the innards of FreeBSD: https://www.oreilly.com/library/view/design-and-implementati... (ignore the first paragraph of the description, which is presumably a copy-and-paste error).

He has been on the FreeBSD Board of Directors: https://freebsdfoundation.org/blog/george-neville-neil-joins...

He's presented a bunch of papers at FreeBSD conferences: https://papers.freebsd.org/author/george-neville-neil/

All of which is perfectly compatible with his being incompetent, or wrong about this in particular. (For what it's worth, I don't think he is incompetent.) But what he easily demonstrably isn't is "Windows-first", and I suggest that any mental process that led you to that conclusion needs reexamining.

nixpulvis · 3 years ago
I'm confused as to what exactly is wrong with the notion of _jobs_ and _files_? Scheduling is hard, but modern operating systems are definitely setup to do it. I think we could probably do a better job of using realtime features and maintaining update latency benchmarks, but so many of the cycles on my PCs/mobiles are wasted doing god damned animations and updating the screen without interaction that I don't think this is really the main issue.

Programming is basically always just a matter of loading data, transforming data, and then putting it somewhere. The simplest record keeping systems do that, and the fanciest search algorithms do that. Decode/encode/repeat.

EDIT: The beauty of UNIX to me is the interoperability of the text stream. Small components working together. Darwinesque survival of the fittest command.

quanticle · 3 years ago

    The beauty of UNIX to me is the interoperability of the text stream
What interoperability? Look at the man page for any simple Unix utility (such as `ls`), and count up how many of the listed command line flags are there only to structure the text stream for some other program. "Plain text" is just as interoperable as plain binary. "Just use plain text" is the Original Sin of Unix.

The Unix Philosophy, as stated by Peter Salus [1] is

1. Write programs that do one thing and do it well

2. Write programs that work together

3. Write programs to handle text streams because text is a universal interface.

The problem is that, in practice, you can only pick two of those. If you want to write programs that work together, and do so using plain text, then, in addition to doing its ostensible task, each program is going to have to provide a facility to format its text for other programs, and have a parser to read the input that other programs provide, contradicting the dictum to "do one thing and do it well".

If you want programs that do one thing and do it well, and programs that work together, then you have to abandon "plain text", and enforce some kind of common data format that programs are required to read and output. It might be JSON. Or it might be some kind of binary format (like what PowerShell uses). But there has to be some kind of structure that allows programs to interchange data without each program having to deal with the M x N problem of having to deal with every other program's idiosyncratic "plain text" output format.

[1]: https://en.wikipedia.org/wiki/Unix_philosophy

infogulch · 3 years ago
Programmers tend to have a disproportionate affinity towards plain text, me included. But this is an intriguing argument so now I'm reconsidering. Maybe plain text is just someone else's unparsed junk.
jimbo9991 · 3 years ago
This was a really insightful take on the Unix philosophy that I hadn't heard before but I intuitively agree with because of all the parsing code I've had to write.
chubot · 3 years ago
The M x N interoperability problem is SOLVED BY building on top of bytes / plain text, not solved by moving AWAY from it!

See this section of my follow-up post (to the link below, which is how I found this post):

https://www.oilshell.org/blog/2022/03/backlog-arch.html#slog...

There have been numerous projects which invent bespoke protocols for interoperability -- I give the examples of PowerShell, Elvish, and nushell.

(PowerShell doesn't use any kind of binary format AFAIK. I believe you are literally moving around .NET objects inside the CLR VM, and that representation is meaningless outside the CLR VM. This is crucial because it means that PowerShell must serialize its data structures for interoperability.)

As well as various Lisps.

The question is how you interoperate between them. (Honest question -- please let me know.)

So ironically, trying to solve the interoperability problem in a smaller context CREATES it again in a bigger context (e.g. between different machines).

Bytes and text are fundamental because they reflect how disks and networks fundamentally work, in addition to operating systems.

That does not mean we shouldn't have higher level layers on top of bytes and text, like JSON, HTML/XML, and TSV/CSV.

Those are structured data formats. You generally use parsing libraries for them, instead of writing the parser yourself.

Again, all of those formats ARE text, and that's a feature, not a bug!
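A small sketch of that point in Python (the JSON and TSV payloads are invented for illustration): the data on the wire is still text you can cat and grep, but the parsing is a library's job, not yours:

```python
import csv
import io
import json

# Both payloads are ordinary text, readable by eye or by grep...
cfg_text = '{"tool": "jq", "streaming": true}'
tsv_text = "name\tsize\nreport.txt\t1024\n"

# ...but a library handles the actual parsing.
cfg = json.loads(cfg_text)
rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))

print(cfg["tool"], rows[1])
```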

deafpolygon · 3 years ago
The only thing I really take away from the UNIX philosophy nowadays (I used to be a dyed-in-the-wool fan of UNIX/Linux) is #1) do one thing and do it well. I see #2 as an ideal goal to reach but not always required. And #3 is nowadays untenable for me. If we can agree on an object exchange format (something PowerShell seems to have solved in part), then we can do much much more than relying on text streams.
cvccvroomvroom · 3 years ago
Maybe it was written as a parody and we just don't get the joke? (Not POSIX, the article. POSIX survives fine outside of UNIX in embedded systems and somewhat on Mac OS and Windows.)
olliej · 3 years ago
macOS at least is definitely POSIX compliant - not just somewhat. It was a big part of what was needed for Xserves to be competitive (alas, being shiny and aluminium was not :) ). While the Xserve has died, macOS remains entirely POSIX compliant - complete with the horrific %n format specifier (although, IIRC, an environment variable is needed for it to be respected in non-read-only format strings).
astrange · 3 years ago
The idea that your computer would be faster or easier to use if it didn't have any animations seems untrue. Providing physicality is good! Helps you understand how things are changing between two different states.
nine_k · 3 years ago
Plan 9 enhances all of these good Unix traits. Even the universal text stream: it adds support for arrays / lists, that is, streams with elements larger than one byte.
shiftoutbox · 3 years ago
Gnn once told me about how he had to work on Plan 9. It's an interesting topic. It's good to see how other people think about this CS topic and see if you can borrow some ideas.
PaulDavisThe1st · 3 years ago
It is hard to believe that this bunch of drivel was actually available from acm.org. If a 16 year old programmer came to me with this nonsense, I might take the time to gently point them in a few directions. Being on the acm.org site ... unforgivable.

And not even the obvious faults, pointed out by others here. There's the question of an apparent complete ignorance of OS research, for example systems that rely on h/w memory protection and so can put all tasks (and threads) into a single address space. But what's the actual take home from such research? The take home is that these ideas have, generally speaking, not succeeded, and that to whatever extent they do get adopted, it is incremental and often very partial. If you don't understand why most computing devices today do not run kernels or applications that look anything like the design dreams of 1990-2010 (to pick an arbitrary period, but a useful one), then you really don't understand enough about computers to even write a useless article like this one.

Blackthorn · 3 years ago
That's a pretty bizarre answer to what was a pretty reasonable question. I don't even see how the question and answer are related honestly. Surely the question is more about finding a better storage format for the initial ingestion or data storage and has little to nothing to do with POSIX.

Most languages have a way to slurp a file into memory in a single function call, after all. The fact that files exist shouldn't be a barrier here.
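For instance, a one-call slurp in Python (the scratch path is made up for the example):

```python
from pathlib import Path

# Write, then slurp, a whole file in single calls; the open/read/close
# loop lives inside the library, not in your code.
p = Path("/tmp/slurp_demo.txt")  # hypothetical scratch file
p.write_text("line one\nline two\n")
data = p.read_text()

print(len(data.splitlines()))  # prints: 2
```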

UncleEntity · 3 years ago
They’re using Python, with C/C++ to speed up the slow bits. Can’t imagine anything more portable than that, without even having to know what POSIX is doing under the library abstractions.
galaxyLogic · 3 years ago
When I was programming with Java-EE I was surprised how much effort had to be put in to produce the production artifacts: .Wars and .Ears and so on. I had to use a language called Ant. I also had to write Java, JavaScript, CSS and HTML.

On the C side it is "make" or something similar. It means you must master multiple languages, some of which may be statically typed while others are not.

I assume integrating C with Python is similarly lots of overhead which in principle you shouldn't have to do. Why can't I just write everything in a single language and be done with it? Why do I have to write a program that transforms my source-code modules into an executable?