Stephen Bourne wanted to write his shell in ALGOL so badly that he relentlessly beat C with its own preprocessor until it began to resemble his preferred language.
Can someone clarify whether this is intended as a joke or whether the author is actually confused? I mean, nothing about this makes sense: it's not "scripting"; it claims to introduce "strong typing" while it does nothing about typing; it introduces all kinds of operator aliases "modeled after Lua and Lisp" that are present in neither of these languages. But it's not an obvious parody either, so I'm genuinely not sure.
I mean he has to be serious, right: "Deprecate Lua, Python, JavaScript, Ruby and a dozen other languages, because Pretty C is the ultimate scripting language, but lightning-fast and strongly typed!!"
Given that the idea behind this repo is to cause pain, why not add a shebang to your file [0] to make it executable?
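For what it's worth, tcc (linked elsewhere in the thread) documents exactly this under "C scripts": put a shebang pointing at tcc -run on the first line. A minimal sketch, assuming tcc is installed and that the path below is where it actually lives on your system:

    #!/usr/bin/tcc -run
    /* With tcc installed (adjust the shebang to your tcc's real path),
       mark this file executable and run it directly:
           chmod +x hello.c && ./hello.c */
    #include <stdio.h>

    int main(void) {
        printf("hello from a C \"script\"\n");
        return 0;
    }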
I saw a blog post a long time ago that went into the details of how ./foo worked, and how it executed an ELF file. You could register `.c` programs in the same way to be compiled and run?
Now I have a very evil idea: what about registering a binfmt handler for the header bytes “#include”? Sure, it doesn’t handle all C/C++ programs (notably any program that dares to start with a comment), but it would not require modifying any source code!
(For even more insanity I guess you could also trigger on // and /*, although there’s some risk of false positives then!)
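The binfmt_misc part is real, at least: a rule is a single line of the form :name:type:offset:magic:mask:interpreter:flags written to /proc/sys/fs/binfmt_misc/register, so matching on the bytes "#include" is easy. A sketch in C, in keeping with the spirit of the thread (the rule name and the interpreter path are made up, and it needs root plus a mounted binfmt_misc):

    /* Register a binfmt_misc rule: any file whose first bytes are "#include"
       gets handed to an "interpreter".  Type M means "match magic bytes at
       the given offset" (empty offset = 0).  /usr/local/bin/run-c-source is
       hypothetical: it would be a small wrapper that compiles the file and
       execs the result. */
    #include <stdio.h>

    int main(void) {
        FILE *f = fopen("/proc/sys/fs/binfmt_misc/register", "w");
        if (!f) { perror("open binfmt_misc/register"); return 1; }
        fputs(":c-source:M::#include::/usr/local/bin/run-c-source:\n", f);
        return fclose(f) == 0 ? 0 : 1;
    }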
Sure, there is no "rule" against it. But words/phrases have commonly-accepted meanings and willfully ignoring or appropriating those meanings implies either cultural ignorance or a concealed agenda.
If you want to insist that scripting languages can be either compiled or interpreted, then it's better to drop it altogether and just say "language", because the "scripting" part has utterly lost its identity at that point.
Generally they aren't, as scripting usually implies an interpreter, though no one is stopping you from using a wrapping script that quietly compiles on first run and caches a bunch of executables somewhere. Not much different from Python producing bytecode files as it goes along.
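A rough sketch of that kind of wrapper, written in C rather than shell since we're committed to the bit (the cache location, the cc invocation, and keying the cache on the file name alone are all assumptions; a real tool would hash the path and contents):

    /* Compile-and-cache wrapper: ./wrapper foo.c args... compiles foo.c the
       first time (or when the source is newer than the cached binary), then
       execs the cached binary with the remaining arguments.  POSIX assumed. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(int argc, char **argv) {
        if (argc < 2) {
            fprintf(stderr, "usage: %s file.c [args...]\n", argv[0]);
            return 2;
        }
        const char *src  = argv[1];
        const char *base = strrchr(src, '/') ? strrchr(src, '/') + 1 : src;

        char bin[4096], cmd[8600];
        snprintf(bin, sizeof bin, "/tmp/c-cache-%s.bin", base);

        struct stat s_src, s_bin;
        int stale = stat(src, &s_src) != 0 || stat(bin, &s_bin) != 0
                    || s_bin.st_mtime < s_src.st_mtime;
        if (stale) {
            snprintf(cmd, sizeof cmd, "cc -O2 -o '%s' '%s'", bin, src);
            if (system(cmd) != 0) return 1;
        }

        argv[1] = bin;              /* becomes argv[0] of the cached binary */
        execv(bin, argv + 1);       /* replaces this process on success */
        perror("execv");
        return 1;
    }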
Well, there are a few things I should probably get around to adding to CNoEvil[0] and ogw[1]... There always seem to be more every few months when this project reappears.

[0] https://git.sr.ht/~shakna/cnoevil3/
[1] https://git.sr.ht/~shakna/ogw
What do you consider the type of shell text, i.e. what's in argv and what you get from subprocess output? It's not well-formed utf8 strings because any random garbage can be in there, yet tools like awk and grep are ubiquitous.
I'd argue that strings and bytes are the same general type, but it's sometimes useful to give well-formed utf8 bytes a different type internally. Rust gets this mostly correct with OsString and String.
The way I understand it: Bytes are just bytes, until you provide an encoding. Then they can be converted to a string, if validly encoded. Taking an array of characters and just treating it or casting it as a string is usually a bad idea.
The thing I think Rust maybe goofed on, or at least made a little complicated, is its weird distinction between a String and a str (and a &str). As a newbie learning the language, I have no idea which one to use, and usually just pick one, try to compile, then if it fails, pick the other one. I'm sure there was a great reason to have two types for the same thing, that I will understand when I know the language better.
What you see on the screen of a terminal is Unicode text. It is human-readable. len(“äöü”) is 3 even if the underlying encoding holds it as 6 bytes.
Of course if you provide a separate set of functions for treating a string as human readable vs not you can also work with that. Basically len() vs byte_len().
But you can’t concat two human readable strings without ensuring they are of the same encoding. You can’t search a string by bytes if your needle is of a different encoding. You can’t sort without taking encoding and locale preferences into account, etc.
Pretending like you don’t care about encoding doesn’t work as we have seen time and again.
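The sorting point is easy to see even in plain C: strcmp() orders raw bytes, while strcoll() orders according to the current LC_COLLATE. A small sketch; whether the two actually disagree depends on the locale installed on your system (a UTF-8 locale with language-aware collation is assumed, as is a UTF-8-encoded source file):

    #include <locale.h>
    #include <stdio.h>
    #include <string.h>

    int main(void) {
        setlocale(LC_ALL, "");   /* pick up the environment's locale */
        const char *a = "côte", *b = "coup";

        /* Byte-wise, 0xC3 (the first byte of 'ô') sorts after every ASCII
           letter, so strcmp puts "coup" first; a typical UTF-8 collation
           treats 'ô' like 'o' and puts "côte" first. */
        printf("strcmp : %s\n", strcmp(a, b)  < 0 ? "côte < coup" : "coup < côte");
        printf("strcoll: %s\n", strcoll(a, b) < 0 ? "côte < coup" : "coup < côte");
        return 0;
    }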
Given the nature of it (pretty.c) and the stated intention of being "backwards-compatible with C and all of its libraries", what would make more sense than sticking with C's multibyte strings?

https://en.cppreference.com/w/c/string/multibyte
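Those functions also give you the len() vs byte_len() split mentioned above. A small sketch, assuming a UTF-8 environment locale and a UTF-8-encoded source file:

    #include <locale.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void) {
        setlocale(LC_ALL, "");   /* use the environment's (hopefully UTF-8) locale */
        const char *s = "naïve";

        size_t bytes = strlen(s);             /* 6: 'ï' is two bytes in UTF-8 */
        size_t chars = mbstowcs(NULL, s, 0);  /* 5, or (size_t)-1 if invalid  */

        printf("byte_len = %zu, len = %zu\n", bytes, chars);
        return 0;
    }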
I don't agree. This doctrine presumes all of the following:
- String data will be properly encoded
- There is one encoding of strings (UTF-8 usually)
- Validation must occur when string data is created
- Truncating a logical codepoint is never acceptable
- You may not do string things to "invalid" bytes
- Proper encoding is the beginning and the end of validation
None of these things are consistently true. It's a useful practice to wrap validated byte sequences in a type which can only be created by validation, and once you're doing that, `Utf8String` and `EmailAddress` are basically the same thing; there's no reason to privilege the encoding in the type system.
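A minimal sketch of that last point in plain C (my construction, not anything from pretty.c): both wrappers can only be obtained through a validating constructor, so holding the type implies the invariant, and the two types really are the same pattern. The UTF-8 check is structural only (it doesn't reject overlong forms or surrogates) and the email check is deliberately naive:

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdio.h>
    #include <string.h>

    /* "Validated bytes" wrappers: the only way to get one is through a
       constructor that checks the input. */
    typedef struct { const unsigned char *bytes; size_t len; } utf8_string;
    typedef struct { const char *addr; } email_address;

    /* Structural UTF-8 check: lead-byte classes plus continuation bytes. */
    static bool utf8_string_init(utf8_string *out, const void *data, size_t len) {
        const unsigned char *p = data;
        for (size_t i = 0; i < len; ) {
            size_t extra;
            if      (p[i] < 0x80)           extra = 0;
            else if ((p[i] & 0xE0) == 0xC0) extra = 1;
            else if ((p[i] & 0xF0) == 0xE0) extra = 2;
            else if ((p[i] & 0xF8) == 0xF0) extra = 3;
            else return false;
            if (len - i < extra + 1) return false;
            for (size_t k = 1; k <= extra; k++)
                if ((p[i + k] & 0xC0) != 0x80) return false;
            i += extra + 1;
        }
        out->bytes = p;
        out->len = len;
        return true;
    }

    /* Deliberately naive: exactly one '@' with something on both sides. */
    static bool email_address_init(email_address *out, const char *s) {
        const char *at = strchr(s, '@');
        if (!at || at == s || at[1] == '\0' || strchr(at + 1, '@')) return false;
        out->addr = s;
        return true;
    }

    int main(void) {
        utf8_string u;
        email_address e;
        printf("utf8 ok:  %d\n", utf8_string_init(&u, "na\xc3\xafve", 6));
        printf("email ok: %d\n", email_address_init(&e, "user@example.com"));
        return 0;
    }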
Reminds me of a C++ codebase I once had to inspect that was written entirely as if it were Java. With camelcase naming for everything, getters and setters for every class variable, interfaces everywhere.
> camelcase naming for everything, getters and setters for every class variable, interfaces everywhere
This is not far off from the guidelines in many cases, e.g. Windows code (well, not every variable, of course). A lot of Java design was copied from C++.
I've seen similar codebases as well, written by people who have spent way too much time with Java. One even had its own String class, which was just a wrapper for std::string with Java-like methods.
> Stephen Bourne wanted to write his shell in ALGOL so badly that he relentlessly beat C with its own preprocessor until it began to resemble his preferred language.
https://www.tuhs.org/cgi-bin/utree.pl?file=V7/usr/src/cmd/sh...
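For flavor, the macros look roughly like this (paraphrased from memory, not copied; the linked V7 tree has the real definitions in mac.h, and the real set is larger):

    #include <stdio.h>

    /* Roughly the ALGOL-flavored macros Bourne used. */
    #define IF      if (
    #define THEN    ) {
    #define ELSE    } else {
    #define FI      ; }
    #define BEGIN   {
    #define END     }
    #define WHILE   while (
    #define DO      ) {
    #define OD      ; }

    int main(void)
    BEGIN
        int n = 3;
        WHILE n > 0 DO
            IF n == 1 THEN
                printf("liftoff\n");
            ELSE
                printf("%d...\n", n);
            FI
            n--;
        OD
        return 0;
    END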
https://www.ioccc.org/
> You could register `.c` programs in the same way to be compiled and run?
[0] https://gist.github.com/jdarpinian/1952a58b823222627cc1a8b83...
> (For even more insanity I guess you could also trigger on // and /*, although there’s some risk of false positives then!)
[0] https://bellard.org/tcc/tcc-doc.html
Haha love this!
I love this to the very core of my being.
If it's "human-readable text", then fine, a string is not the same thing as an arbitrary byte array.
But lots of languages don't enforce that definition.
wow.
thanks for this gem.
Before he wrote the Bourne shell, the author wrote an ALGOL compiler, and ALGOL inspired the shell's syntax:
https://en.wikipedia.org/wiki/ALGOL_68C
> This is not far off from the guidelines in many cases, e.g. Windows code
https://learn.microsoft.com/en-us/cpp/cpp/property-cpp?view=...