One reason that uuencode lost out to Base64 was that uuencode used spaces in its encoding. It was fairly common for Internet protocols in those days to mess with whitespace, so it was often necessary to patch up corrupted uuencode files by hand.
Base64, on the other hand, was carefully designed to survive everything from whitespace corruption to being passed through non-ASCII character sets. And then it became widely used as part of MIME.
And yet, Internet protocols (HTTP, at least) sometimes don't play well with the equals signs that are part of base64. That little issue has caused lots of intermittent bugs for me over the years, either from forgetting to urlencode them or from not urldecoding them at the right time.
And slashes as well, which are magic characters in both URLs and file systems. That means you can't reliably use normal base64 for filenames, for instance. It might seem like a niche use case, but it really isn't, because you can use it for content-based addressing. Git does this: it names all the blobs in the .git folder after their hash, but you can't encode the hash with regular base64.
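As a quick illustration of the usual workaround: Python's standard base64 module has a URL- and filename-safe variant that swaps '+' and '/' for '-' and '_', and the '=' padding can be dropped and restored from the length. (The blob here is just made-up example data.)

    import base64

    blob = b"\xfb\xef\xff some arbitrary binary, e.g. a content hash"

    standard = base64.b64encode(blob)          # may contain '+' and '/', plus '=' padding
    urlsafe  = base64.urlsafe_b64encode(blob)  # same data, but with '-' and '_' instead of '+' and '/'

    # Dropping the '=' padding as well gives a token that is safe in URLs and
    # filenames; the padding is put back from the length before decoding.
    token = urlsafe.rstrip(b"=")
    assert base64.urlsafe_b64decode(token + b"=" * (-len(token) % 4)) == blob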
Having lived through the transition, I can say personally it comes down to "packaging" - if MIME had adopted the UUENCODE format, I probably would have used it, but as material came my way that depended on base64 decoding, it became compelling to use it. Once it was ubiquitously available in e.g. SSL, it became trivial to decode a base64-encoded thing, no matter what. Not all systems had a functioning uudecode all the time; on DOS, for instance, you had to find one. If you're given base64 content, you install a base64 encode/decode package and then it's what you have.
There was also an extended period when people did uux much as they did shar: both of which invite somebody else's hands into your execution state and filestore.
We were also obsessed with efficiency. base64 was "sold" as denser encoding. I can't say if it was true overall, but just as we discussed Lempel-Ziv and gzip tuning on usenet news, we discussed uuencode/base64 and other text wrapping.
Ned Freed, Nathaniel Borenstein, Patrik Fältström and Robert Elz, amongst others, come to mind as people who worked on the baseXX encodings and discussed this on the lists at the time. Other alphabets were discussed.
uu* was the product of Mike Lesk a decade earlier, and he was a lot quieter on the lists: he'd moved into different circles, was doing other things, and wasn't really that interested in the chatter around line-encoding issues.
From https://www.usenetarchives.com/view.php?id=comp.mail.mime&mi... :

> Some of the characters used by uuencode cannot be represented in some of the mail systems used to carry rfc 822 (and therefore MIME) mail messages. Using uuencode in these environments causes corruption of encoded data. The working group that developed MIME felt that reliability of the encoding scheme was more important than compatibility with uuencode.
In a followup (same link):
> "The only character translation problem I have encountered is that the back-quote (`) does not make it through all mailers and becomes a space ( )."
A followup from that at https://www.usenetarchives.com/view.php?id=comp.mail.mime&mi... says:

> The back-quote problem is only one of many. Several of the characters used by uuencode are not present in (for example) the EBCDIC character set. So a message transmitted over BITNET could get mangled -- especially for traffic between two different countries where they use different versions of EBCDIC, and therefore different translate tables between EBCDIC and ASCII. There are other character sets used by 822-based mail systems that impose similar restrictions, but EBCDIC is the most obvious one.
> We didn't use uuencode because several members of our working group had experience with cases where uuencoded files were garbaged in transit. It works fine for some people, but not for "everybody" (or even "nearly everybody").
> The "no standards for uuencode" wasn't really a problem. If we had wanted to use uuencode, we would have documented the format in the MIME RFC.
That last comment was from Keith Moore, "the author and co-author of several IETF RFCs related to the MIME and SMTP protocols for electronic mail, among others", according to https://en.wikipedia.org/wiki/Keith_Moore .
After a given point usenet was nearly 8-bit clean, and thus https://en.wikipedia.org/wiki/YEnc was also developed: it offsets every octet, O = (I + 42) mod 256, and escapes the few results that still collide with reserved characters (CR, LF, 0x00, and '=', the yEnc escape itself). If the result falls in that set, an '=' is emitted and the output becomes (O + 64) mod 256 instead.
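A rough sketch of that encoding step in Python, going only by the description above (real yEnc also wraps lines and adds header/trailer lines with a CRC, which this ignores):

    CRITICAL = {0x00, 0x0A, 0x0D, 0x3D}  # NUL, LF, CR, and '=' itself

    def yenc_encode(data: bytes) -> bytes:
        out = bytearray()
        for i in data:
            o = (i + 42) % 256
            if o in CRITICAL:
                out.append(0x3D)       # emit the '=' escape...
                o = (o + 64) % 256     # ...and shift the offending byte again
            out.append(o)
        return bytes(out)

    def yenc_decode(enc: bytes) -> bytes:
        out = bytearray()
        it = iter(enc)
        for o in it:
            if o == 0x3D:              # escape: undo the extra +64 on the next byte
                o = (next(it) - 64) % 256
            out.append((o - 42) % 256)
        return bytes(out)

    assert yenc_decode(yenc_encode(bytes(range(256)))) == bytes(range(256))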
> We were also obsessed with efficiency. base64 was "sold" as denser encoding. I can't say if it was true overall
uuencode has file headers/footers, like MIME. But the actual content encoding is basically base64 with a different alphabet; both add precisely 1/3 overhead (plus up to 2 padding bytes at the end).
uuencode has some additional overhead, namely 2 extra bytes per line, so its efficiency varies from roughly 60% to 70%, the latter being the best case, while base64 is 75% efficient in all cases (ignoring line breaks).
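If you want to measure it rather than argue about it, both encoders are still in the Python standard library (binascii.b2a_uu takes at most 45 bytes, i.e. one uuencode line, per call); the exact ratios depend on line length and newline conventions:

    import base64, binascii, os

    data = os.urandom(45_000)

    # uuencode body: one line per 45-byte chunk (length byte + 60 chars + newline)
    uu = b"".join(binascii.b2a_uu(data[i:i + 45]) for i in range(0, len(data), 45))

    # base64 body, unwrapped and wrapped to 76-character lines
    b64 = base64.b64encode(data)
    b64_wrapped = base64.encodebytes(data)   # inserts a newline every 76 characters

    for name, enc in [("uuencode", uu), ("base64", b64), ("base64 + line breaks", b64_wrapped)]:
        print(f"{name:>22}: {len(data) / len(enc):.1%} efficient")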
On a related note, I'm getting flashbacks to being on the web in the late 1990s, back when "Downloads!" was a reason to visit a particular website, and noticing that Windows users like myself could just download-and-run an .exe file, while the same downloads for Macintosh users would be a BinHex file that'd also be much larger than the Windows equivalent - and this wasn't over FTP or Telnet, but an in-browser HTTP download, just like today.
Can anyone explain why BinHex remained "popular" in online Mac communities through to the early 2000s? Why couldn't Macs download "real" binary files back then?
Classic Macintosh files were basically 2 separate files with the same name (a data fork and a resource fork). Additionally, there was important metadata (Finder Info, most importantly the file type and creator type). Since other file systems couldn't handle forks or Finder Info, it all had to be encapsulated in some other format like BinHex, MacBinary, AppleSingle, or StuffIt. The other 3 were binary so they would have been smaller. Why not them... shrug
I wasn't a Macintosh user back in the day, but for the file archives I frequented (Apple II), files were sometimes in BinSCII, which was a similar text encoding. The advantages were that they could be emailed inline, posted to usenet, didn't require an 8-bit connection (important back in the 80s), and could be transferred by screen scraping if there wasn't a better alternative.
What I saw most of the time was a file that was compressed with StuffIt, and then encoded with BinHex. They were usually about inverse in terms of efficiency, so what you saved with StuffIt you would then turn around and lose with BinHex. But the resulting file was roughly the same size as the original file set.
The most common Macintosh archive format was (eventually) StuffIt, but StuffIt Expander couldn't open a .sit file which was missing its resource fork, and when you downloaded a file from the internet, it only came with a data fork.
So a common hack was to binhex the .sit file. Binhex was originally designed to make files 7-bit clean, but had the side effect that it bundled the resource fork and the data fork together.
Later versions of StuffIt could open .sit files which lacked the resource fork just fine, but by then .zip was starting to become more common.
I could be remembering wrong, but didn't later versions of stuffit compress to a .sit file that had no resource fork, so it would stay fully-intact on any filesystem? I may be imagining that, but I remember hitting a certain version where "copying to Windows" would no longer ruin my .sit files... haha
Funny, because today I find the install process on the Mac much simpler. Most installs are "drag this .app file to your Applications folder"; meanwhile, on Windows you download an installer that downloads another installer that does who-knows-what to your system and leaves ambiguously-named files and registry modifications all over the place.
There are plenty of portable windows applications (distributed as a zipped directory) and there are plenty of pkg macOS installers.
I don't really understand why macOS users like this "simple" installation, because when you "uninstall" the app, it leaves all its trash in your system without a chance to clean up. And implying that a macOS application somehow won't do "who-knows-what" to your system is just wrong. Docker Desktop is "simple", yet the first thing it does after launch is install "who-knows-what".
If the installer on Windows is properly done, you actually know exactly what it does to your system (including registry modifications). This includes the ability to remove the application completely.
Whereas on macOS, installation is trivial, but then the application sets up stuff upon first run, and that is really opaque, with no way of properly uninstalling the app unless there is a dedicated uninstaller.

But yeah, the simple case is quite nice.
The one annoying thing macOS apps do is pollute /Library. Even apps that don’t explicitly write to this area end up with dozens of permafiles. Tons of stuff is spewed in there when you install an application that actually uses it. It’s like a directory version of a registry kitchen sink.
A thing I wonder: why is using = padding required in the most common base64 variant?
It's redundant since this info can be fully inferred from the length of the stream.
Even for concatenations it is not necessary to require it, since you must still know the length of each sub stream (and = does not always appear so is not a separator).
There's no way that using the = instead of per-byte length checking gains any speed, since to prevent reading out of bounds you must check the length per byte anyway; you can't trust the input to be a multiple of 4 in length.
It could only make sense if it were somehow required to read 4 bytes at once, with no way to read fewer, but what platform is like that?
The padding character is not essential for decoding, since the number of missing bytes can be inferred from the length of the encoded text. In some implementations, the padding character is mandatory, while for others it is not used. An exception in which padding characters are required is when multiple Base64 encoded files have been concatenated.
I think so too. It feels similar to how many specifications from the 90s use big-endian 4-byte integers for many things (like PNG, RIFF, JPEG, ...) despite little-endian CPUs having been most common since the 80s, with those specifications seemingly assuming that you would want to read those 4-byte values straight in with fread, without any bounds checking or endianness conversion.
Without padding, how would you encode, for example, a message with just a single zero? To be more precise, how do you distinguish it from two zeroes and three zeroes?
Both for encoding and decoding the padding is not needed. Without ='s, you get a uniquely different base64 encoding for NULL, 2 NULLs and 3 NULLs.
This shows the binary, base64 without padding and base64 with padding:
NULL --> AA --> AA==
NULL NULL --> AAA --> AAA=
NULL NULL NULL --> AAAA --> AAAA
As you can see, all the padding does is make the base64 length a multiple of 4. You already get uniquely distinguishable symbols for the 3 cases (one, two or three NULL symbols) without the ='s, so they are unnecessary
The output padding is only relevant for decoding. For encoding, since the alphabet of Base64 is 6 bits wide, zero bits are padded on when the input is not a multiple of 6 bits (e.g. encoding two bytes (16 bits) needs two more bits to reach a multiple of 6 (18)).
Refer to the "Examples" section of the Wikipedia page.
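Circling back to the claim that the '=' is redundant for a single stream: a small Python check that you can always put the padding back from the length alone before handing the text to a strict decoder (the helper name is just for illustration):

    import base64

    def b64decode_unpadded(s: str) -> bytes:
        # A trailing length of 1 mod 4 can never occur in valid base64;
        # otherwise the number of missing '=' is implied by len(s) % 4.
        if len(s) % 4 == 1:
            raise ValueError("not a valid unpadded base64 length")
        return base64.b64decode(s + "=" * (-len(s) % 4))

    for n in (1, 2, 3):
        padded   = base64.b64encode(b"\x00" * n).decode()   # 'AA==', 'AAA=', 'AAAA'
        unpadded = padded.rstrip("=")
        assert b64decode_unpadded(unpadded) == b"\x00" * n
        print(n, "NUL ->", unpadded, "->", padded)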
The = does not appear if the base64 data is a multiple of 4 length. So you wouldn't know if aGVsbG8I is one or two streams. The = is not a separator, only padding to make the base64 stream a multiple of 4 length for some reason.
I only mentioned the concatenation because Wikipedia claims this use case requires padding while in reality it doesn't.
There is still one sort-of efficient way of embedding binary content in an HTML file. You must save the file as UTF-16. A JavaScript string in a UTF-16 HTML file can contain anything except these: \0, \r, \n, \\, ", and unmatched surrogates (0xD800-0xDFFF).
If you escape any disallowed character in the usual way for a string ("\0", "\r", "\n", "\\", "\"", "\uD800") then there is no decoding process, all the data in the string will be correct.
If you throw data that is compressed in there, you're unlikely to get very many zeroes, so you can just hope that there aren't too many unmatched surrogate pairs in your binary data, because those get inflated to 6 times their size.
Note that this operates on 16-bit values. In order to see a null, \r, \n, \\ and ", the most significant byte must also be zero, and in order for your data to contain a surrogate pair, you're looking at the two bytes taken together. When the data is compressed, the patterns are less likely.
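A sketch of that escaping in Python rather than JavaScript (the logic is the same); it assumes the input has an even number of bytes and that the page is saved as UTF-16LE, neither of which the original trick spells out:

    import struct

    # ASCII escape sequences for the disallowed code units
    ESCAPES = {0x00: rb"\0", 0x0A: rb"\n", 0x0D: rb"\r", 0x22: rb'\"', 0x5C: rb"\\"}

    def to_js_string_utf16le(data: bytes) -> bytes:
        """Wrap raw bytes in a double-quoted JS string literal, emitted as UTF-16LE text."""
        assert len(data) % 2 == 0, "pad the input to an even length first"
        units = struct.unpack("<%dH" % (len(data) // 2), data)
        out = bytearray()

        def ascii_units(s: bytes):          # emit ASCII chars as UTF-16LE code units
            for c in s:
                out += bytes((c, 0))

        ascii_units(b'"')
        i = 0
        while i < len(units):
            u = units[i]
            if u in ESCAPES:
                ascii_units(ESCAPES[u])
            elif 0xD800 <= u <= 0xDBFF and i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
                out += struct.pack("<2H", u, units[i + 1])   # matched surrogate pair: keep raw
                i += 1
            elif 0xD800 <= u <= 0xDFFF:
                ascii_units(b"\\u%04X" % u)                  # lone surrogate: the 6x blow-up case
            else:
                out += struct.pack("<H", u)                  # everything else passes through untouched
            i += 1
        ascii_units(b'"')
        return bytes(out)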
Not listed was a clever encoding for MS-DOS files, XXBUG[1]. DOS had a rudimentary debugger and memory editor. (It even stuck around all the way to Windows XP but didn't survive the transition to 64-bit.) Because it could write to disk, you could convert any file to hexadecimal bytes and sprinkle some control commands about to create a script for DEBUG.EXE. The text-encoded file could then be sent anywhere without needing to download a decoder program first.
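I don't have an XXBUG encoder to hand, but the general idea can be sketched in Python: emit a DEBUG.EXE script that enters the bytes with E, sets CX to the file length, names the output with N, and writes it with W. This is only a rough illustration for small files; the output file name and the 16-bytes-per-line layout are arbitrary choices here, not the actual XXBUG format (see [1] below).

    def debug_script(data: bytes, filename: str = "OUT.BIN") -> str:
        """Emit a DEBUG.EXE script that recreates `data` on disk (small files only)."""
        assert len(data) + 0x100 <= 0x10000, "W writes CX bytes from offset 0100 of one segment"
        lines = [f"N {filename}"]
        for off in range(0, len(data), 16):                  # enter 16 bytes per E command
            chunk = " ".join(f"{b:02X}" for b in data[off:off + 16])
            lines.append(f"E {0x100 + off:04X} {chunk}")
        lines += ["RCX", f"{len(data):04X}", "W", "Q"]       # set CX = length, write, quit
        return "\r\n".join(lines) + "\r\n"

    # On the receiving DOS machine: DEBUG < HELLO.SCR
    print(debug_script(b"Hello, DOS!\r\n", "HELLO.TXT"))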
> Base64, on the other hand, was carefully designed to survive everything from whitespace corruption to being passed through non-ASCII character sets. And then it became widely used as part of MIME.
Still more robust than uuencode though.
.-_ would have been a better choice than +/=
But I think it's likely just poor design taste.
> So you wouldn't know if aGVsbG8I is one or two streams.

I'm not sure I understand this part. You can decode aGVsbG8=IHdvcmxk, what do you need to know?
> Can anyone explain why BinHex remained "popular" in online Mac communities through to the early 2000s?

Now, 25+ years later, I have some answers - thanks!
[1] http://justsolve.archiveteam.org/wiki/XXBUG