I maintain Heimdal[0]'s ASN.1 compiler[1], though I didn't create it. It's a pleasure. It, and the IETF, have taught me a few things:
- there's nothing really wrong with ASN.1 as a syntax except maybe it's ugly
- there's nothing wrong at all with ASN.1's semantics
- there's a TON wrong with the BER family of encoding rules (BER, DER, and CER), and with every tag-length-value scheme
- you can create ASN.1 encoding rules for anything you like, which really means "use ASN.1 as the schema language for whatever encoding I prefer"
- indeed, there's XER (XML encoding rules), JER (JSON encoding rules), GSER (generic string encoding rules) -- all text-based -- and a bunch of binary encodings with at least two that are not tag-length-value (and so resemble NDR and XDR), like PER and OER
- people love to hate ASN.1, mainly because BER/DER/CER deserve the hatred, and for less legitimate reasons too, so they go off and invent new wheels that often have the same problems -- oh well!
In the asn1 readme, and in some comments in these threads, you mention the perils of the tag-length-value scheme, but you never seem to explain what's wrong with it?
At least in file formats, it seems to me they would be instrumental in having an extensible and flexible format, where you can skip unknown or uninteresting chunks (say, PNG chunks, or IFF-based formats like OBJ, etc.).
Do you feel that the same doesn't apply to serialisation formats? How are the non-TLV binaries encoded then? Just implied offsets according to the schema? Can you then evolve the schema at all, or do you feel that both producer and consumer should always have access to the full schema, and flexibility here is a non-feature?
Sorry about the wall of questions, but I'm just so confused.
> In the asn1 readme, and in some comments in these threads, you mention the perils of the tag-length-value scheme, but you never seem to explain what's wrong with it?
Not OP, but one of the challenges is that definite-length encodings like DER have to be encoded in a non-intuitive way. A value must be encoded before its length can be written (because the length isn't known until the value has been encoded), and values can be nested. Therefore you have to encode a message essentially backwards when using definite-length encodings. This can potentially require a great deal of memory and can increase latency because streaming the data is hard.
Indefinite lengths (BER has this option, CER requires it) can help avoid this problem, but then you lose the benefit of skipping elements (which you allude to in your next paragraph).
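To make that back-to-front point concrete, here's a minimal sketch of a definite-length TLV encoder (Python, toy helper names, nothing like a real DER library): every inner value has to be fully encoded before its parent's length octets can be written, so nested structures get built from the inside out.

    def encode_length(n: int) -> bytes:
        # DER definite length: short form below 128, long form otherwise.
        if n < 0x80:
            return bytes([n])
        body = n.to_bytes((n.bit_length() + 7) // 8, "big")
        return bytes([0x80 | len(body)]) + body

    def tlv(tag: int, content: bytes) -> bytes:
        # The length can only be written once the content is fully encoded.
        return bytes([tag]) + encode_length(len(content)) + content

    # SEQUENCE { INTEGER 5, OCTET STRING "hi" } -- innermost values first,
    # outer SEQUENCE header last, i.e. the encoder works back to front.
    inner = tlv(0x02, bytes([0x05])) + tlv(0x04, b"hi")
    message = tlv(0x30, inner)
    print(message.hex(" "))  # 30 07 02 01 05 04 02 68 69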
> Do you feel that the same doesn't apply to serialisation formats? How are the non-TLV binaries encoded then? Just implied offsets according to the schema? Can you then evolve the schema at all, or do you feel that both producer and consumer should always have access to the full schema, and flexibility here is a non-feature?
You've hit the tradeoffs pretty well in the question, I think. The nice thing about TLV is that you can decode without a schema and potentially work with the contents: it's a relatively simple format to decode and validate even if it's not necessarily great for the encoder.
ASN.1 supports schema-informed packed encodings that place greater demands on both the encoder and decoder. The main advantage is that they greatly reduce message overhead, but it requires a lot of bit-twiddling for presence/absence, default values, and, in unaligned variants, everything else, too. It's impossible, generally, to decode everything without the schema. PER has rules that disambiguate the values (e.g., they have to be ordered in a particular way, so you know what's coming next), and this mitigates some of the problems of TLV-style encodings.
The tradeoffs are worth it when your pipes are small. 3GPP and LTE messages are largely encoded in PER. The people playing in that world usually have plenty of money to spend on commercial solutions and have bandwidth to roll their own, too. That's a bit different than smaller shops who are looking for convenient automated serialization formats.
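To give a feel for what "schema-informed packed encoding" means in practice, here's a toy sketch in the spirit of unaligned PER (invented schema, not a conforming encoder): the schema says a SEQUENCE has two OPTIONAL fields and integers constrained to 0..7, so the encoder emits a two-bit presence preamble and then three bits per integer -- no tags, no lengths, and nothing a decoder can make sense of without the schema.

    class BitWriter:
        def __init__(self):
            self.bits = []

        def put(self, value: int, width: int):
            # Append `value` as `width` bits, most significant bit first.
            for i in reversed(range(width)):
                self.bits.append((value >> i) & 1)

        def to_bytes(self) -> bytes:
            bits = self.bits + [0] * (-len(self.bits) % 8)  # pad the final octet
            return bytes(int("".join(map(str, bits[i:i + 8])), 2)
                         for i in range(0, len(bits), 8))

    # Toy schema: SEQUENCE { a INTEGER (0..7), b INTEGER (0..7) OPTIONAL,
    #                        c BOOLEAN OPTIONAL }
    def encode(a, b=None, c=None) -> bytes:
        w = BitWriter()
        w.put(int(b is not None), 1)  # presence bits for the two OPTIONALs
        w.put(int(c is not None), 1)
        w.put(a, 3)                   # 0..7 fits in 3 bits: no tag, no length
        if b is not None:
            w.put(b, 3)
        if c is not None:
            w.put(int(c), 1)
        return w.to_bytes()

    print(encode(5).hex())  # '28': presence bits 00, then 101, padded to one octet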
I see lots of questions about TLV scheme problems. I should have listed them last night, indeed.
First, some generic problems with TLV encodings:
- they necessarily result in unnecessarily redundant encodings -- this is wasteful, bloat
- that redundancy is of zero help to a compiler
- that redundancy is a psychological crutch to any programmer writing hand-coded codecs, but this often has led to serious bugs
- tag allocation has to be managed, and here again you really want a compiler to do it for you -- ASN.1 eventually added AUTOMATIC tags, but the damage of not having had those was done
Next some problems specific to DER-like definite-length TLV encoding rules:
- streaming encoding is infeasible -- you have to know the definite lengths before you start encoding, so you lose
- you either have to compute the length of the encoding of any value before you begin encoding it, or you have to encode "back to front" (and then possibly realloc as needed) or both
There's more, but I'm not too familiar with the issues around CER-like indefinite-length encodings.
Bottom-line: TLV is an unnecessary crutch. Compilers simply don't need it. For proof by existence, consider that Sun's rpcgen(1) existed in 1986, a mere two years after ASN.1's 1984 standard, and rpcgen(1) uses XDR syntax and encoding -- XDR is NOT a TLV encoding at all. But ASN.1 tooling (proprietary and open source) took much longer to catch up with XDR and IDL/NDR and other things. It's almost like TLV encodings made it harder to get to compilation because they were a crutch for hand-coding codecs. But even XDR is easy to hand-write codecs for!
BTW, XDR and NDR were basically the first flatbuffers-like encodings. Lustre RPC has an even more flatbuffers-like encoding, but it's hand-coded. There's just nothing new in this space, and there hasn't really been anything new in this space in many years.
> At least in file formats, it seems to me they would be instrumental in having an extensible and flexible format, where you can skip unknown or uninteresting chunks (say, PNG chunks, or IFF-based formats like OBJ, etc.).
TLV is NOT necessary for this sort of extensibility. You naturally end up with something like TLV when using non-TLV encodings with support for extensibility, though it's often more like LTV. Let's say you have a struct you want to make extensible in some non-TLV encoding you're designing... What would you do? Well, knowing ASN.1's PER/OER and knowing how we've dealt with this in XDR I would do this: add an octet string field to the end of every extensible struct! What would that octet string contain? The encoding of the extensions. What if you want to support different kinds of extensions in a mix-and-match way? Well, that's easy too: add a discriminated union or "typed hole" to the end of every extensible struct, with every choice taken having a Length prepended to it so you can skip it.
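Here's what that trailing-octet-string idea looks like in a sketch (Python, invented field names, no particular encoding rules): the base fields are positional, and everything a v1 decoder doesn't understand rides in one length-prefixed blob at the end, so old code can skip what newer code added.

    import struct

    def encode_v2(x: int, y: int, extensions: bytes) -> bytes:
        # Base fields are fixed-position (no tags); future stuff lives in one
        # length-prefixed octet string at the end of the struct.
        return struct.pack(">II", x, y) + struct.pack(">I", len(extensions)) + extensions

    def decode_v1(buf: bytes):
        # A v1 decoder only knows x and y; it reads the extension length and
        # skips the blob without having to understand it.
        x, y, ext_len = struct.unpack_from(">III", buf, 0)
        return x, y  # bytes 12 .. 12+ext_len are ignored, not rejected

    msg = encode_v2(1, 2, b"\xde\xad\xbe\xef")  # v2 producer
    print(decode_v1(msg))                       # (1, 2) -- v1 consumer still works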
Extensibility is something that has been beat to death in the ASN.1 space, and it has all of these options:
- extensibility markers in SEQUENCE / SET types (i.e., "struct" types)
- extensibility markers in CHOICE types (i.e., discriminated union types)
- extensibility markers in INTEGER and BIT STRING constraints (i.e., enum types)
- rules for handling known and unknown extensions in each ER (encoding rules)
- typed holes.
A typed hole is just a glorified discriminated union with an "external" sort of discriminant and specification of the union arms' types. Basically, a typed hole is just a struct with two fields: a) a type identifier of some sort (an integer, a string, an OID, a relative OID, whatever), b) an octet string containing an encoding of the value of a type identified by (a).
ASN.1 has syntax and semantics for expressing what type IDs go with what types, and so you can actually have compilers that recursively and automatically decode/encode through typed holes.
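A typed hole really is that small. A hedged sketch (invented type identifiers and registry, not any particular standard's hole) of both the two-field struct and the kind of recursive decoding a compiler can do when it knows which type goes with which identifier:

    import json

    # Hypothetical registry mapping type identifiers to decoders -- this is the
    # role an ASN.1 information object set plays for a typed hole.
    DECODERS = {
        "example:text": lambda blob: blob.decode("utf-8"),
        "example:json": lambda blob: json.loads(blob),
    }

    def decode_hole(type_id: str, blob: bytes):
        # A typed hole is just (a) a type identifier and (b) an octet string
        # containing the encoding of a value of the identified type.
        decoder = DECODERS.get(type_id)
        if decoder is None:
            return ("unknown", type_id, blob)  # carry it opaquely, don't fail
        return decoder(blob)

    print(decode_hole("example:json", b'{"n": 1}'))     # {'n': 1}
    print(decode_hole("example:mystery", b"\x00\x01"))  # unknown, kept as raw bytes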
> Do you feel that the same doesn't apply to serialisation formats? How are the non-TLV binaries encoded then? Just implied offsets according to the schema? Can you then evolve the schema at all, or do you feel that both producer and consumer should always have access to the full schema, and flexibility here is a non-feature?
I address this above. This is all addressed in ASN.1 (and also XML because of XMLNS). Many very smart people who came before you and me saw to it that ASN.1 addressed all these issues definitively long ago.
Maybe you can answer a question I've had about ASN.1. Long time ago, Marshall Rose had harsh things to say about the ASN.1 macro facility like "buried semantics"[1]. Do you know what he meant?
[1]: https://www-sop.inria.fr/rodeo/mavros/intro-mav.html (search for "Rose")
> Maybe you can answer a question I've had about ASN.1. Long time ago, Marshall Rose had harsh things to say about the ASN.1 macro facility like "buried semantics"[1]. Do you know what he meant?
My guess is that his complaint is that MACRO semantics are not well defined and are challenging to parse with conventional compilers. I've always wondered if they were inspired in some part by LISP, since you could in principle translate them fairly readily. ROSE and SNMP are still relatively commonly-used specifications that embed macro definitions, and most of the work I've seen done with them involves actually hard-coding the output (rather than actually parsing the MACROs).
> - there's a TON wrong with the BER family of encoding rules (BER, DER, and CER), and with every tag-length-value scheme
I would like to hear more about what's wrong with tag-length-value schemes. And can these be corrected, or would you advocate for alternatives? Which alternatives?
Can the veterans of the 90s SSL Wars explain the issues with ASN1/DER/BER? Looking it up today, it seems like a pretty smart and extensive serialization system, and I have to wonder why new systems like Google Protobufs chose to reinvent the wheel.
Conversely, how have modern systems avoided the pitfalls (if any) of ASN1/DER/BER?
I know of at least one problem with ASN.1. The string encodings other than UTF-8 are terrible. Most of the string encodings are very limited and weird subsets of ASCII that nobody actually uses anymore. ASN.1 itself doesn't define the encodings and just refers to other standards.
The problem with this is probably most notable with the T.61 encoding, which changed over the years; since ASN.1 references other standards, nobody is quite sure exactly what you have to support to make T.61 actually work right.
Within X.509 certificates, though, nobody bothers to actually implement T.61; they just use the T.61 flag for ISO-8859-1.
Basically ASN.1 wasn't well defined and it only works well when people agreed to only use certain features or to interpret things in a particular way when ambiguous.
It's also notoriously difficult to parse well. It's very easy to have bugs in your parser, even if you're implementing a subset of it that's needed for X.509. Especially if you're doing so in a non-memory safe language.
I can't speak for why Google invented Protobufs, but I can't imagine anyone sane picking up ASN.1 for anything modern and deciding that this is what they want to use.
For the string encoding thing, however, it does have UTF-8 and you should not use anything else to express arbitrary human text anyway.
PKIX actually leverages the weird encoding restriction to our benefit. It defines two kinds of names which things might have on the Internet (you can and should stop trying to name things which are actually on the Internet some other way), DnsNames and IpAddresses. IpAddresses, since they're either 32-bit or 128-bit arbitrary bit values, are just represented as either 32-bit or 128-bit arbitrary bit values. So you cannot express the erroneous IPv4 address 100.200.300.400 as an IpAddress, which means you can't trip up somebody's parser with that nonsense address. DnsNames use a deliberately sub-ASCII encoding from ASN.1 which can express all the legal DNS names (all A-labels and the ASCII dot . are permissible) but can't express lots of other goofy things including most Unicode. So a certificate issuer, even if they're completely incompetent, cannot write a valid DnsName that expresses some garbage IDN as Unicode. Hopefully they read the documentation and find out they need to use A-labels (Punycode) but if not they're prevented from emitting some ambiguous gibberish.
Even in forums where you'd once have expected pushback, "Just use UTF-8" is becoming more widespread. Microsoft for example, once upon a time you'd get at least some token resistance, today they're likely to agree "Just use UTF-8". So ASN.1 ends up no worse off for a half a dozen bad ways to write text you shouldn't use, compared to say XML, HTML, and so on.
A couple of years ago I ran into the same confusion of the "TeletexString"/"T61String" data type in ASN.1. After going down the rabbit hole of what is T.61 and trying to map it to Unicode, I reread the ASN.1 (X.690) spec and realized that the authors never actually referenced T.61. Ever since the first edition of ASN.1 in 1988, those strings have not used T.61. They use a character set that is easily mapped to Unicode - https://www.itscj-ipsj.jp/ir/102.pdf, a subset of US ASCII.
Not to say the rest of the spec is notably better. If fully implemented, it requires supporting escape codes in strings to change character sets. I've never seen valid escape codes in real world data, but it probably exists.
As the original article shows, ASN.1 has lots of other challenges and complexity. Trying to write a code generator that supports all the complexity is no trivial task and the only open source one I've seen only generates C code. Protobuf has the advantage of having modern language support (including multiple type safe and memory safe languages).
> Basically ASN.1 wasn't well defined and it only works well when people agreed to only use certain features or to interpret things in a particular way when ambiguous.
ASN.1 has always been as-well- or better-defined than its competition. The ITU-T specs for it are a thing of beauty not often equaled outside the ITU-T.
That said, for a long time the ASN.1 specs were non-free, and that hurt a lot. Also, the BER family of encoding rules stunted development of open source tooling for ASN.1.
ASN.1 really demands code generation. Unfortunately lots of nonconforming stuff has to be dealt with. The concept of encoding rules and the module tagging scheme make for a pretty big number of possible representations.
The language semantics of ASN.1 don't really map to anything well, particularly around default fields and structures that can vary.
Newer systems don't have encoding rules and pick a semantics that matches a target language much more closely.
Nope, nyet, bzzt. Proofs by counter-example:
- OpenLDAP has a printf/scanf-like approach to BER encoding
- Heimdal has an ASN.1 compiler that generates code, yes, but also alternatively generates bytecode that gets interpreted at run-time.
> The language semantics of ASN.1 don't really map to anything well, particularly around default fields and structures that can vary.
You are ill-informed. Proof by counter-example:
- there are ASN.1 encoding rules that produce natural XML (XER) and JSON (JER)
- "default fields" are supported (the relevant keyword is `DEFAULT`, naturally)
- "structures that can vary" -- if you mean unions, it's got that (the relevant keyword is `CHOICE`), and if you mean "extensions", it's got extensibility markers (that effectively are alike a CHOICE of an octet string of unknown stuff, or else the extensions known at module compile time.
On this specific point: isn't this also the case for other high-performance serialisers? Google ProtoBufs, Apache Thrift, any protocol through Rust's SerDes...
There is NO problem with ASN.1 itself except a bit of ugliness. There are SERIOUS problems with DER/BER/CER and with all tag-length-value schemes -- this includes protobufs!
ASN.1 is just syntax and semantics. There are encoding rules that produce textual representations (GSER), XML (XER), JSON (JER), there's XDR-style encoding rules (PER and OER, but with 1-octet units instead of 4-octet units, plus efficient representation of optional fields).
In fact, you can make ASN.1 encoding rules that are based on NDR and XDR and which work for all of IDL and XDR and that subset of ASN.1 that is covered by the semantics of IDL and XDR, and you can extend that to cover all of ASN.1 if you want.
I should know these things, as I maintain an ASN.1 compiler and I intend to eventually teach it to do XDR and NDR.
Really, there's nothing about data schemas that you can express in JSON, CBOR, IDL, XDR, S-expressions, or any schema language you want, that you can't express in ASN.1, or, if there is, it's got to be a pretty niche feature and easily added to ASN.1 anyways. Even functions (RPCs) can be expressed in ASN.1 with some conventions, and routinely are, because it's really just a request/response protocol.
But every year someone invents a new thing because of how stupid, tired, and old ASN.1 is (or, rather, how they perceive it to be). Or because of how complex ASN.1 is and how there's a paucity of tools, so then they reinvent the wheel (often badly) -- a wheel for which there is instantly a paucity of tools.
Personally, I think that people just like to reinvent things. I don't want to sound shitty (or have kentonv show up again to scold me for it) but I get the feeling that, a lot of the time, it's just that simple.
To me that is a specious argument. It's like asking why Python was invented when Cobol could suffice.
The dozens of ASN.1 specs are absolutely hideous and entrenched in obsolete telecom jargon. If the sole goal of Protobuf was to avoid having Google engineers be required to refer to the dozens of ASN.1 specs when disagreements or confusions arose, then it would have been 100% worth it for just that reason.
ASN.1 was too broad. There is immense value in a more constrained specification that does not include so many hazardous serialization types and antiquated string formats.
Now, should Protobufs or Thrift simply have been constrained versions of ASN.1? I think there is a view of software engineering where this would have been an ideal outcome, but almost universally when we see too-big standards, they are declared "dangerous" and avoided like the plague before they are downscoped.
ASN.1 in 1984 was not too broad. It was too simple, and it was too targeted to tag-length-value encoding rules (which are stupid -- TLV is a crutch that is only maybe useful when you lack a compiler, which early on was the case).
ASN.1 today is as broad as it needed to evolve to be because its users needed it.
ASN.1 is extremely complicated and hard to implement correctly. All ASN.1 implementations I've seen are either specialized (know how to work only with a very specific message), or slow, buggy and expose equally complicated APIs. Modern systems like protobufs tend to use much simpler encodings & specs which are easier to understand and implement correctly.
I spent a few years during the late 90s/early 2000s in an industry running on ASN.1, coming from the web. I was initially surprised by how enamoured most of my coworkers were with ASN.1 and its tools, but it grew on me too: the pleasure of interacting only with a protocol specification regardless of the implementation language or intricacies of the remote party, the guarantee that there could be no invalid messages received or emitted, and the automatic generation of tests and tools eventually balanced out the inconvenience of not being able to readily read data on the wire (this was before every human-readable protocol got encrypted) and of not being able to start coding upfront.
It was like going from runtime type checking to static type checking: initially inconvenient, but paying dividends after a short while.
So why did this tech disappear if it was ultimately better than the later alternatives (textual protocols, schema-less serializers, and eventually protobuf, which reinstated some form of efficient encoding and type checking)?
As it uncannily frequently occurs with technological evolution, the reason is probably not to be found within its technical issues (which basically all boil down to: designed by committee).
ASN.1 was just a bit too inconvenient, the free tools to generate code were just not quite good and robust enough, and the approach of designing your types and protocols and putting your code-production tool-chain in place before being able to ship anything was at odds with the mood of the day, which was to let the cheap junior dev fire up his code editor during the coffee break of the first design planning meeting and build the first half-baked prototype, which would already have been sold to the customer by the time he hits :wq. To move fast and break things, ASN.1 got in the way.
So did formal specifications in general, code-analysis tools, even basic type checking -- all of them thrown out the window during the same period for their extra weight, extra time-to-market and extra cost of hiring. Text protocols outcompeting saner alternatives because they are initially simpler (SIP vs H.323, anyone?), schema-less data formats predominating almost entirely because you can start hacking quicker, etc., are all attributable to that cultural rather than technical trend, I believe.
Now it seems the industry is slowly recovering from these excesses. Maybe because of the damage that has been done, but more likely because of the end of cheap hardware progress, encryption everywhere, and massive data volumes (that's what made Google come up with better protocols than HTTP and better formats than human-readable text, after all).
I owned the Microsoft ASN1 library for a while around 2005. It was a maintenance nightmare and I spent a lot of time fixing static analysis derived issues.
That said, I always found the standard quite interesting, with different encodings based on the degree of prior shared info or format. My assumption is that not-invented-here is part of why it's not used.
I used the Netscape/Mozilla NSS library quite a bit, and one problem I found with it, is that all of the DER encoding/decoding was written by hand. They should have generated all that boilerplate from the ASN.1 modules written in the specs (later, RFC 2459, but at the time, a hodge-podge of scattered specs).
Hand-coding works okay when the data is what you expect. But when you throw malformed certificates at it, you have to catch all the edge cases. Having generated code would have enabled many more edge cases to be covered.
Those libraries were originally written in the early/mid 90s. Don’t recall much in the way of code generation tools that would take those specs and generate the code at the time.
Spent a bunch of time working with and adding to those libraries.
No veteran of the 90s SSL wars, but once upon a time I was tasked with fixing security bugs in a custom protocol backend server which used ASN.1 for purposes for which one would probably use protobuf nowadays.
The quality of existing open source libraries to parse ASN.1 leaves a lot to be desired.
I have worked for a time with credit card terminal applications.
We used BER-TLV throughout the system extensively, where it was needed as well as where it wasn't.
I have implemented complete parsers/serializers, data structures using TLV, transactional database where data was stored as TLV documents. EMV is built on top of BER-TLV, SSL used it, as well as ISO-8583 messages transmitted data encoded with BER-TLV. Communication with the PIN Pad was built on it. We kept configuration as BER-TLV documents.
I have been able to parse hex representation in my head.
I really liked the standard. It is nice, flexible and very efficient. Easy to parse, can be parsed reliably and safely in statically allocated memory.
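For anyone who hasn't looked at BER-TLV, the parsing loop really is small. A minimal sketch (flat TLVs only: multi-byte tags and definite lengths, no recursion into constructed values, so nowhere near a full EMV or BER parser):

    def iter_tlv(buf: bytes):
        # Yield (tag, value) pairs from a flat BER-TLV byte string.
        i = 0
        while i < len(buf):
            # Tag: one byte, or more if the low five bits of the first byte are 11111.
            tag_start = i
            i += 1
            if buf[tag_start] & 0x1F == 0x1F:
                while buf[i] & 0x80:  # continuation bytes have the high bit set
                    i += 1
                i += 1                # final tag byte (high bit clear)
            tag = buf[tag_start:i]
            # Length: short form (< 0x80), or long form 0x8N followed by N octets.
            length = buf[i]
            i += 1
            if length & 0x80:
                n = length & 0x7F
                length = int.from_bytes(buf[i:i + n], "big")
                i += n
            yield tag, buf[i:i + length]
            i += length

    # EMV-style sample, hand-built: tag 9F02 (amount) then tag 5A (PAN), dummy values.
    sample = bytes.fromhex("9f0206000000001000" "5a0412345678")
    for tag, value in iter_tlv(sample):
        print(tag.hex(), value.hex())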
To those who think this is ancient history and it should be dropped -- do you think that might just be because you don't actually know it or maybe you just think it is old and so it must be bad?
Where EMV uses tags more like classes than types, I’m not really sure it actually counts as “abstract” syntax notation any more?
Because all tags are these custom things, some don’t strictly parse out to unique type codes too. So a non-EMV parser will have a few tags that map to the same integer code and cause some fun bugs.
That project was when I really understood deep-down why JSON won in the end!
Why are we even talking about ASN.1/DER/BER? We should, like the ancient Egyptian priests who opposed Akhenaten, chisel its name from every public edifice, referring to it not as "ASN.1, the platform-independent abstract type system," but as "the great heresy, which shall not be named."
> You might have heard of similar such abstract syntax notations used for interface definitions such as Google Protocol Buffers, or Facebook’s Apache Thrift, but those languages have not been managed by a standardization organization, so the owning corporations could (in theory) make breaking changes or change the license or even remove the language definitions overnight.
Is this really the main difference between ASN.1 and Google protobufs, that one is managed by a private corporation and the other by a standardization organization? Can they otherwise be used "interchangably" in designing interfaces, a la two different programming languages (with different syntax of course)?
ASN.1 struggles because the word "ASN.1" can name a lot of different implementations with different nuances, and a "complete" ASN.1 implementation is a massive and hazardous undertaking which has left many with a sour taste. Meanwhile, ProtoBufs and Thrift work off of more constrained and well-versioned interfaces.
Honestly, ASN.1 with semantic versioning at the protocol level would probably have been as robust and useful as Protobufs. If ASN.1 had been forked into "ASN.1 3.0 without 10 hazardous and awful 1980s text encodings," it could even be fairly palatable today. Whether the overly expansive nature of ASN.1 is a product of the committee / standards organization design or the timeframe in which it originated is certainly an interesting philosophical question.
> Meanwhile, ProtoBufs and Thrift work off of more constrained and well-versioned interfaces.
Not so. Protocol buffers is just a TLV encoding, which is bad (see elsewhere in this thread) -- it's just a cut-down ASN.1 and variation on BER, so what.
ASN.1 can "well-version" everything just as well as anything else.
In terms of tooling, there’s excellent tooling for ASN.1 for C and C++ and maybe some other languages. There’s excellent tooling for protobufs for a handful of languages too, but they’re different sets, so in practice what languages you want to use would likely come into play.
How excellent the ASN.1 tooling is depends on which subset of ASN.1 you're using. Some of the tooling supports one iteration of ASN.1 or the other -- so much so that the IETF had to write a document on how to deal with this, since some of the standards use the older ASN.1 and some use the newer ASN.1:
https://tools.ietf.org/id/draft-ietf-pkix-asn1-translation-0...
Interoperability with ASN.1 is very fragile at best.
> In terms of tooling, there’s excellent tooling for ASN.1 for C and C++ and maybe some other languages. There’s excellent tooling for protobufs for a handful of languages too, but they’re different sets, so in practice what languages you want to use would likely come into play.
In my experience, tooling is actually very good for most commonly-used languages, including C/C++, C#, Java, Python, and maybe even Go. And, of course, erlang. The real challenge is, I think, that you cannot find good free tooling, and the barrier to entry for Joe Developer is fairly high (in the thousands of dollars).
> Is this really the main difference between ASN.1 and Google protobufs, that one is managed by a private corporation and the other by a standardization organization? Can they otherwise be used "interchangably" in designing interfaces, a la two different programming languages (with different syntax of course)?
No, the two are not interoperable and probably won't be made that way. Protobuf has undergone changes that challenge its backwards-compatibility (e.g., with item presence). ASN.1 supports multiple encoding rules, and while it's possible that someone could map ASN.1 syntax to protobuf encodings, it would only support a subset of ASN.1 because protobuf doesn't support length or value constraints (among other ASN.1 features).
ASN.1 does have a little-used standard called Encoding Control Notation[0] that in principle supports the construction of novel encodings. But I have never seen a compiler, commercial or otherwise, that supports it. It requires a certain expressiveness in your parser that's hard to do right, although I've wondered if LISP or Racket could take it on.
[0]: https://www.itu.int/rec/T-REC-X.692-202102-I
Protocol buffers is a tag-length-value encoding. It's got all the problems that DER and CER have. It's what happens when people decide to reinvent a wheel they don't understand.
What's so great about ASN.1 and its encoding rules is that anyone writing type-length-value serialization for networking purposes, for example[1], is basically independently reinventing ASN.1 because it's so fundamentally optimal.
It truly will make you wonder why Protobufs and others exist.
[1]: https://github.com/Planimeter/grid-sdk/blob/master/engine/sh...
> What's so great about ASN.1 and its encoding rules is that anyone writing type-length-value serialization for networking purposes, for example[1], is basically independently reinventing ASN.1 because it's so fundamentally optimal.
The challenge arises if you have very large values: by nature, TLVs require that the V be encoded before you can plug in the L. If you use definite-length encodings (as required by DER), you may end up having to hold and encode a pretty large piece of data in memory. You can work around this, of course, but it can be a challenge.
Tags in ASN.1 as noted in another comment can also be pretty complicated: there are four tagging classes, and tags can be applied implicitly, explicitly, or automatically depending on the specification. This can make life a bit uncomfortable at times.
On the balance, I can understand why people find ASN.1 such a pain, especially if you're not inclined to fork over money to have someone else deal with the encodings. For medium- to large-sized companies, though, it's probably not a bad deal: get a support contract from one of the commercial vendors, get training, and save yourself six man-months on writing pretty bullet-proof serialization code without the headache of worrying about standards incompatibilities. If you happen to work in telecommunications or security, you're going to deal with ASN.1 at some point anyway, so having something that can talk to multiple parts of your stack can be helpful, too.
That there's four tag classes is not really a complexity. That there's IMPLICIT and EXPLICIT tagging is.
Using IMPLICIT tagging yields encodings that dumpasn1(1)-like tools can't really give you much insight into.
Using EXPLICIT tagging yields bloat.
The answer is to use non-TLV encodings where possible and to use tools that can refer to the schema ("modules") to decode and pretty-print arbitrary things. dumpasn1(1) is just too simple.
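The EXPLICIT-tagging bloat is easy to see at the byte level. Assuming a field defined as `[0] INTEGER` with value 5 (standard DER, bytes written out by hand): IMPLICIT replaces the INTEGER's own tag, while EXPLICIT wraps the complete INTEGER TLV in an outer constructed TLV, costing an extra tag byte and length byte for every explicitly tagged field.

    # [0] IMPLICIT: context-specific tag 0 replaces INTEGER's universal tag.
    implicit = bytes([0x80, 0x01, 0x05])

    # [0] EXPLICIT: constructed context tag 0 wraps the whole INTEGER TLV (02 01 05).
    explicit = bytes([0xA0, 0x03, 0x02, 0x01, 0x05])

    print(implicit.hex(" "), "->", len(implicit), "bytes")  # 80 01 05 -> 3 bytes
    print(explicit.hex(" "), "->", len(explicit), "bytes")  # a0 03 02 01 05 -> 5 bytes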
Back when I was in school in 2004, I had a teacher who had worked on the ASN.1 spec.
In 2004, XML was all the rage. People would create "XML startups", and Microsoft did SOAP and some other guys XHTML, and XML schemas, semantic web and so on.
I remember that teacher being so upset that XML got big and ASN.1 disappeared. It was very awkward. Poor guy...
a) ASN.1 got XML Encoding Rules (XER), so you can use XML w/ ASN.1 as the schema language, which really, mostly is about supporting existing ASN.1-based protocols but with XML because well, you know, XML was all the rage,
and
b), FastInfoSet happened, which is an ASN.1 PER-based "compression" of XML because well, you know, XML is too verbose and unwieldy.
I [bleep] you not, that happened.
Evidence that there's nothing wrong with ASN.1 the syntax (and that's all it is, syntax and semantics, with a side of pluggable encoding rules where you can make them all up the way you want). Everything that's wrong with ASN.1 is either that which is wrong with BER/DER/CER (plenty), or that which is wrong with people's perception of ASN.1 (also plenty).
> My guess is that his complaint is that MACRO semantics are not well defined and are challenging to parse with conventional compilers.
You don't need ASN.1 MACROs for anything in Internet protocols, and you can do without them more generally anyway.
> The problem with this is probably most notable with the T.61 encoding, which changed over the years; since ASN.1 references other standards, nobody is quite sure exactly what you have to support to make T.61 actually work right.
There are a bunch of gory details around this mess in this (now quite old) write-up here: https://www.cs.auckland.ac.nz/~pgut001/pubs/x509guide.txt
Since that write up I believe UTF-8 is pretty much the expectation for character encoding for X.509.
I documented some of the quirks around 6 years ago when I took an existing X.509 parser and improved it for use in certificate trust management in Subversion: http://svn.apache.org/viewvc/subversion/trunk/subversion/lib...
Part of my curiosity stems from Apple using it as part of their bootable file-format: https://www.theiphonewiki.com/wiki/IMG4_File_Format
But as you say, I have to assume they're using it in a very constrained way.
> The string encodings other than UTF-8 are terrible.
Well, yes, because ASN.1 predates Unicode.
https://news.ycombinator.com/item?id=20725550
The 90s were rough on text encoding, but it seems pretty settled now.
There’s an “XER” if you want a human-readable XML encoding, too.
BER/DER/CER is binary S-expressions.
ASN.1 versioning in particular is a work of art.
You can write more about these problems and it would have higher visibility.
I wonder if your teacher eventually understood why XML was preferred over ASN1. Seems to me like it was easier to pick up, and harder to mess up.