> is that for a long time it was the only viable configuration format
Actually, this is not the case. We had the INI format for simple stuff and XML (complete with schemas) for complex things many years ago, and it worked. Yet we wanted something readable (like INI) but able to express complex types (like XML).
I don't think TOML is a viable replacement - for me, it has an INI level of simplicity with even worse hacks for nested structures. But give it time and you'll have another YAML.
But yes, YAML is confusing for edge cases (true/false, quoting), and I have yet to find a powerful replacement that is not XML. Maybe EDN can jump in, but for anything more complex I'd rather have a Lisp interpreter and structures around it than any of the mentioned and upcoming formats.
Thinking that JSON is a suitable replacement, imagine writing Ansible or k8s stuff in it; that would be fun ;)
> Thinking that JSON is a suitable replacement, imagine writing Ansible or k8s stuff in it; that would be fun ;)
Writing in YAML doesn't feel much better. YMMV but I've been on teams using Pulumi for k8s and the developer experience has been significantly better. I can automate, type check, lint, click through to definitions the same way I do with other typescript.
Pulumi is a young product with many rough edges but it's already been a game changer for me.
... just fucking don't; generate the config in your configuration management tool of choice and then serialize it to YAML. You get all of the advantages (nice to read) and none of the disadvantages (needing an editor, or it's a PITA).
I'm nose deep in an Oracle to Postgres conversion at the moment and from that experience, among others, I can absolutely assure you that although booleans are definitely simple they are also most definitely a very sharp edge case.
XML did everything and it's perfectly readable if it's formatted and structured well. This zoo of different markup/object/whatever languages we have to deal with now is largely a mistake.
This is confusing because XML is the archetype of what I would consider unreadable. If I got a prompt in a programming language design workshop to intentionally design something unreadable for humans, I would start thinking about XML and see where it leads me. Can you name any language any more unreadable than XML to help me?
XML is a mistake. 80% of every XML document is just redundant noise because the IBM lawyer who invented the SGML syntax had never heard of S-expression.
What XML replaced was a combination of custom designed (often binary) formats and HTTP query-string like syntaxes. It is quite verbose compared to either of them, and explodes both your bandwidth and serialize/deserialize times, but it is arguably easier for your new co-worker to guess that the address line 2 goes in <address2>...</address2> rather than prefix tagged with 13, or worse yet at offset 42-61.
I got used to XML, though I never could quite understand XSLT and the desire to program in it. I got used to JSON, but YAML I just can't bring myself to parse. YAML is 90% stuff you can't guess and just 10% data. And why so many?
> Thinking that JSON is a suitable replacement, imagine writing Ansible or k8s stuff in it; that would be fun ;)
k8s via Helm is often templated via Go template strings, which works by creating an unreadable and unhighlightable mess, introducing lots of its own bugs.
INI has no spec, and there are many variations, all slightly incompatible with each other, kinda like Markdown. YAML is really the only sane configuration language that lets you denote nested structures while keeping them looking nested.
Basically, all of the problems identified in the article can be dealt with by one rule: always quote your strings. I agree with the author that we should have a reduced, safe and minimalist subset of YAML, which is basically YAML 1.2, released in 2009.
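The rule in practice - quoting keeps the scalars you actually meant (a sketch):

```yaml
geoblock_regions:
  - "dk"
  - "no"         # quoted, so it stays the string "no" instead of becoming false
version: "1.10"  # quoted, so it stays a string instead of becoming the number 1.1
```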
There's HOCON, which is pretty good if you can run on a JVM. It's a superset of JSON designed for readability and human-friendliness when writing config files. It doesn't change the type system and doesn't have YAML's weird edge cases, but is still a lot easier to write than JSON. There's also a relatively tight spec.
I second that. And if people need to deal with YAML in Python, they should be using ruamel.yaml, which is a far superior library on just about every level: https://pypi.org/project/ruamel.yaml/
No. XML is broken for structured data that isn't HTML because it's not really clear (there aren't even "best practices" afaict) what should be a text node, what should be an attribute node, and what should be a subtag node.
It's not like other formats don't face similar questions. E.g. if you have a list of key-value pairs to serialize to JSON, do you translate those keys to JSON properties in an object, or do you translate each pair as an object in an array?
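For example, both of these are valid ways to serialize the same pairs, with different trade-offs (hypothetical data):

```python
import json

# A list of key-value pairs, possibly with duplicate keys
pairs = [("host", "a"), ("host", "b")]

# Option 1: keys become object properties (duplicates silently collapse)
as_object = json.dumps(dict(pairs))

# Option 2: each pair becomes its own object in an array (duplicates survive)
as_array = json.dumps([{"key": k, "value": v} for k, v in pairs])

assert json.loads(as_object) == {"host": "b"}  # one pair was lost
assert len(json.loads(as_array)) == 2          # both pairs kept
```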
My distaste for yaml led me to attempt just that. The first piece that was missing for me was a python parser that could produce reasonable error messages and could be transformed into the desired internal representation in python. So I wrote one [0]. It was supposed to be a single file that was less than 100 lines and could be copied and pasted into any project that I needed. Turns out that the issue was a bit more complex.
The issue is that there is sufficient complexity in finding a portable representation for configuration formats that it just kicks the can down the road. On the other hand, it means that as soon as you decide what format you are going to support, you can quickly implement it. There is more or less an intersection grammar that works across most if not all Lisps, and that is the plist `(:k v :k2 (:k3 v2))`. So I settled on that for my own use.
After all that work I have not dealt with the fact that numbers and chars do not have a portable representation across Lisp dialects, which is a key complaint in other threads here. Limited support for let-binding constants also seems like a feature that would allow just enough expressivity to make the format useful without opening up the terror that is `&` and `*` in YAML (cool and useful as it may be).
In summary s-expressions are:
1. missing good parsers in a number of language ecosystems
2. not standard across lisp dialects
3. need additional semantics for binding, multiple expressions, etc.
4. still better than yaml and json
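For what it's worth, a toy reader for plists of that shape fits in a few lines of Python (a sketch only - everything is kept as a string, and this is nothing like the parser linked at [0]):

```python
import re

# Tokens are parens or runs of non-space, non-paren characters
TOKEN = re.compile(r'\(|\)|[^\s()]+')

def parse_plist(text):
    """Read a plist form like (:k v :k2 (:k3 v2)) into nested dicts.
    Keywords (:name) become dict keys; all scalar values stay strings."""
    tokens = TOKEN.findall(text)
    pos = 0
    def read():
        nonlocal pos
        tok = tokens[pos]; pos += 1
        if tok == '(':
            items = []
            while tokens[pos] != ')':
                items.append(read())
            pos += 1  # consume ')'
            # pair up :key value, stripping the leading colon
            return {items[i][1:]: items[i + 1] for i in range(0, len(items), 2)}
        return tok
    return read()

assert parse_plist("(:k v :k2 (:k3 v2))") == {"k": "v", "k2": {"k3": "v2"}}
```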
I know what s-expressions are, vaguely. Vaguely in terms of "I couldn't write a grammar for them off the top of my head.", that is, not "what are they".
Is there a single agreed-upon defined grammar that everyone can use? Preferably one simple enough that like JSON's it is at least capable of being used as a graphic on the home page for the format? https://www.json.org/json-en.html
This is an honest question, because there may well be and I don't know it.
However, I will put this marker down in advance: If multiple people jump up to say "oh, yes, of course, it's right here", and their answers are not 100% compatible with each other, then the answer is no.
The other marker I'll put down is "just use common lisp", I want verification that it really is 100% standardized, no question what any construct means, ever, and I still bet we get people who would rather see Scheme or Clojure, and I bet there's some sort of difference.
Neither of these objections is fatal to the idea. JSON is technically not just "javascript objects", so if someone carved out a defined format from s-expressions, then held it up as a standard, that would be as valid as what Crockford did. But at least as of right now, I'm not aware of anyone having done that standardization work. Replies welcome.
I've never met anyone who says "I like YAML, it is great"... most people who have worked with it say something like "YAML is annoying, I don't like it"...
While introducing Kubernetes at our company over the last two years, we have been moving more and more away from YAML with internal Helm charts to a much simpler process: just using HCL and Terraform, and defining Kubernetes resources as Terraform resources.
As a software developer HCL just makes so much more sense than this YAML + Helm + Go templates hell, which feels like C preprocessor hell all over again. Other solutions like kustomize are neat, but I don't see how all of these YAML workarounds should be better than something like HCL with Terraform. HCL feels like a real declarative programming language (with real conditions, variables, a module system and useful built-in functions). YAML feels like another more complex JSON and other tools like Helm or Kustomize try to work around the weaknesses of YAML with some kind of templating system.
YAML looks nice to read in simple demos and in small files, but is just not adequate in the real world (in my personal opinion - I know that YAML is used by a lot of people in production as of today).
> I've never met anyone who says "I like YAML, it is great"
Maybe I'm older than you, but I have definitely heard that line.
Mostly because the alternatives were XML, INI, or the myriad of bespoke formats: relayd/apache httpd .conf, iptables, etc., etc.
INI has parsers that operate in different ways and doesn't support hierarchies... so that's not ideal.
JSON and YAML came to the fore around the same time, and JSON's limitations around comments and its picky semantics meant that people did prefer YAML over JSON for human-readable configs.
YAML itself is fine; it has some really awkward warts, and the parsers are usually programmatically unsafe in their implementation (leading to less compatible "safe_load" or other types of loaders)[0]. The issue we actually have with YAML is that we:
A) Template it (jinja, mustache whatever)
B) Put entirely too much stuff into it (Kubernetes manifests can grow to hundreds of lines really easily).
These problems will affect any configuration file format we choose to use, including TOML (which is comparatively new on the block), because reading templated/enormous files is really difficult.
What I've taken to doing is programmatically generating objects and then serialising them as whatever my software depends on. It might feel ugly to use an entire Turing-complete language to generate objects that are mostly static, but honestly... the ability to breakpoint, test and print subsections of the output is astonishingly nice.
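A sketch of that workflow in Python (the `make_deployment` helper and its fields are purely illustrative, not any real API):

```python
import json

# Build the config as plain objects first: every subsection can be
# breakpointed, unit-tested and printed before anything is serialised.
def make_deployment(name, replicas):
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {"replicas": replicas},
    }

manifest = make_deployment("web", 3)
# Serialise only at the very end, to whatever the consumer expects
serialized = json.dumps(manifest, indent=2)  # or yaml.safe_dump(manifest)
```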
The tooling is super mature, it's easy to emit, it's easy to parse, it's easy to validate; it can just be a little hard to read and write by hand (and I mostly blame SOAP for that). Still, basic XML isn't that hard to read or write, thanks to editor support.
I like that you can use anchors and merges. It greatly simplifies complex, repetitive structures. And most of the complaints about YAML can be worked around by string-quoting.
The whitespace can get in the way if you're templating, but then you can also use [1, 2, 3] as a list notation, for example.
In fact, most of the complaints could be resolved by running it through a linter.
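Anchors and merges in practice - repeated structure factored out once (a sketch):

```yaml
defaults: &defaults
  retries: 3
  timeout: 30
service_a:
  <<: *defaults
  host: "a.example.com"
service_b:
  <<: *defaults
  timeout: 60   # merged keys can still be overridden per block
```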
I like YAML. More specifically, a subset of YAML, like the author suggests. Clear, intuitive, and allows expressing complex data structures like JSON does. Much better than TOML, which easily becomes a mess with more complex data.
Yup, exactly my experience as well: again, a stupid idea to try to make a "configuration" language out of nested key-value pairs that end up needing fancy interpreters, allowing more and more semantics into the keys and values, to start doing what a simple program could have done in half the time...
I've worked at 4 companies over a period of 10 years, and each had exactly this problem - with YML, JSON, XML, properties files (you don't want to see business-logic conditionals in a properties text file, where the shapes of the keys command an interpreter to behave dynamically...).
The only time I saw a team do it well was a PHP backend, of all things, where the lead said they'd program all their variations in PHP rather than source them from flat configuration descriptors, and it was amazing: clear, simple and powerful. They had to release the backend at each config change instead of releasing only the config change, but I'm still unsure why exactly that's a problem: the configs are software too, if we're honest with ourselves, and shoe-horning them into a descriptor language isn't gonna make them flat.
I don't think YAML is great, but I still think it's the best format out there.
The only confusing problem I've run into was the sexagesimal number notation and even that was fairly obvious. Perhaps it's because I tend to overquote strings?
I mean sure, the on/off to boolean mappings are annoying, but they also become very obvious when you're parsing config because the type validation will fail. If `flush_cache` has an enum `on` but no key `True` then the type validator will instantly complain about both the missing key and the extra key in the dictionary.
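For example, a sketch of that mechanism in Python - the dict below stands in for what a YAML 1.1 parser hands back, and no real schema library is assumed:

```python
# What a YAML 1.1 parser produces for the line `flush_cache: on`:
parsed = {"flush_cache": True}   # `on` was silently resolved to a boolean

# The schema expects the enum values "on"/"off", so validation flags it at once
allowed = {"on", "off"}
bad = {k: v for k, v in parsed.items() if v not in allowed}
assert bad == {"flush_cache": True}
```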
Same with accidental numbers, any type check will show that the parsing failed.
I find JSON for config files to become unreadable quickly because of the non-obvious nesting and the lack of comments. You can pick a JSON extension but then you need to pick one that your tooling will support.
I think the problem in this particular case is using YAML as a DSL. Every other data format would be equally bad here. Replace YAML with TOML and you're still in the same templating hell.
YAML is the least worst for me, and I don't think I've ever hit the problems the article is showing, because:
* I use an editor that will highlight stuff like anchors
* I often generate config from CM, so it can't have those errors
* Loading into a defined struct in a statically typed language also makes them impossible.
YAML is nuts, and JSON is annoying (trailing comma limitations, lack of comment syntax no matter how annoying it is that the spec is correct about why there are no comments).
Both have their place though. YAML came out of Perl, and both are some confluence between awesome and horrific (although YAML wins the horrific crown for sure).
I've had a little bit to do with Ingy - the inventor of yaml, and I've worked closely with some of his collaborators. Ingy is nuts, mostly in a good way, but I wouldn't put him in charge of the architecture, I'd put him in charge of the abyss.
Though, in fairness, I think old Perl did that too. It's super convenient until it isn't.
Rachel also doesn't approve of JSON in high-reliability systems for other reasons: https://rachelbythebay.com/w/2019/07/21/reliability/ and point taken, if you're sending data from your service A to your service B and neither is a web browser, nor are they written in JS, then there's far better formats and you almost need a reason not to use protobuf.
There was (I think probably still is) a qemu bug with JSON. It accepted requests to read guest memory in JSON format, with the memory addresses encoded as JSON numbers.
When reading out guest kernel memory (addresses are at the top of 64 bit space) these would silently be rounded to the nearest whole double. It took me a very long time to understand what was going on.
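The rounding is easy to reproduce in any language with IEEE-754 doubles; in Python, for example:

```python
# Doubles carry a 53-bit significand, so integers near the top of a 64-bit
# address space can't be represented exactly and get silently rounded -
# which is what a JS-style JSON parser does to such a number.
addr = 2**63 + 1                     # e.g. a kernel-space address
assert float(addr) == float(2**63)   # the +1 is lost in the round-trip
```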
Actually, JSON doesn't specify what numbers are - it would be perfectly licit for a JSON parser to transparently use a real numerical tower, allowing a perfect representation of any terminating decimal fraction (since a number has to be written as a dotted decimal literal, and there's no support for a vinculum (aka U+0305 / COMBINING OVERLINE / ◌̅ / 3.21̅), there's no way to represent a fraction whose base-10 expansion doesn't terminate). A few JSON parsers even do this. That said, if you don't control both sides, sending something that won't be handled by the lowest common denominator (browser JSON parsers / JS numbers) is asking for trouble.
This is a great post but my understanding is this has nothing to do with JSON, which is unopinionated about numbers. Rather, with JS's JSON parser.
Python, for example, has several JSON libraries which let you swap out the numeric parser so it yields Decimal objects all the time. It's overkill for most use cases, but essential if you're working with REST APIs in Fintech.
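With the standard library this looks like the following, using the `parse_float` hook of `json.loads` (the field names are made up):

```python
import json
from decimal import Decimal

raw = '{"amount": 0.1, "big": 18446744073709551615}'

default = json.loads(raw)                     # 0.1 becomes a binary float
exact = json.loads(raw, parse_float=Decimal)  # digits preserved as written

assert str(exact["amount"]) == "0.1"
assert exact["big"] == 18446744073709551615   # Python ints are arbitrary precision
```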
JSON doesn't specify what numbers are. Integers that take 2MB to represent are valid JSON numbers.
Regarding protobuf, the following opinion is obviously insane, and if your org is already using protobuf you should ignore it: protobuf actually seems pretty bad? It has a bunch of vestigial features that people just say not to use. Its integer encoding bloats the encoded size and causes unnecessary dependency chains in the decoder. I would strongly prefer sending simdjson tape between processes and storing simdjson tape at rest, but if my coworkers insisted on doing something normal, maybe I would look into flatbuffers or capnproto.
YAML used from within a statically typed language gets rid of most of the problems, but the main one seems to be: well, we figured out which stuff was just a bad idea and put it in 1.2... except nobody uses it.
> Both have their place though. YAML came out of Perl, and both are some confluence between awesome and horrific (although YAML wins the horrific crown for sure).
Weirdly enough, I'm not getting most of those issues in Perl YAML; the "norway problem", for example:
use Data::Dumper;
use YAML;
my $a = "---
geoblock_regions:
  - dk
  - fi
  - is
  - no
  - se
";
print Dumper(Load($a));

$VAR1 = {
          'geoblock_regions' => [
                                  'dk',
                                  'fi',
                                  'is',
                                  'no',
                                  'se'
                                ]
        };
This reminds me of a certain architect at my last shop who invented a DSL on top of his Python superapp. He expected all projects to go through his superapp. The DSL was configured in YAML. The YAML was often so dense he recommended devs use Jinja to generate the YAML.
This meant debug was hell, plus it wasn't always clear if what you were trying to do was even supported / if not why & what needed to be changed. This was because you were now 3 levels of abstraction away from the Python code that was actually executing.
Every time a dev took on a new project they had to jump on a call with architect or right hand man to figure out if what they were trying to do was going to be possible.
It escalated into the architect demanding to know, a sprint in advance, any task devs were trying to do, in a review session, so he could explain whether it was possible and try to triage it in his DSL.
>The DSL was configured in YAML. The YAML was often so dense he recommended devs use Jinja to generate the YAML.
Did he then go on to design Ansible? It falls into the same trap.
The only way you should be generating a data format with your language's templating system is:
<%= YAML.dump(@config) %>
Also, 9 times out of 10 I wished the app designer had just used <app language> or <any common embeddable language> (like Lua) instead of making any kind of DSL (whether that's a data file pretending to be code or a micro programming language).
This is basically how I feel about working with K8S and dredging through a repo full of templated YAML spaghetti. What am I looking at now? Helm, Keda, Flux, Argo, OperatorHub, GitHub Actions? oh actually this bit is in Terraform in another folder, whoops.
You can't actually deploy something unless you can mentally untangle it all; it just sits in front of your infra as a sort of DevOps Coming of Age ritual, where you look wistfully over your shoulder at the old Heroku or Vercel account you grew up with. Simpler times.
At work someone is trying to introduce a system where a bunch of Jinja templates in a repository are used to generate XML which can then be used to generate another XML document which can then be "executed", resulting in an annotated XML document :)
I've read about places that do this kind of stuff. Although it sounds like pure hell, I'm sure there's always a reasonable explanation, an intent. What kinds of problems was the org facing that led to the development of this?
I guess YAML has a place in that it would prevent that kind of thing happening in the first place.
YAML is easy to debug (thanks to having comment syntax) because it just deserialises into code. Sometimes it deserialises into code that compiles on the fly mind you which is never a good idea.
On the other hand one time I debugged a really nasty memory leak by dumping many megabytes of YAML then running git diff against the dumps. That was fun. Of course the client used the quick and bad hack rather than the demonstrably correct fix (thanks to the dumps) because they were frightened of their own code.
That sounds like a layer of insanity that would make me consider jobs elsewhere. It sounds entirely unnecessary and burdensome, but was it unnecessary?
> lack of comment syntax no matter how annoying it is that the spec is correct about why there are no comments
This completely arbitrary ideological purity has come at the expense of countless wasted hours, headaches, and suboptimal workarounds like using strings as comments, with zero tangible benefit - zero bad things would have ever happened if JSON allowed comments. There is nothing correct about it.
Is this the same Ingy that made Test::Base? It's the best data-driven testing framework I've ever used, and I've missed it often while working with other languages. The follow-up polyglot framework just didn't cut it for me.
Do people dislike TOML only because it looks like a Windows INI file? I think it’s nice. Rust chose it in keeping with their penchant for sanity most of the time.
I would prefer it if logically nested blocks could also be physically nested (and indented), so you can have a full tree structure. If you're describing something that can have variable levels of nesting (think folders), that can sometimes make the format easier to understand.
I like YAML for reading, and TOML is entirely worse for reading (still a million times better than JSON, though). And since the use cases are mostly read, rarely written, and if written they are code-generated (using configuration management), YAML fits better.
I never could understand the hate XML got, but I'm having a bit of schadenfreude seeing what people suffer through with its replacements.
An XML document with a well-thought-out domain-specific DTD would solve all these problems; instead, we have something where no sometimes means false (but not always) and 22:22 sometimes means 1342 (but not always!)... because... why not?!?
All that horrible mess, it seems, because people didn't like to have to close tags.
Our industry has the remarkable properties of being almost entirely newbies, a constant churn of green developers, combined with being very bad about passing down generational knowledge. This is why things that are easier to explain win out over things that are technically superior but take longer to get your head around almost every time.
The thing JSON had going for it over XML is that it maps cleanly to most languages object models so you can read the results directly. No writing XQueries or DOMs to read values.
Its only major downfall is lack of comments, which has led people to YAML. (There's plenty of other things it lacks in comparison to XML, like native schemas, but most of that falls into things newbies don't know they want.)
YAML is in this weird middle place where it’s easy to explain but impossible for a human to master. It appears as simple as JSON to newcomers who adopt it, but the long time users of it find it full of foot guns. People wanted JSON with comments but instead they got the complexity of XML minus the clarity.
Of course some of the hate comes from the applications where XML was used, more than from XML itself, but it also is a deeply flawed language.
Nowadays the main issue would be that it requires complex generator and parser libraries to be any use (you never want to deal with XML parsing/escaping by yourself), yet it's not as efficient as binary formats like protobuf, for instance.
That means that anything you'll want to edit by hand, or that should be purely textual and readable, will be better done in YAML or JSON, and anything beyond that can be done in other ways. The need for a single language trying to awkwardly span the whole spectrum isn't big.
XML is ok when both the reader and writer are machines but you still want a text-based format (otherwise I'd just use protobuf), or when maybe you occasionally want to edit by hand, but not often.
I think it's the same reason why, although HTML is a great standard, lots of us like writing in Markdown and having something convert that to HTML, despite all the problems when you try and push Markdown further than it was intended.
YAML, as I see it, is trying to be to XML configurations what Markdown is to HTML, with the added bonus of an attempt at a tag/reference system to store object graphs that are not trees. Back in the day of XML-based Spring config files, we had <bean> and <ref bean=...> but as far as I know that's implemented on the Spring layer, it's not a generic property of XML, whereas YAML tries to abstract that into the format itself.
> YAML, as I see it, is trying to be to XML configurations what Markdown is to HTML
Yes, that's probably a good analogy. The big difference though is that if some Markdown fails to parse, nothing really bad happens, while a YAML file that fails to parse can bring a whole system down.
Also, Markdown is a famously ambiguous format; it trades precision for ease of write, and that's fine, mostly.
But in a configuration file, ambiguity is really the opposite of what you want.
I worked with hand editing XML files in the past, and I didn't really have an issue with simple XML files. I actually prefer XML in some cases.
I see your point if you need to edit a complex XML with multiple namespaces mixed together, but a plain XML file can be just as readable as JSON.
Some JSON files can be really hard to edit by hand too. At my current workplace I often have to deal with nested JSON files, where a JSON contains values that are also JSON, but encoded and escaped so that it is difficult to edit.
XML is great for systems to read and write, but utterly abhorrent for humans. It's not just having to close tags. It's content vs attribute confusion, namespacing noise (which are also muddled with attributes), and there's a squint factor incurred by the density of information.
Namespacing is the opposite of noise: it lets different domains coexist with no possible confusion about which is which.
But experience has taught me that trying to explain why XML is great and simple to read and write for humans, to someone who thinks differently, is useless. And so, I won't.
Yet, please accept that some people, like me, really liked it and didn't mind the little quirks in light of all it offered.
I remember being excited when XML hit 1.0 (as a new programmer this seemed like a huge advance over things like classic Unix configuration files), and progressively disappointed over the next decade as the promise simply wasn't delivered on.
The things which killed XML seem to me to be related to the old standards culture: the people involved assumed adoption was inevitable and distracted themselves with increasingly arcane thickets of new standards, with the assumption that someone else would spend time on the “boring” work of building professional-quality tools and documentation or cleaning up usability warts. That other 80% of the work never happened and most people who had a choice moved on.
As a thought experiment, imagine if libxml2 had had even a single dedicated developer focused on tracking standards or making usability improvements, instead of training multiple generations of users that XML was slow and hostile to users. Various XML committees' travel expenses building standards which were never used likely cost more than that. Not leaving XPath frozen around the turn of the century would have helped in so many places.
The other wart I think would have made a surprising difference is the usability disaster around namespaces. So many tool developers forced users to switch between the short namespace:attribute form they used everywhere in the document and the {namespace url}attribute form that resolves to, or forced you to respecify the namespaces on every operation rather than reusing the values the parser had already loaded. Users begrudged that verbosity but they hated it when it meant something silently returned incorrect results because a selector using the document’s own syntax didn’t find the element they could see using those exact values. Absolutely nothing anyone did in the XML world was a better use of time than fixing that would have been since it trained people to think of XML-based tools as a painful, error-prone experience to be avoided — and they did as soon as they could.
I miss the old days of working with XSLT and XPath. A nice way of giving some design touches and filtering/sorting to XML documents. It wasn't perfect, but I would take it any time over yaml/json/toml/ini.
I don't know if I'd say I "miss" XSLT and XPath, but yeah, it was fun and powerful at the time. I made some crazy stuff with Cocoon and AxKit that, if you didn't mind the syntax, were actually pretty elegant.
Eh, definitely there's some generational round robin going on. And the "amsterdam problem" in YML is just insanity. But you got to admit, XML has some very low level intrinsic problems
* Schemas violate underlying XML rules all the time, and due to variance in XML parsers, it gets a pass but only on specific configurations. There's no one Holy XML Parser. Leading whitespace in attributes? Sure, why not?! Whitespace sensitive element order? Ooh yeah, lots of it! But try and feed it to libxml and the whole thing collapses - without error handling, mind you, but that's not XML's fault.
* Whitespace agnostic means diffs and merges are element aware, and it's nigh-impossible to estimate the upper compute limit of an element aware diff
* There is NO OFFICIAL WHITESPACE SPEC - so if you try and fix it with normalization and lines, it's going to be different everywhere you go. So you're forced to switch on element aware diff / merge in your VCS, which is a pretty big change, and it's one you have to sell other departments on.
* XML breaking 1NF - meaning XML can have XML inside XML, recursing indefinitely, when playing data format - breaks so very, very many things... it actually goes against the whole concept of a data hierarchy, which is built into XML at a low level.
* Sort of riding on that, in order to parse XML you have to eat it element by element, and there's no way to tell when an element is going to end or if it recurses N levels. This makes it computationally expensive. The CAD STEP format has some of this disease - it has to be loaded in its entirety to parse, which can be holy hell with a TB file.
* Yeah sure, namespace hell. No one really figured out a way to fix it. S1000D (XML spec) to this day just denies that XML namespaces ever existed, and I can't really fault their WG for doing that. I can fault them for so many other things, but not that.
The really great thing about xml with XSD is the ability to validate the document really well.
I.e. having content match regexps, so you can be sure version numbers are always \d+(\.\d+)* and do not have unexpected letters or spaces in them, or that date values are in ISO format.
Very useful for APIs to 3rd parties, as it is now very easy to catch most errors and provide a useful error message for them without having to code every check.
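The same check is trivial to mirror outside XSD, e.g. in Python (note that XSD patterns are implicitly anchored to the whole value, hence `fullmatch`):

```python
import re

# The version-number pattern from above
version = re.compile(r'\d+(\.\d+)*')

assert version.fullmatch("1.2.10")        # well-formed version
assert not version.fullmatch("1.2-beta")  # unexpected letters rejected
assert not version.fullmatch(" 1.2")      # leading whitespace rejected
```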
>> All that horrible mess, it seems, because people didn't like to have to close tags.
If people keep burning their faces off it's not the fault of the flamethrower! If they keep hacking each other's limbs off it's not the fault of the chainsaw! If they keep blowing their houses up it's not the fault of the gas bottles! It's not like we need to accommodate every possible failure mode of the human brain, and make things safe just because the next idiot that comes along is going to cause a catastrophe. It's the idiot's fault that they forget to apply the brakes and take advantage of the failsafes; it's not the responsibility of the system to have the brakes and failsafes where any idiot can find them!
I've heard that kind of excuse 0 times outside of software circles. But of course I'm exaggerating, because nobody really suffers because of XML other than the people who use it every day. Like this one junior dev, for example, who was made to create XML by hand by eyeballing an Excel document, just because paying a junior dev salary is easier than getting middle management (and clients) to learn to structure an Excel spreadsheet properly.
Sometimes, just sometimes, you have to design a tool with the way it will actually be used in the real world in mind. And all the rest of the time? Well, all the rest of the time you have to do it that way, too.
> All that horrible mess, it seems, because people didn't like to have to close tags.
The issue isn't that it's bad per se. The issue is that it's cumbersome to write. When you want to set up and tweak a tool, it gets annoying very quickly having to deal with little errors because you misspelled a closing tag. Maybe you want to test enabling a setting? Typing out 4 characters is so much less friction than the 20 or more (including the < > : / characters) you'd need for an XML config.
* The `<?xml ...?>` declaration line alone already lost a percentage of people, etc.
* Essentially XML was a markup language (SGML stripped down), and while you can use markup to mark up key-value pairs, it just wasn't designed or optimized for that.
* On the contrary, it also did away with the nice SGML features for that: instead of `<tag>value</tag>` one could write `<tag/value/`, and there were implied closing tags (like `<ol><li> ... <li></ol>` in HTML). For most config files, particularly deep ones, such short forms would have been perfect.
* XML attributes vs. substructure was just an awkward choice to hand users. E.g. `<myConfig userId="asdf" host="asdfasdf" ...>` vs `<myConfig><userId>asdf</userId><host>asdfasdf</host></myConfig>`.
More so, I get the feeling when looking at the XSL, XSLT, etc. mania that a similar pattern was at work as with UML: special interest groups and tooling providers driving the evolution of a standard with their interests in mind and not the interests of users or developers.
Overall, XML configs could have looked something like `<myConfig / <userId/asdf/ <host/asdfasdf/` if XML had kept SGML's short forms.
YAML and JSON succeeded because they had a clean and predictable, no-nonsense mapping between encoding and object-model after decoding. Probably we should all switch to an almost-yaml format that does away with the peculiarities, and the FANG companies would have the momentum to make that happen.
I personally would like for HJSON (https://hjson.github.io) to see more adoption, but that train has passed...
It’s more typing but it’s simple typing most people don’t even think about because it’s predictable (not to mention automatic in many editors). I’d take that over the frictional cost of thinking your YAML is done and then having to debug magic data conversion or realize that you left out one character causing something to be parsed completely differently.
Where XML falls down hard is tool usability. There’s still no standard formatter or good validation tool, and things like namespace support are a constant source of friction.
> An XML document with a well-thought-out domain-specific DTD would solve all these problems;
But then a YAML document written by disciplined and well-thinking people will also not have these problems.
Any of the complex formats works well when used in the most fitting way and goes horribly wrong when people try to benefit too much from the complexity. XML was also a nightmare when put in the wrong hands, just as YAML is, and the next hyped, Turing-complete document syntax will be.
The problem with XML is mainly in the 'M': it's a _markup_ language. Using it for configuration and arbitrary data serialisation isn't where its strengths lie, but it got shoved into those niches because if you look at it _just right_, you can make it work in them.
I don't hate XML myself, but I do hate how it's been abused over the years.
It's because so many people would see examples of complex _XML-based formats (WSDL, SOAP, etc.)_ and infer that _XML itself_ is complicated. XML is really not that difficult to understand, and I find it quite amusing that the same people who don't bat an eye at writing HTML complain about how baffling XML is.
Is writing an XML parser difficult? Yes, very much so. But again, that doesn't make XML itself complicated. And before anyone tries to call out this particular comment, keep in mind that writing a fast, correct, and safe JSON parser is no walk in the park either.
Are there many examples of complicated XML-based formats? Yes, but that's just a reflection of the complexity of some particular configuration model, not XML itself.
XML is cool and useful, but for humans you need good tooling to handle it well. And it still is very noisy with all its boilerplate and what advanced features can bring in. And overall it has a culture of making things complex, and complicated.
Just like bad/fragmented YAML libraries now, there were plenty of bad XML libraries and implementations. I didn't think XML was too bad until I had to write a SOAP request where the endpoint would throw an error if the arguments (in their own tags, mind you) weren't in a specific order. The endpoint gave me no hints.
Additionally, we had to update WSDLs for these services, but the service generating the WSDL used features our server's SOAP library didn't support, so someone had to manually transform the XML via lengthy trial and error to get it to work.
I see nothing wrong with closing tags. They help to navigate and understand, in my opinion. I have way more problems with developers trying to be super concise and writing constructs that are very hard to parse mentally.
I do think YAML is overly complex - but there is some hyperbole in this document.
- Many of these complaints are about YAML 1.1.
- YAML 1.2 was released _14 years ago_.
- The author makes some allusions to 1.2.2, and it requiring a team of people 14 years to make, but, from the yaml.com announcement they link to: “No normative changes from the 1.2.1 revision. YAML 1.2 has not been changed”
I guess my first two comments are undercut by PyYAML using YAML 1.1 (Really?! Python’s had 24 years of the Norway problem?!)
The article mentions the fact that YAML 1.2 is really old and the fact that it doesn't matter because YAML 1.1 is still the most commonly supported version, and the fact that it's arguably even worse because YAML 1.2 gives different parsing results to YAML 1.1!
I highly recommend reading the article - it's very good.
Agree with the final part of this article that "programmable" configuration languages like Nix and Dhall are the way forward.
I've spent a lot of time writing YAML for Ansible, Cloudformation, k8s, Helm, etc. Some of the issues this article mentions are pitfalls but once you get a bit of experience with it, you know what to look out for.
I've also spent and a lot of time writing Nix expressions, which is much more "joyful" IMO. Seemingly simple features like being able to create a function to reuse the same parameterized configuration makes life much easier.
Add in a layer of type safety and some integration with the 'parent' app (think replacements for CloudFormation's !GetAtt or Ansible's handlers), the ability to perform basic unit tests, then configuration becomes more like writing code which I consider a good thing.
I agree. They make a rod for their own back by having implicit conversion rules which, while well defined, are not well understood. "Explicit is better than implicit" and all that.
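For illustration, a rough sketch of the YAML 1.1 implicit boolean rule behind those surprises (simplified: the real spec enumerates specific mixed-case forms like `On` and `NO` rather than lowercasing):

```python
# Simplified sketch of YAML 1.1 implicit boolean resolution: unquoted
# scalars matching these words become booleans; everything else stays a
# string. This is the rule behind the "Norway problem" (`no` -> False).
YAML11_TRUE = {"y", "yes", "true", "on"}
YAML11_FALSE = {"n", "no", "false", "off"}

def resolve_scalar(token: str):
    lowered = token.lower()
    if lowered in YAML11_TRUE:
        return True
    if lowered in YAML11_FALSE:
        return False
    return token  # not an implicit boolean; keep it as a string

print(resolve_scalar("no"))      # False -- the Norway problem
print(resolve_scalar("Norway"))  # stays "Norway"
```

The asymmetry is the point: whether you wrote a string or a boolean is decided by a lookup table the reader has to memorize, not by the syntax.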
Great idea. I would also suggest adding some syntax to make lists and maps more obvious (it can be a bit unclear in YAML). Maybe {} for maps and [] for lists.
My wish for a dream config language is this: Allow a choice between unambiguously-typed expressions (quoted strings, only true/false, decimal numbers) and explicit type annotations. So this:
regions: $[string]
- no
- se
options: ${bool}
a: yes
b: no
(with possibly a different syntax) would be equivalent to `{"regions": ["no", "se"], "options": {"a": true, "b": false}}` in JSON terms.
It’s really no wonder that it’s hard to create a language that’s supposed to know whether the author intended to write a string or not without it being indicated by the syntax. No other language in the world tries to do this, for exactly the reasons this article points out.
Since everyone seems to be throwing their favorite format into the ring, I will too: EDN [0]
* no enclosing element (i.e., can be streamed)
* maps, lists, even sets
* tags (like "Person". UUIDs and timestamps are built-in tags)
* floating point numbers
* integers
* comments
* UTF-8
* true booleans
* no need to worry about too many or too few commas in the right or wrong place
Implementations in almost every language under the sun [1].
The format is simple enough that it's easy to implement, verify, and test. No strange string interpretation craziness (see YAML and "Norway problem"), no ambiguity between FP and integers (see JSON), comments. And if your editor has rainbow parenthesis support, reading is actually a pleasant experience.
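To make that concrete, a small (invented) EDN document exercising most of those features:

```clojure
;; Hypothetical EDN config -- maps, vectors, sets, and built-in tagged
;; literals; note the real booleans and the absence of comma bookkeeping.
{:service  "billing"
 :replicas 3                               ; integer
 :timeout  2.5                             ; floating point
 :enabled  true                            ; a real boolean, not "on"/"yes"
 :regions  #{"no" "se"}                    ; a set; "no" stays a string
 :deployed #inst "2024-01-15T10:00:00Z"    ; built-in timestamp tag
 :id       #uuid "f81d4fae-7dec-11d0-a765-00a0c91e6bf6"}
```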
I'm not sure I would consider booleans an 'edge case'
Hard to mess up. Robust schema language. Very flexible. Easy to process.
Yeah, it’s verbose, but that’s the trade-off between usability and robustness.
Now get off my lawn!
I got used to XML, tho I never could quite understand XSLT and the desire to program in it. I got used to json, but yaml I just can't bring myself to parse. YAML is 90% stuff you can't guess and just 10% data. And why so many?
k8s via Helm is often templated via Go template strings, which works by creating an unreadable and unhighlightable mess, introducing lots of its own bugs.
Basically, all of the problems identified in the article can be dealt with by one rule - always quote your strings. I agree with the author that we should have a reduced, safe and minimalist subset of YAML, which is basically YAML 1.2, released in 2009.
P.S. Please just stop using PyYAML.
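The quoting rule in practice (illustrative snippet):

```yaml
# Unquoted: a YAML 1.1 parser reads these as boolean false and the number 1.1.
country: no
version: 1.10
# Quoted: unambiguously strings, under any YAML version.
country_quoted: "no"
version_quoted: "1.10"
```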
https://github.com/lightbend/config/blob/main/HOCON.md
We use an extended version of it for our app and the resulting config is pretty clean. You can see an example here:
https://hydraulic.software/blog/8-packaging-electron-apps.ht...
The only downside is that the reference implementation is hardly maintained anymore.
Why?
The issue is that there is sufficient complexity in finding a portable representation for configuration formats that it just kicks the can down the road. On the other hand it means that as soon as you decide what format you are going to support you can quickly implement it. There is more or less an intersection grammar that works across most if not all Lisps, and that is the plist `(:k v :k2 (:k3 v2))`. So I settled on that for my own use.
After all that work I have not dealt with the fact that numbers and chars do not have a portable representation across lisp dialects, which is a key complaint in other threads here. Limited support for let binding constants also seems like a feature that would allow for just enough expressivity to make the format useful without opening up the terror that is `&` and `*` in yaml (cool and useful as it may be).
In summary s-expressions are: 1. missing good parsers in a number of language ecosystems 2. not standard across lisp dialects 3. need additional semantics for binding, multiple expressions, etc. 4. still better than yaml and json
0. https://github.com/tgbugs/sxpyr
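A plist grammar that small really can be read with a few lines of code. A minimal, hypothetical sketch (atoms kept as plain strings; no numbers, escapes, or error handling):

```python
# Minimal sketch of a reader for plist-style s-expressions such as
# (:k v :k2 (:k3 v2)). Keywords (tokens starting with ":") become dict
# keys; every other atom is kept as a bare string.
def tokenize(text):
    return text.replace("(", " ( ").replace(")", " ) ").split()

def parse(tokens):
    tok = tokens.pop(0)
    if tok == "(":
        items = []
        while tokens[0] != ")":
            items.append(parse(tokens))
        tokens.pop(0)  # consume ")"
        # pair up :keyword value entries into a dict
        return {items[i].lstrip(":"): items[i + 1]
                for i in range(0, len(items), 2)}
    return tok

def loads(text):
    return parse(tokenize(text))

print(loads("(:k v :k2 (:k3 v2))"))  # {'k': 'v', 'k2': {'k3': 'v2'}}
```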
Is there a single agreed-upon defined grammar that everyone can use? Preferably one simple enough that like JSON's it is at least capable of being used as a graphic on the home page for the format? https://www.json.org/json-en.html
This is an honest question, because there may well be and I don't know it.
However, I will put this marker down in advance: If multiple people jump up to say "oh, yes, of course, it's right here", and their answers are not 100% compatible with each other, then the answer is no.
The other marker I'll put down is "just use common lisp", I want verification that it really is 100% standardized, no question what any construct means, ever, and I still bet we get people who would rather see Scheme or Clojure, and I bet there's some sort of difference.
Neither of these objections is fatal to the idea. JSON is technically not just "javascript objects", so if someone carved out a defined format from s-expressions, then held it up as a standard, that would be as valid as what Crockford did. But at least as of right now, I'm not aware of anyone having done that standardization work. Replies welcome.
While introducing Kubernetes at our company over the last two years, we have been moving more and more away from YAML with internal Helm charts to a much simpler process: just using HCL and Terraform, and defining Kubernetes resources as Terraform resources.
As a software developer HCL just makes so much more sense than this YAML + Helm + Go templates hell, which feels like C preprocessor hell all over again. Other solutions like kustomize are neat, but I don't see how all of these YAML workarounds should be better than something like HCL with Terraform. HCL feels like a real declarative programming language (with real conditions, variables, a module system and useful built-in functions). YAML feels like another more complex JSON and other tools like Helm or Kustomize try to work around the weaknesses of YAML with some kind of templating system.
YAML looks nice to read in simple demos and in small files, but is just not adequate in the real world (in my personal opinion - I know that YAML is used by a lot of people in production as of today).
Maybe I'm older than you, but I have definitely heard that line.
Mostly because the alternatives were XML, INI, or the myriad of bespoke formats: relayd/apache httpd .conf or iptables, etc.
INI has parsers that operate in different ways and doesn't support hierarchies... so that's not ideal.
JSON and YAML came to the fore around the same time, and JSON's limitations around comments and its picky semantics meant that people did prefer YAML over JSON for human-readable configs.
YAML itself is fine; it has some really awkward warts, and the parsers are usually programmatically unsafe in their implementation (leading to less compatible "safe_load" or other types of loaders)[0]; the issue we actually have with YAML is that we:
A) Template it (jinja, mustache whatever)
B) Put entirely too much stuff into it. (Kubernetes manifests can grow to hundreds of lines really easily)
These problems will affect any configuration file format we choose to use, including TOML (which is comparatively new on the block), because reading templated/enormous files is really difficult.
What I've taken to doing is programmatically generating objects and then serialising them as whatever my software depends on. It might feel ugly to use an entire Turing-complete language to generate objects that are mostly static, but honestly... the ability to breakpoint, test and print the subsections of output is astonishingly nice.
Then I don't care at all what the format is.
[0]: https://www.serendipidata.com/posts/safe-api-design-and-pyya...
The tooling is super mature, it's easy to emit, it's easy to parse, it's easy to validate; it can just be a little hard to read and write by hand (and I mostly blame SOAP for that). Still, basic XML isn't that hard to read or write, thanks to editor support.
I like that you can use anchors and merges. It greatly simplifies complex, repetitive structures. And most of the complaints about YAML can be worked around by string-quoting.
The whitespace can get in the way if you're templating, but then you can also use [1, 2, 3] as a list notation, for example.
In fact, most of the complaints could be resolved by running it through a linter.
I've worked in 4 companies over a period of 10 years, and each had exactly this problem, with YAML, JSON, XML, and properties files (you don't want to see business logic conditionals in a properties text file, where the shapes of the keys command an interpreter to behave dynamically...)
The only time I saw a team do it well was a PHP backend, of all things, where the lead said they'd program all their variations in PHP rather than source them from flat configuration descriptors, and it was amazing: clear, simple and powerful. They had to release the backend at each config change instead of releasing the config change only, but I'm still unsure why exactly that's a problem: the configs are software too, if we're honest with ourselves, and shoe-horning them into a descriptor language isn't gonna make them flat.
The only confusing problem I've run into was the sexagesimal number notation and even that was fairly obvious. Perhaps it's because I tend to overquote strings?
I mean sure, the on/off to boolean mappings are annoying, but they also become very obvious when you're parsing config, because the type validation will fail. If `flush_cache` has an enum `on` but no key `True`, then the type validator will instantly complain about both the missing key and the extra key in the dictionary.
Same with accidental numbers, any type check will show that the parsing failed.
I find JSON for config files to become unreadable quickly because of the non-obvious nesting and the lack of comments. You can pick a JSON extension but then you need to pick one that your tooling will support.
What do you think of https://toml.io ?
YAML is least worst for me, and I don't think I ever hit the problems article is showing because
* I use editor that will highlight stuff like anchors
* I often generate config from CM so it can't have those errors
* Loading into defined struct in statically typed language also makes them impossible.
Both have their place though. YAML came out of Perl, and both are some confluence between awesome and horrific (although YAML wins the horrific crown for sure).
I've had a little bit to do with Ingy - the inventor of yaml, and I've worked closely with some of his collaborators. Ingy is nuts, mostly in a good way, but I wouldn't put him in charge of the architecture, I'd put him in charge of the abyss.
Though, in fairness, I think old Perl did that too. It's super convenient until it isn't.
Rachel also doesn't approve of JSON in high-reliability systems for other reasons: https://rachelbythebay.com/w/2019/07/21/reliability/ and point taken, if you're sending data from your service A to your service B and neither is a web browser, nor are they written in JS, then there's far better formats and you almost need a reason not to use protobuf.
When reading out guest kernel memory (addresses are at the top of 64 bit space) these would silently be rounded to the nearest whole double. It took me a very long time to understand what was going on.
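That silent rounding is inherent to IEEE-754 doubles, which carry a 53-bit significand, so any JSON decoder that maps numbers onto doubles will corrupt 64-bit addresses. A quick check in Python (the address is a made-up high kernel-space value):

```python
# Integers above 2**53 can no longer be represented exactly as doubles,
# so adjacent values silently collapse -- the failure mode described above.
assert float(2**53) == 2**53             # still exact
assert float(2**53 + 1) == float(2**53)  # rounded: the +1 is lost
addr = 0xFFFF_8000_0000_1234             # hypothetical high kernel address
assert int(float(addr)) != addr          # a double round-trip drops the low bits
```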
Python, for example, has several JSON libraries which let you swap out the numeric parser so it yields Decimal objects all the time. It's overkill for most use cases, but essential if you're working with REST APIs in Fintech.
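Even the standard library's `json` module supports this via its `parse_float` hook (the payload below is made up):

```python
import json
from decimal import Decimal

# json.loads takes a parse_float hook, so numeric literals are handed to
# Decimal verbatim instead of being rounded through a binary double.
payload = '{"amount": 0.1, "fee": 0.2}'

as_floats = json.loads(payload)
as_decimals = json.loads(payload, parse_float=Decimal)

print(as_floats["amount"] + as_floats["fee"])      # 0.30000000000000004
print(as_decimals["amount"] + as_decimals["fee"])  # 0.3
```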
Regarding protobuf, the following opinion is obviously insane, and if your org is already using protobuf you should ignore it: protobuf actually seems pretty bad? It has a bunch of vestigial features that people just say not to use. Its integer encoding bloats the encoded size and causes unnecessary dependency chains in the decoder. I would strongly prefer sending simdjson tape between processes and storing simdjson tape at rest, but if my coworkers insisted on doing something normal, maybe I would look into flatbuffers or capnproto.
> Both have their place though. YAML came out of Perl, and both are some confluence between awesome and horrific (although YAML wins the horrific crown for sure).
Weirdly enough, I'm not getting most of those issues in Perl YAML - the "Norway problem", for example.
This meant debug was hell, plus it wasn't always clear if what you were trying to do was even supported / if not why & what needed to be changed. This was because you were now 3 levels of abstraction away from the Python code that was actually executing.
Every time a dev took on a new project they had to jump on a call with architect or right hand man to figure out if what they were trying to do was going to be possible.
It escalated into the architect demanding to know a sprint in advance any task devs were trying to do, in a review session, so he could explain if it was possible or not and try to triage in his DSL..
Did he then go on to design Ansible? It falls into the same trap.
The only way you should be generating a data format using a language's templating system is...
Also, 9 times out of 10, I wished the app designer had just used <app language> or <any common embeddable language> (like Lua) instead of making any kind of DSL (whether that's just a data file pretending to be code or a micro programming language).
You can't actually deploy something unless you can mentally untangle it all; it just sits in front of your infra as a sort of DevOps Coming of Age ritual, where you look wistfully over your shoulder at the old Heroku or Vercel account you grew up with. Simpler times.
YAML is easy to debug (thanks to having comment syntax) because it just deserialises into code. Sometimes it deserialises into code that compiles on the fly mind you which is never a good idea.
On the other hand one time I debugged a really nasty memory leak by dumping many megabytes of YAML then running git diff against the dumps. That was fun. Of course the client used the quick and bad hack rather than the demonstrably correct fix (thanks to the dumps) because they were frightened of their own code.
This completely arbitrary ideological purity has come at the expense of countless wasted hours, headaches, and suboptimal workarounds like using strings as comments, with zero tangible benefit - zero bad things would have ever happened if JSON allowed comments. There is nothing correct about it.
:-) I enjoyed this bit a bit too much, now I want it on my tombstone #lifegoals
An XML document with a well-thought-out domain-specific DTD would solve all these problems; instead, we have something where no sometimes means false (but not always) and 22:22 sometimes means 1342 (but not always!)... because... why not?!?
All that horrible mess, it seems, because people didn't like to have to close tags.
Our industry has the remarkable properties of being almost entirely newbies, a constant churn of green developers, combined with being very bad about passing down generational knowledge. This is why things that are easier to explain win out over things that are technically superior but take longer to get your head around almost every time.
The thing JSON had going for it over XML is that it maps cleanly to most languages object models so you can read the results directly. No writing XQueries or DOMs to read values.
Its only major downfall is lack of comments, which has led people to YAML. (There’s plenty of other things it lacks in comparison to XML, like native schemas, but most of that falls into things newbies don’t know they want)
YAML is in this weird middle place where it’s easy to explain but impossible for a human to master. It appears as simple as JSON to newcomers who adopt it, but the long time users of it find it full of foot guns. People wanted JSON with comments but instead they got the complexity of XML minus the clarity.
Of course some of the hate come from the application where XML was used, more than XML itself, but it also is a deeply flawed language.
Nowadays the main issue would be that it requires complex generator and parser libraries to be any useful (you'll never want to deal with XML parsing/escaping by yourself), yet it's not as efficient as binary formats like protobuf, for instance.
That means that anything you'll want to edit by hand or keep purely textual and readable will be better done in YAML or JSON, and anything beyond that can be done in other ways. The need for a single language trying to awkwardly span the whole spectrum isn't big.
I think it's the same reason why, although HTML is a great standard, lots of us like writing in Markdown and having something convert that to HTML, despite all the problems when you try and push Markdown further than it was intended.
YAML, as I see it, is trying to be to XML configurations what Markdown is to HTML, with the added bonus of an attempt at a tag/reference system to store object graphs that are not trees. Back in the day of XML-based Spring config files, we had <bean> and <ref bean=...> but as far as I know that's implemented on the Spring layer, it's not a generic property of XML, whereas YAML tries to abstract that into the format itself.
Yes, that's probably a good analogy. The big difference though is that if some Markdown fails to parse, nothing really bad happens, while a YAML file that fails to parse can bring a whole system down.
Also, Markdown is a famously ambiguous format; it trades precision for ease of write, and that's fine, mostly.
But in a configuration file, ambiguity is really the opposite of what you want.
I see your point if you need to edit a complex XML with multiple namespaces mixed together, but a plain XML file can be just as readable as JSON.
Some JSON files can be really hard to edit by hand too. At my current workplace I often have to deal with nested JSON files, where a JSON contains values that are also JSON, but encoded and escaped so that it is difficult to edit.
XML is great for systems to read and write, but utterly abhorrent for humans. It's not just having to close tags. It's content vs attribute confusion, namespacing noise (which are also muddled with attributes), and there's a squint factor incurred by the density of information.
But experience has taught me that trying to explain why XML is great and simple to read and write for humans, to someone who thinks differently, is useless. And so, I won't.
Yet, please accept that some people, like me, really liked it and didn't mind the little quirks in light of all it offered.
It just autocompletes itself. I know JSON schema exist, but it never managed to just work that well.
Sure it solved all of the problems by making a data format that both humans and computers found difficult to read.
"Oh I've got a problem. I'll solve it with XML. Now you have two problems" ;)
The things which killed XML seem to me to be related to the old standards culture: the people involved assumed adoption was inevitable and distracted themselves with increasingly arcane thickets of new standards, with the assumption that someone else would spend time on the “boring” work of building professional-quality tools and documentation or cleaning up usability warts. That other 80% of the work never happened and most people who had a choice moved on.
As a thought experiment, imagine if libxml2 had had even a single dedicated developer focused on tracking standards or making usability improvements, instead of training multiple generations of users that XML was slow and hostile to users. Various XML committees’ travel expenses building standards which were never used likely cost more than that. Not leaving XPath frozen around the turn of the century would have helped in so many places.
The other wart I think would have made a surprising difference is the usability disaster around namespaces. So many tool developers forced users to switch between the short namespace:attribute form they used everywhere in the document and the {namespace url}attribute form that resolves to, or forced you to respecify the namespaces on every operation rather than reusing the values the parser had already loaded. Users begrudged that verbosity but they hated it when it meant something silently returned incorrect results because a selector using the document’s own syntax didn’t find the element they could see using those exact values. Absolutely nothing anyone did in the XML world was a better use of time than fixing that would have been since it trained people to think of XML-based tools as a painful, error-prone experience to be avoided — and they did as soon as they could.
XSLT was horrible though. A programming language with XML syntax, no thanks!
* Schemas violate underlying XML rules all the time, and due to variance in XML parsers, it gets a pass but only on specific configurations. There's no one Holy XML Parser. Leading whitespace in attributes? Sure, why not?! Whitespace sensitive element order? Ooh yeah, lots of it! But try and feed it to libxml and the whole thing collapses - without error handling, mind you, but that's not XML's fault.
* Whitespace agnostic means diffs and merges are element aware, and it's nigh-impossible to estimate the upper compute limit of an element aware diff
* There is NO OFFICIAL WHITESPACE SPEC - so if you try and fix it with normalization and lines, it's going to be different everywhere you go. So you're forced to switch on element aware diff / merge in your VCS, which is a pretty big change, and it's one you have to sell other departments on.
* XML breaking 1NF, which, aka means that XML can have XML inside of XML infinite recursion when playing data format = which breaks so very very many things . . it's actually going against the whole concept of a data hierarchy, which is built in to XML at a low level.
* Sort of riding on that, in order to parse XML you have to eat it element by element, and there's no way to tell when an element is going to end or of it recurses N levels. This makes it computationally expensive. The CAD STEP format has some of this disease - it has to be loaded in its entirety to parse, which can be holy hell with a TB file.
* Yeah sure, namespace hell. No one really figured out a way to fix it. S1000D (XML spec) to this day just denies that XML namespaces ever existed, and I can't really fault their WG for doing that. I can fault them for so many other things, but not that.
I.e. having content matching regexps, so you can be sure version numbers are always \d+(\.\d+)* and does not have unexpected letters or spaces in it. That date values are in ISO format.
Very useful for APIs to 3rd parties as it now is very easy to catch most errors and provide an useful error message for them without having to code every check.
If people keep burning their faces off it's not the fault of the flamethrower! If they keep hacking each other's limbs off it's not the fault of the chainsaw! If they keep blowing their houses up it's not the fault of the gas bottles! It's not like we need to accommodate every possible failure mode of the human brain, and make things safe just because the next idiot that comes along is going to cause a catastrophe. It's the idiot's fault that they forget to apply the breaks and take advantage of the failsafes, it's not the responsibility of the system to have the breaks and failsafes where any idiot can find them!
I've heard that kind of excuse 0 times, outside of software circles. But of course I'm exaggerating, because nobody really suffers because of XML other than the people who use it every day. Like this once-junior dev, for example, who was made to create XML by hand by eyeballing an Excel document, just because paying a junior dev salary is easier than getting middle management (and clients) to learn to structure an Excel spreadsheet properly.
Sometimes, just sometimes, you have to design a tool with the way it will actually be used in the real world in mind. And all the rest of the time? Well, all the rest of the time you have to do it that way, too.
The issue isn't that it's bad per se. The issue is that it's cumbersome to write. When you want to set up and tweak a tool, it gets annoying very quickly having to deal with little errors because you misspelled a closing tag. Maybe you want to test enabling a setting? Typing out 4 characters is so much less friction than the 20 or more (including the < > : / characters) you'd need for an XML config.
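To make the friction concrete, here is one hypothetical toggle (the setting name is invented for illustration):

```xml
<!-- enabling a single boolean setting in an XML config -->
<settings>
  <debug>true</debug>
</settings>
```

versus the YAML or TOML equivalent, which is just `debug: true` or `debug = true` on one line.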
* The `<?xml ...` preamble alone lost a percentage of people already, etc.
* Essentially XML was a markup language (a stripped-down SGML); while you can use markup to mark up key-value pairs, it just wasn't designed or optimized for that.
* On the contrary, it did away with the nice SGML features for exactly that: instead of `<tag>value</tag>` one could write `<tag/value/`, and there were self-closing tags (like `<ol><li> ... <li></ol>` in HTML). For config files, particularly deep ones, self-closing tags would have been perfect.
* XML attributes vs. substructure was just an awkward choice to hand to users. I.e. `<myConfig userId="asdf" host="asdfasdf" ...>` vs `<myConfig><userId>asdf</userId><host>asdfasdf</host></myConfig>`.
Moreover, looking at the XSL, XSLT, etc. mania, I get the feeling that a similar pattern was at work as with UML: special interest groups and tooling providers driving the evolution of a standard with their own interests in mind, not the interests of users or developers.
Overall XML could have been something like this `<myConfig / <userId/asdf/ <host/asdfasdf/` if it was SGML.
YAML and JSON succeeded because they had a clean and predictable, no-nonsense mapping between encoding and object-model after decoding. Probably we should all switch to an almost-yaml format that does away with the peculiarities, and the FANG companies would have the momentum to make that happen.
I personally would like for HJSON (https://hjson.github.io) to see more adoption, but that train has passed...
Where XML falls down hard is tool usability. There’s still no standard formatter or good validation tool, and things like namespace support are a constant source of friction.
But then a YAML document written by disciplined and well-thinking people will also not have these problems.
Any of the complex formats work well when used in the most fitting way and go horribly wrong when people try to benefit too much from the complexity. XML was also a nightmare when put in the wrong hands, just as YAML is, and as the next hyped, Turing-complete document syntax will be.
I don't hate XML myself, but I do hate how it's been abused over the years.
It's because so many people would see examples of complex _XML-based formats (WSDL, SOAP, etc.)_ and infer that _XML itself_ is complicated. XML is really not that difficult to understand, and I find it quite amusing that the same people who don't bat an eye at writing HTML complain about how baffling XML is.
Is writing an XML parser difficult? Yes, very much so. But again, that doesn't make XML itself complicated. And before anyone tries to call out this particular comment, keep in mind that writing a fast, correct, and safe JSON parser is no walk in the park either.
Are there many examples of complicated XML-based formats? Yes, but that's just a reflection of the complexity of some particular configuration model, not XML itself.
Additionally, we had to update WSDLs for these services, but the service generating the WSDL used features our server's SOAP library didn't support, so someone had to manually transform the XML via lengthy trial and error to get it to work.
If you need an enum that only allows specific values, you just list them.
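In XSD terms that's an `xs:enumeration` restriction; a hypothetical fragment (the type and values are made up) looks like:

```xml
<!-- Hypothetical enum: only these three strings validate -->
<xs:simpleType name="logLevel">
  <xs:restriction base="xs:string">
    <xs:enumeration value="debug"/>
    <xs:enumeration value="info"/>
    <xs:enumeration value="error"/>
  </xs:restriction>
</xs:simpleType>
```

Anything outside the listed values is rejected at validation time, with a message pointing at the offending element.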
For small flat trees the alternatives are preferable
- Many of these complaints are about YAML 1.1.
- YAML 1.2 was released _14 years ago_.
- The author makes some allusions to 1.2.2, and it requiring a team of people 14 years to make, but, from the yaml.com announcement they link to: “No normative changes from the 1.2.1 revision. YAML 1.2 has not been changed”
I guess my first two comments are undercut by PyYAML using YAML 1.1 (Really?! Python’s had 24 years of the Norway problem?!)
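To show what a YAML 1.1 loader actually does with these scalars, here is a minimal pure-Python sketch of the 1.1 boolean resolution rule (the full set of literals comes from the YAML 1.1 spec; `resolve_scalar` is a made-up helper, not a PyYAML API):

```python
# The unquoted scalars that YAML 1.1 resolves to booleans. This is why
# `countries: [NO, SE]` silently loads as [False, 'SE'] - the Norway problem.
YAML_11_BOOLS = {
    "y": True, "Y": True, "yes": True, "Yes": True, "YES": True,
    "n": False, "N": False, "no": False, "No": False, "NO": False,
    "true": True, "True": True, "TRUE": True,
    "false": False, "False": False, "FALSE": False,
    "on": True, "On": True, "ON": True,
    "off": False, "Off": False, "OFF": False,
}

def resolve_scalar(s: str):
    """Resolve an unquoted scalar the way a YAML 1.1 loader would."""
    return YAML_11_BOOLS.get(s, s)

print(resolve_scalar("NO"))  # Norway's country code vanishes into False
print(resolve_scalar("SE"))  # Sweden survives as a plain string
```

YAML 1.2 shrank the boolean set to `true`/`false` (plus case variants), but a loader stuck on 1.1 still applies the whole table above.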
I highly recommend reading the article - it's very good.
I've spent a lot of time writing YAML for Ansible, Cloudformation, k8s, Helm, etc. Some of the issues this article mentions are pitfalls but once you get a bit of experience with it, you know what to look out for.
I've also spent a lot of time writing Nix expressions, which is much more "joyful" IMO. Seemingly simple features like being able to create a function to reuse the same parameterized configuration make life much easier.
Add in a layer of type safety and some integration with the 'parent' app (think replacements for CloudFormation's !GetAtt or Ansible's handlers), the ability to perform basic unit tests, then configuration becomes more like writing code which I consider a good thing.
It’s really no wonder that it’s hard to create a language that’s supposed to know whether the author intended to write a string or not without it being indicated by the syntax. No other language in the world tries to do this, for exactly the reasons this article points out.
* no enclosing element (i.e., can be streamed)
* maps, lists, even sets
* tags (like "Person". UUIDs and timestamps are built-in tags)
* floating point numbers
* integers
* comments
* UTF-8
* true booleans
* no need to worry about too many or too few commas in the right or wrong place
Implementations in almost every language under the sun [1].
The format is simple enough that it's easy to implement, verify, and test. No strange string interpretation craziness (see YAML and "Norway problem"), no ambiguity between FP and integers (see JSON), comments. And if your editor has rainbow parenthesis support, reading is actually a pleasant experience.
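A small illustrative EDN document (the keys and values here are invented) showing several of the features listed above:

```clojure
;; Maps, sets, vectors, and built-in tagged literals in one EDN value
{:name    "Ada"
 :id      #uuid "f81d4fae-7dec-11d0-a765-00a0c91e6bf6" ; built-in uuid tag
 :joined  #inst "2023-04-01T12:00:00Z"                 ; built-in instant tag
 :roles   #{:admin :dev}                               ; a set
 :scores  [1 2.5 3]                                    ; ints and floats stay distinct
 :active  true}                                        ; a real boolean, never "NO"
```

No trailing-comma worries, and comments are part of the format.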
[0]: https://github.com/edn-format/edn
[1]: https://github.com/edn-format/edn/wiki/Implementations