> is that for a long time it was the only viable configuration format
Actually, this is not the case. We had the INI format for simple stuff and XML (complete with schemas) for complex things many years ago, and it worked. Yet we wanted something readable (like INI) but able to express complex types (like XML).
I don't think TOML is a viable replacement - for me, it has an INI level of simplicity with even worse hacks for nested structures. But give it time and you'll have another YAML.
But yes, YAML is confusing for edge cases (true/false, quoting), and I have yet to find a powerful replacement that is not XML. Maybe EDN can jump in, but for anything more complex I'd rather have a Lisp interpreter and structures around it than any of the mentioned and upcoming formats.
Thinking that JSON is a suitable replacement, imagine writing Ansible or k8s stuff in it; that would be fun ;)
> Thinking that JSON is a suitable replacement, imagine writing Ansible or k8s stuff in it; that would be fun ;)
Writing in YAML doesn't feel much better. YMMV but I've been on teams using Pulumi for k8s and the developer experience has been significantly better. I can automate, type check, lint, click through to definitions the same way I do with other typescript.
Pulumi is a young product with many rough edges but it's already been a game changer for me.
... just fucking don't; generate the config in your configuration management tool of choice and then serialize it to YAML. You get all of the advantages (nice to read) and none of the disadvantages (needing an editor, or it's a PITA).
I'm nose deep in an Oracle to Postgres conversion at the moment and from that experience, among others, I can absolutely assure you that although booleans are definitely simple they are also most definitely a very sharp edge case.
XML did everything and it's perfectly readable if it's formatted and structured well. This zoo of different markup/object/whatever languages we have to deal with now is largely a mistake.
This is confusing because XML is the archetype of what I would consider unreadable. If I got a prompt in a programming language design workshop to intentionally design something unreadable for humans, I would start thinking about XML and see where it leads me. Can you name any language any more unreadable than XML to help me?
XML is a mistake. 80% of every XML document is just redundant noise because the IBM lawyer who invented the SGML syntax had never heard of S-expression.
What XML replaced was a combination of custom designed (often binary) formats and HTTP query-string like syntaxes. It is quite verbose compared to either of them, and explodes both your bandwidth and serialize/deserialize times, but it is arguably easier for your new co-worker to guess that the address line 2 goes in <address2>...</address2> rather than prefix tagged with 13, or worse yet at offset 42-61.
I got used to XML, though I never could quite understand XSLT and the desire to program in it. I got used to JSON, but YAML I just can't bring myself to parse. YAML is 90% stuff you can't guess and just 10% data. And why so many?
> Thinking that JSON is a suitable replacement, imagine writing Ansible or k8s stuff in it; that would be fun ;)
k8s via Helm is often templated via Go template strings, which works by creating an unreadable and unhighlightable mess, introducing lots of its own bugs.
INI has no spec, and there are many variations, all slightly incompatible with each other, kinda like Markdown. YAML is really the only sane configuration language that lets you denote nested structures while keeping them looking nested.
Basically, all of the problems identified in the article can be dealt with by one rule: always quote your strings. I agree with the author that we should have a reduced, safe and minimalist subset of YAML, which is basically YAML 1.2, released in 2009.
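The rule in practice - quoting keeps the scalars you actually meant (a sketch):

```yaml
geoblock_regions:
  - "dk"
  - "no"         # quoted, so it stays the string "no" instead of becoming false
version: "1.10"  # quoted, so it stays a string instead of becoming the number 1.1
```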
There's HOCON, which is pretty good if you can run on a JVM. It's a superset of JSON designed for readability and human-friendliness when writing config files. It doesn't change the type system and doesn't have YAML's weird edge cases, but is still a lot easier to write than JSON. There's also a relatively tight spec.
I second that. And if people need to deal with YAML in Python, they should be using ruamel.yaml, which is a far superior library on just about every level: https://pypi.org/project/ruamel.yaml/
No. XML is broken for structured data that isn't HTML because it's not really clear (there aren't even "best practices" afaict) what should be a text node, what should be an attribute node, and what should be a subtag node.
It's not like other formats don't face similar questions. E.g. if you have a list of key-value pairs to serialize to JSON, do you translate those keys to JSON properties in an object, or do you translate each pair as an object in an array?
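For example, both of these are valid ways to serialize the same pairs, with different trade-offs (hypothetical data):

```python
import json

# A list of key-value pairs, possibly with duplicate keys
pairs = [("host", "a"), ("host", "b")]

# Option 1: keys become object properties (duplicates silently collapse)
as_object = json.dumps(dict(pairs))

# Option 2: each pair becomes its own object in an array (duplicates survive)
as_array = json.dumps([{"key": k, "value": v} for k, v in pairs])

assert json.loads(as_object) == {"host": "b"}  # one pair was lost
assert len(json.loads(as_array)) == 2          # both pairs kept
```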
My distaste for yaml led me to attempt just that. The first piece that was missing for me was a python parser that could produce reasonable error messages and could be transformed into the desired internal representation in python. So I wrote one [0]. It was supposed to be a single file that was less than 100 lines and could be copied and pasted into any project that I needed. Turns out that the issue was a bit more complex.
The issue is that there is sufficient complexity in finding a portable representation for configuration formats that it just kicks the can down the road. On the other hand, it means that as soon as you decide what format you are going to support, you can quickly implement it. There is more or less an intersection grammar that works across most if not all Lisps, and that is the plist `(:k v :k2 (:k3 v2))`. So I settled on that for my own use.
After all that work I have not dealt with the fact that numbers and chars do not have a portable representation across Lisp dialects, which is a key complaint in other threads here. Limited support for let-binding constants also seems like a feature that would allow just enough expressivity to make the format useful without opening up the terror that is `&` and `*` in YAML (cool and useful as it may be).
In summary s-expressions are:
1. missing good parsers in a number of language ecosystems
2. not standard across lisp dialects
3. need additional semantics for binding, multiple expressions, etc.
4. still better than yaml and json
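For what it's worth, a toy reader for plists of that shape fits in a few lines of Python (a sketch only - everything is kept as a string, and this is nothing like the parser linked at [0]):

```python
import re

# Tokens are parens or runs of non-space, non-paren characters
TOKEN = re.compile(r'\(|\)|[^\s()]+')

def parse_plist(text):
    """Read a plist form like (:k v :k2 (:k3 v2)) into nested dicts.
    Keywords (:name) become dict keys; all scalar values stay strings."""
    tokens = TOKEN.findall(text)
    pos = 0
    def read():
        nonlocal pos
        tok = tokens[pos]; pos += 1
        if tok == '(':
            items = []
            while tokens[pos] != ')':
                items.append(read())
            pos += 1  # consume ')'
            # pair up :key value, stripping the leading colon
            return {items[i][1:]: items[i + 1] for i in range(0, len(items), 2)}
        return tok
    return read()

assert parse_plist("(:k v :k2 (:k3 v2))") == {"k": "v", "k2": {"k3": "v2"}}
```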
I know what s-expressions are, vaguely. Vaguely in terms of "I couldn't write a grammar for them off the top of my head.", that is, not "what are they".
Is there a single agreed-upon defined grammar that everyone can use? Preferably one simple enough that like JSON's it is at least capable of being used as a graphic on the home page for the format? https://www.json.org/json-en.html
This is an honest question, because there may well be and I don't know it.
However, I will put this marker down in advance: If multiple people jump up to say "oh, yes, of course, it's right here", and their answers are not 100% compatible with each other, then the answer is no.
The other marker I'll put down is "just use common lisp", I want verification that it really is 100% standardized, no question what any construct means, ever, and I still bet we get people who would rather see Scheme or Clojure, and I bet there's some sort of difference.
Neither of these objections is fatal to the idea. JSON is technically not just "javascript objects", so if someone carved out a defined format from s-expressions, then held it up as a standard, that would be as valid as what Crockford did. But at least as of right now, I'm not aware of anyone having done that standardization work. Replies welcome.
I've never met anyone who says "I like YAML, it is great"... most people who have worked with it say something like "YAML is annoying, I don't like it"...
While introducing Kubernetes at our company over the last two years, we have been moving more and more away from YAML with internal Helm charts to a much simpler process: just using HCL and Terraform, and defining Kubernetes resources as Terraform resources.
As a software developer HCL just makes so much more sense than this YAML + Helm + Go templates hell, which feels like C preprocessor hell all over again. Other solutions like kustomize are neat, but I don't see how all of these YAML workarounds should be better than something like HCL with Terraform. HCL feels like a real declarative programming language (with real conditions, variables, a module system and useful built-in functions). YAML feels like another more complex JSON and other tools like Helm or Kustomize try to work around the weaknesses of YAML with some kind of templating system.
YAML looks nice to read in simple demos and in small files, but is just not adequate in the real world (in my personal opinion - I know that YAML is used by a lot of people in production as of today).
> I've never met anyone who says "I like YAML, it is great"
Maybe I'm older than you, but I have definitely heard that line.
Mostly because the alternatives were XML, INI, or the myriad of bespoke formats: relayd/apache httpd .conf, iptables, etc., etc.
INI has parsers that operate in different ways and doesn't support hierarchies... so that's not ideal.
JSON and YAML came to the fore around the same time, and JSON's limitations around comments and its picky semantics meant that people did prefer YAML over JSON for human-readable configs.
YAML itself is fine; it has some really awkward warts, and the parsers are usually programmatically unsafe in their implementation (leading to less compatible "safe_load" or other types of loaders)[0]. The issue we actually have with YAML is that we:
A) Template it (jinja, mustache whatever)
B) Put entirely too much stuff into it (Kubernetes manifests can grow to hundreds of lines really easily).
These problems will affect any configuration file format we choose to use, including TOML (which is comparatively new on the block), because reading templated/enormous files is really difficult.
What I've taken to doing is programmatically generating objects and then serialising them as whatever my software depends on. It might feel ugly to use an entire Turing-complete language to generate objects that are mostly static, but honestly... the ability to breakpoint, test and print subsections of the output is astonishingly nice.
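A sketch of that workflow in Python (the `make_deployment` helper and its fields are purely illustrative, not any real API):

```python
import json

# Build the config as plain objects first: every subsection can be
# breakpointed, unit-tested and printed before anything is serialised.
def make_deployment(name, replicas):
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {"replicas": replicas},
    }

manifest = make_deployment("web", 3)
# Serialise only at the very end, to whatever the consumer expects
serialized = json.dumps(manifest, indent=2)  # or yaml.safe_dump(manifest)
```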
The tooling is super mature, it's easy to emit, it's easy to parse, it's easy to validate; it can just be a little hard to read and write by hand (and I mostly blame SOAP for that). Still, basic XML isn't that hard to read or write, thanks to editor support.
I like that you can use anchors and merges. It greatly simplifies complex, repetitive structures. And most of the complaints about YAML can be worked around by string-quoting.
The whitespace can get in the way if you're templating, but then you can also use [1, 2, 3] as a list notation, for example.
In fact, most of the complaints could be resolved by running it through a linter.
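Anchors and merges in practice - repeated structure factored out once (a sketch):

```yaml
defaults: &defaults
  retries: 3
  timeout: 30
service_a:
  <<: *defaults
  host: "a.example.com"
service_b:
  <<: *defaults
  timeout: 60   # merged keys can still be overridden per block
```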
I like YAML. More specifically, a subset of YAML, like the author suggests. Clear, intuitive, and allows expressing complex data structures like JSON does. Much better than TOML, which easily becomes a mess with more complex data.
Yup, exactly my experience as well: again, a stupid idea to try to make a "configuration" language out of nested key-value pairs that end up needing fancy interpreters, allowing more and more semantics into the keys and values, to start doing what a simple program could have done in half the time...
I've worked at 4 companies over a period of 10 years, and each had exactly this problem - with YML, JSON, XML, properties files (you don't want to see business-logic conditionals in a properties text file, where the shapes of the keys command an interpreter to behave dynamically...).
The only time I saw a team do it well was a PHP backend, of all things, where the lead said they'd program all their variations in PHP rather than source them from flat configuration descriptors, and it was amazing: clear, simple and powerful. They had to release the backend at each config change instead of releasing only the config change, but I'm still unsure why exactly that's a problem: the configs are software too, if we're honest with ourselves, and shoe-horning them into a descriptor language isn't gonna make them flat.
I don't think YAML is great, but I still think it's the best format out there.
The only confusing problem I've run into was the sexagesimal number notation and even that was fairly obvious. Perhaps it's because I tend to overquote strings?
I mean sure, the on/off to boolean mappings are annoying, but they also become very obvious when you're parsing config because the type validation will fail. If `flush_cache` has an enum `on` but no key `True` then the type validator will instantly complain about both the missing key and the extra key in the dictionary.
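For example, a sketch of that mechanism in Python - the dict below stands in for what a YAML 1.1 parser hands back, and no real schema library is assumed:

```python
# What a YAML 1.1 parser produces for the line `flush_cache: on`:
parsed = {"flush_cache": True}   # `on` was silently resolved to a boolean

# The schema expects the enum values "on"/"off", so validation flags it at once
allowed = {"on", "off"}
bad = {k: v for k, v in parsed.items() if v not in allowed}
assert bad == {"flush_cache": True}
```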
Same with accidental numbers, any type check will show that the parsing failed.
I find JSON for config files to become unreadable quickly because of the non-obvious nesting and the lack of comments. You can pick a JSON extension but then you need to pick one that your tooling will support.
I think the problem in this particular case is using YAML as a DSL. Every other data format would be equally bad here. Replace YAML with TOML and you're still in the same templating hell.
YAML is the least worst for me, and I don't think I've ever hit the problems the article is showing, because:
* I use an editor that will highlight stuff like anchors
* I often generate config from CM, so it can't have those errors
* Loading into a defined struct in a statically typed language also makes them impossible.
YAML is nuts, and JSON is annoying (trailing comma limitations, lack of comment syntax no matter how annoying it is that the spec is correct about why there are no comments).
Both have their place though. YAML came out of Perl, and both are some confluence between awesome and horrific (although YAML wins the horrific crown for sure).
I've had a little bit to do with Ingy - the inventor of yaml, and I've worked closely with some of his collaborators. Ingy is nuts, mostly in a good way, but I wouldn't put him in charge of the architecture, I'd put him in charge of the abyss.
Though, in fairness, I think old Perl did that too. It's super convenient until it isn't.
Rachel also doesn't approve of JSON in high-reliability systems for other reasons: https://rachelbythebay.com/w/2019/07/21/reliability/ and point taken, if you're sending data from your service A to your service B and neither is a web browser, nor are they written in JS, then there's far better formats and you almost need a reason not to use protobuf.
There was (I think probably still is) a qemu bug with JSON. It accepted requests to read guest memory in JSON format, with the memory addresses encoded as JSON numbers.
When reading out guest kernel memory (addresses are at the top of 64 bit space) these would silently be rounded to the nearest whole double. It took me a very long time to understand what was going on.
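The rounding is easy to reproduce in any language with IEEE-754 doubles; in Python, for example:

```python
# Doubles carry a 53-bit significand, so integers near the top of a 64-bit
# address space can't be represented exactly and get silently rounded -
# which is what a JS-style JSON parser does to such a number.
addr = 2**63 + 1                     # e.g. a kernel-space address
assert float(addr) == float(2**63)   # the +1 is lost in the round-trip
```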
Actually, JSON doesn't specify what numbers are - it would be perfectly licit for a JSON parser to transparently use a real numerical tower, allowing a perfect representation of any terminating decimal fraction (since a number has to be written as a dotted decimal literal, and there's no support for a vinculum (aka U+0305 / COMBINING OVERLINE / ◌̅ / 3.21̅), there's no way to represent a fraction whose base-10 expansion doesn't terminate). A few JSON parsers even do this. That said, if you don't control both sides, sending something that won't be handled by the lowest common denominator (browser JSON parsers / JS numbers) is asking for trouble.
This is a great post but my understanding is this has nothing to do with JSON, which is unopinionated about numbers. Rather, with JS's JSON parser.
Python, for example, has several JSON libraries which let you swap out the numeric parser so it yields Decimal objects all the time. It's overkill for most use cases, but essential if you're working with REST APIs in Fintech.
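With the standard library this looks like the following, using the `parse_float` hook of `json.loads` (the field names are made up):

```python
import json
from decimal import Decimal

raw = '{"amount": 0.1, "big": 18446744073709551615}'

default = json.loads(raw)                     # 0.1 becomes a binary float
exact = json.loads(raw, parse_float=Decimal)  # digits preserved as written

assert str(exact["amount"]) == "0.1"
assert exact["big"] == 18446744073709551615   # Python ints are arbitrary precision
```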
JSON doesn't specify what numbers are. Integers that take 2MB to represent are valid JSON numbers.
Regarding protobuf, the following opinion is obviously insane, and if your org is already using protobuf you should ignore it: protobuf actually seems pretty bad? It has a bunch of vestigial features that people just say not to use. Its integer encoding bloats the encoded size and causes unnecessary dependency chains in the decoder. I would strongly prefer sending simdjson tape between processes and storing simdjson tape at rest, but if my coworkers insisted on doing something normal, maybe I would look into flatbuffers or capnproto.
YAML used from within a statically typed language gets rid of most of the problems, but the main one seems to be: well, we figured out which stuff was just a bad idea and put it in 1.2... except nobody uses it.
> Both have their place though. YAML came out of Perl, and both are some confluence between awesome and horrific (although YAML wins the horrific crown for sure).
Weirdly enough, I'm not getting most of those issues in Perl YAML; the "norway problem", for example:
use Data::Dumper;
use YAML;
my $a = "---
geoblock_regions:
  - dk
  - fi
  - is
  - no
  - se
";
print Dumper(Load($a));

$VAR1 = {
          'geoblock_regions' => [
                                  'dk',
                                  'fi',
                                  'is',
                                  'no',
                                  'se'
                                ]
        };
This reminds me of a certain architect at my last shop who invented a DSL on top of his Python superapp. He expected all projects to go through his superapp. The DSL was configured in YAML. The YAML was often so dense he recommended devs use Jinja to generate the YAML.
This meant debug was hell, plus it wasn't always clear if what you were trying to do was even supported / if not why & what needed to be changed. This was because you were now 3 levels of abstraction away from the Python code that was actually executing.
Every time a dev took on a new project they had to jump on a call with architect or right hand man to figure out if what they were trying to do was going to be possible.
It escalated into the architect demanding to know, a sprint in advance, any task devs were trying to do, in a review session, so he could explain whether it was possible and try to triage it in his DSL.
>The DSL was configured in YAML. The YAML was often so dense he recommended devs use Jinja to generate the YAML.
Did he then go on to design Ansible? It falls into the same trap.
The only way you should be generating a data format with your language's templating system is:
<%= YAML.dump(@config) %>
Also, 9 times out of 10 I wished the app designer had just used <app language> or <any common embeddable language> (like Lua) instead of making any kind of DSL (whether that's a data file pretending to be code or a micro programming language).
This is basically how I feel about working with K8S and dredging through a repo full of templated YAML spaghetti. What am I looking at now? Helm, Keda, Flux, Argo, OperatorHub, GitHub Actions? oh actually this bit is in Terraform in another folder, whoops.
You can't actually deploy something unless you can mentally untangle it all; it just sits in front of your infra as a sort of DevOps Coming of Age ritual, where you look wistfully over your shoulder at the old Heroku or Vercel account you grew up with. Simpler times.
At work someone is trying to introduce a system where a bunch of Jinja templates in a repository are used to generate XML which can then be used to generate another XML document which can then be "executed", resulting in an annotated XML document :)
I've read about places that do this kind of stuff. Although it sounds like pure hell, I'm sure there's always a reasonable explanation, an intent. What kinds of problems was the org facing that led to the development of this?
I guess YAML has a place in that it would prevent that kind of thing happening in the first place.
YAML is easy to debug (thanks to having comment syntax) because it just deserialises into code. Sometimes it deserialises into code that compiles on the fly mind you which is never a good idea.
On the other hand one time I debugged a really nasty memory leak by dumping many megabytes of YAML then running git diff against the dumps. That was fun. Of course the client used the quick and bad hack rather than the demonstrably correct fix (thanks to the dumps) because they were frightened of their own code.
That sounds like a layer of insanity that would make me consider jobs elsewhere. It sounds entirely unnecessary and burdensome, but was it unnecessary?
> lack of comment syntax no matter how annoying it is that the spec is correct about why there are no comments
This completely arbitrary ideological purity has come at the expense of countless wasted hours, headaches, and suboptimal workarounds like using strings as comments, with zero tangible benefit - zero bad things would have ever happened if JSON allowed comments. There is nothing correct about it.
Is this the same Ingy that made Test::Base? It's the best data-driven testing framework I've ever used, and I've missed it often while working with other languages. The follow-up polyglot framework just didn't cut it for me.
Do people dislike TOML only because it looks like a Windows INI file? I think it’s nice. Rust chose it in keeping with their penchant for sanity most of the time.
I would prefer it if logically nested blocks could also be physically nested (and indented), so you can have a full tree structure. If you're describing something that can have variable levels of nesting (think folders), that can sometimes make the format easier to understand.
I like YAML for reading, and TOML is entirely worse for reading (still a million times better than JSON, though). And since the use cases are mostly read, rarely written, and if written they are code-generated (using configuration management), YAML fits better.
I never could understand the hate XML got, but I'm having a bit of schadenfreude seeing what people suffer through with its replacements.
An XML document with a well-thought-out domain-specific DTD would solve all these problems; instead, we have something where no sometimes means false (but not always) and 22:22 sometimes means 1342 (but not always!)... because... why not?!?
All that horrible mess, it seems, because people didn't like to have to close tags.
Our industry has the remarkable properties of being almost entirely newbies, a constant churn of green developers, combined with being very bad about passing down generational knowledge. This is why things that are easier to explain win out over things that are technically superior but take longer to get your head around almost every time.
The thing JSON had going for it over XML is that it maps cleanly to most languages object models so you can read the results directly. No writing XQueries or DOMs to read values.
Its only major downfall is lack of comments, which has led people to YAML. (There's plenty of other things it lacks in comparison to XML, like native schemas, but most of that falls into things newbies don't know they want.)
YAML is in this weird middle place where it’s easy to explain but impossible for a human to master. It appears as simple as JSON to newcomers who adopt it, but the long time users of it find it full of foot guns. People wanted JSON with comments but instead they got the complexity of XML minus the clarity.
Of course some of the hate comes from the applications where XML was used, more than from XML itself, but it also is a deeply flawed language.
Nowadays the main issue would be that it requires complex generator and parser libraries to be any use (you never want to deal with XML parsing/escaping by yourself), yet it's not as efficient as binary formats like protobuf, for instance.
That means that anything you'll want to edit by hand, or that should be purely textual and readable, will be better done in YAML or JSON, and anything beyond that can be done in other ways. The need for a single language trying to awkwardly span the whole spectrum isn't big.
XML is ok when both the reader and writer are machines but you still want a text-based format (otherwise I'd just use protobuf), or when maybe you occasionally want to edit by hand, but not often.
I think it's the same reason why, although HTML is a great standard, lots of us like writing in Markdown and having something convert that to HTML, despite all the problems when you try and push Markdown further than it was intended.
YAML, as I see it, is trying to be to XML configurations what Markdown is to HTML, with the added bonus of an attempt at a tag/reference system to store object graphs that are not trees. Back in the day of XML-based Spring config files, we had <bean> and <ref bean=...> but as far as I know that's implemented on the Spring layer, it's not a generic property of XML, whereas YAML tries to abstract that into the format itself.
> YAML, as I see it, is trying to be to XML configurations what Markdown is to HTML
Yes, that's probably a good analogy. The big difference though is that if some Markdown fails to parse, nothing really bad happens, while a YAML file that fails to parse can bring a whole system down.
Also, Markdown is a famously ambiguous format; it trades precision for ease of write, and that's fine, mostly.
But in a configuration file, ambiguity is really the opposite of what you want.
I worked with hand editing XML files in the past, and I didn't really have an issue with simple XML files. I actually prefer XML in some cases.
I see your point if you need to edit a complex XML with multiple namespaces mixed together, but a plain XML file can be just as readable as JSON.
Some JSON files can be really hard to edit by hand too. At my current workplace I often have to deal with nested JSON files, where a JSON contains values that are also JSON, but encoded and escaped so that it is difficult to edit.
XML is great for systems to read and write, but utterly abhorrent for humans. It's not just having to close tags. It's content vs attribute confusion, namespacing noise (which are also muddled with attributes), and there's a squint factor incurred by the density of information.
Namespacing is the opposite of noise: it lets different domains coexist with no possible confusion about which is which.
But experience has taught me that trying to explain why XML is great and simple to read and write for humans, to someone who thinks differently, is useless. And so, I won't.
Yet, please accept that some people, like me, really liked it and didn't mind the little quirks in light of all it offered.
I remember being excited when XML hit 1.0 (as a new programmer this seemed like a huge advance over things like classic Unix configuration files), and progressively disappointed over the next decade as the promise simply wasn't delivered on.
The things which killed XML seem to me to be related to the old standards culture: the people involved assumed adoption was inevitable and distracted themselves with increasingly arcane thickets of new standards, with the assumption that someone else would spend time on the “boring” work of building professional-quality tools and documentation or cleaning up usability warts. That other 80% of the work never happened and most people who had a choice moved on.
As a thought experiment, imagine if libxml2 had had even a single dedicated developer focused on tracking standards or making usability improvements, instead of training multiple generations of users that XML was slow and hostile to users. Various XML committees' travel expenses building standards which were never used likely cost more than that. Not leaving XPath frozen around the turn of the century would have helped in so many places.
The other wart I think would have made a surprising difference is the usability disaster around namespaces. So many tool developers forced users to switch between the short namespace:attribute form they used everywhere in the document and the {namespace url}attribute form that resolves to, or forced you to respecify the namespaces on every operation rather than reusing the values the parser had already loaded. Users begrudged that verbosity but they hated it when it meant something silently returned incorrect results because a selector using the document’s own syntax didn’t find the element they could see using those exact values. Absolutely nothing anyone did in the XML world was a better use of time than fixing that would have been since it trained people to think of XML-based tools as a painful, error-prone experience to be avoided — and they did as soon as they could.
I miss the old days of working with XSLT and XPath. A nice way of giving some design touches and filtering/sorting to XML documents. It wasn't perfect, but I would take it any time over yaml/json/toml/ini.
I don't know if I'd say I "miss" XSLT and XPath, but yeah, it was fun and powerful at the time. I made some crazy stuff with Cocoon and AxKit that, if you didn't mind the syntax, were actually pretty elegant.
Eh, definitely there's some generational round robin going on. And the "amsterdam problem" in YML is just insanity. But you got to admit, XML has some very low level intrinsic problems
* Schemas violate underlying XML rules all the time, and due to variance in XML parsers, it gets a pass but only on specific configurations. There's no one Holy XML Parser. Leading whitespace in attributes? Sure, why not?! Whitespace sensitive element order? Ooh yeah, lots of it! But try and feed it to libxml and the whole thing collapses - without error handling, mind you, but that's not XML's fault.
* Whitespace agnostic means diffs and merges are element aware, and it's nigh-impossible to estimate the upper compute limit of an element aware diff
* There is NO OFFICIAL WHITESPACE SPEC - so if you try and fix it with normalization and lines, it's going to be different everywhere you go. So you're forced to switch on element aware diff / merge in your VCS, which is a pretty big change, and it's one you have to sell other departments on.
* XML breaking 1NF - meaning XML can have XML inside XML, recursing indefinitely, when playing data format - breaks so very, very many things... it actually goes against the whole concept of a data hierarchy, which is built into XML at a low level.
* Sort of riding on that, in order to parse XML you have to eat it element by element, and there's no way to tell when an element is going to end or if it recurses N levels. This makes it computationally expensive. The CAD STEP format has some of this disease - it has to be loaded in its entirety to parse, which can be holy hell with a TB file.
* Yeah sure, namespace hell. No one really figured out a way to fix it. S1000D (XML spec) to this day just denies that XML namespaces ever existed, and I can't really fault their WG for doing that. I can fault them for so many other things, but not that.
The really great thing about xml with XSD is the ability to validate the document really well.
I.e. having content match regexps, so you can be sure version numbers are always \d+(\.\d+)* and do not have unexpected letters or spaces in them, or that date values are in ISO format.
Very useful for APIs to 3rd parties, as it is now very easy to catch most errors and provide a useful error message for them without having to code every check.
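The same check is trivial to mirror outside XSD, e.g. in Python (note that XSD patterns are implicitly anchored to the whole value, hence `fullmatch`):

```python
import re

# The version-number pattern from above
version = re.compile(r'\d+(\.\d+)*')

assert version.fullmatch("1.2.10")        # well-formed version
assert not version.fullmatch("1.2-beta")  # unexpected letters rejected
assert not version.fullmatch(" 1.2")      # leading whitespace rejected
```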
>> All that horrible mess, it seems, because people didn't like to have to close tags.
If people keep burning their faces off it's not the fault of the flamethrower! If they keep hacking each other's limbs off it's not the fault of the chainsaw! If they keep blowing their houses up it's not the fault of the gas bottles! It's not like we need to accommodate every possible failure mode of the human brain, and make things safe just because the next idiot that comes along is going to cause a catastrophe. It's the idiot's fault that they forget to apply the brakes and take advantage of the failsafes; it's not the responsibility of the system to have the brakes and failsafes where any idiot can find them!
I've heard that kind of excuse 0 times outside of software circles. But of course I'm exaggerating, because nobody really suffers because of XML other than the people who use it every day. Like this one junior dev, for example, who was made to create XML by hand by eyeballing an Excel document, just because paying a junior dev salary is easier than getting middle management (and clients) to learn to structure an Excel spreadsheet properly.
Sometimes, just sometimes, you have to design a tool with the way it will actually be used in the real world in mind. And all the rest of the time? Well, all the rest of the time you have to do it that way, too.
> All that horrible mess, it seems, because people didn't like to have to close tags.
The issue isn't that it's bad per se. The issue is that it's cumbersome to write. When you want to set up and tweak a tool, it gets annoying very quickly having to deal with little errors because you misspelled a closing tag. Maybe you want to test enabling a setting? Typing out 4 characters is so much less friction than the 20 or more (including the < > : / characters) you'd need for an XML config.
* The `<?xml ...?>` declaration line alone already lost a percentage of people, etc.
* Essentially XML was a markup language (SGML stripped down), and while you can use markup to mark up key-value pairs, it just wasn't designed or optimized for that.
* On the contrary, it also did away with the nice SGML features for that: instead of `<tag>value</tag>` one could write `<tag/value/`, and there were implied closing tags (like `<ol><li> ... <li></ol>` in HTML). For most config files, particularly deep ones, such short forms would have been perfect.
* XML attributes vs. substructure was just an awkward choice to hand users. E.g. `<myConfig userId="asdf" host="asdfasdf" ...>` vs `<myConfig><userId>asdf</userId><host>asdfasdf</host></myConfig>`.
More so, I get the feeling when looking at the XSL, XSLT, etc. mania that a similar pattern was at work as with UML: special interest groups and tooling providers driving the evolution of a standard with their interests in mind and not the interests of users or developers.
Overall, XML configs could have looked something like `<myConfig / <userId/asdf/ <host/asdfasdf/` if XML had kept SGML's short forms.
YAML and JSON succeeded because they had a clean and predictable, no-nonsense mapping between encoding and object-model after decoding. Probably we should all switch to an almost-yaml format that does away with the peculiarities, and the FANG companies would have the momentum to make that happen.
I personally would like for HJSON (https://hjson.github.io) to see more adoption, but that train has passed...
It’s more typing but it’s simple typing most people don’t even think about because it’s predictable (not to mention automatic in many editors). I’d take that over the frictional cost of thinking your YAML is done and then having to debug magic data conversion or realize that you left out one character causing something to be parsed completely differently.
Where XML falls down hard is tool usability. There’s still no standard formatter or good validation tool, and things like namespace support are a constant source of friction.
> An XML document with a well-thought-out domain-specific DTD would solve all these problems;
But then a YAML document written by disciplined and well-thinking people will also not have these problems.
Any of the complex formats works well when used in the most fitting way and goes horribly wrong when people try to benefit too much from the complexity. XML was also a nightmare when put in the wrong hands, just as YAML is, and the next hyped, Turing-complete document syntax will be.
The problem with XML is mainly in the 'M': it's a _markup_ language. Using it for configuration and arbitrary data serialisation isn't where its strengths lie, but it got shoved into those niches because if you look at it _just right_, you can make it work in them.
I don't hate XML myself, but I do hate how it's been abused over the years.
It's because so many people would see examples of complex _XML-based formats (WSDL, SOAP, etc.)_ and infer that _XML itself_ is complicated. XML is really not that difficult to understand, and I find it quite amusing that the same people who don't bat an eye at writing HTML complain about how baffling XML is.
Is writing an XML parser difficult? Yes, very much so. But again, that doesn't make XML itself complicated. And before anyone tries to call out this particular comment, keep in mind that writing a fast, correct, and safe JSON parser is no walk in the park either.
Are there many examples of complicated XML-based formats? Yes, but that's just a reflection of the complexity of some particular configuration model, not XML itself.
XML is cool and useful, but for humans you need good tooling to handle it well. And it still is very noisy with all its boilerplate and what advanced features can bring in. And overall it has a culture of making things complex, and complicated.
Just like bad/fragmented YAML libraries now, there were plenty of bad XML libraries and implementations. I didn't think XML was too bad until I had to write a SOAP request where the endpoint would throw an error if the arguments (in their own tags, mind you) weren't in a specific order. The endpoint gave me no hints.
Additionally, we had to update WSDLs for these services, but the service generating the WSDL used features our server's SOAP library didn't support, so someone had to manually transform the XML via lengthy trial and error to get it to work.
I see nothing wrong with closing tags. They help to navigate and understand, in my opinion. I have way more problems with developers trying to be super concise and writing constructs that are very hard to parse mentally.
I do think YAML is overly complex - but there is some hyperbole in this document.
- Many of these complaints are about YAML 1.1.
- YAML 1.2 was released _14 years ago_.
- The author makes some allusions to 1.2.2, and it requiring a team of people 14 years to make, but, from the yaml.com announcement they link to: “No normative changes from the 1.2.1 revision. YAML 1.2 has not been changed”
I guess my first two comments are undercut by PyYAML using YAML 1.1 (Really?! Python’s had 24 years of the Norway problem?!)
The article mentions the fact that YAML 1.2 is really old and the fact that it doesn't matter because YAML 1.1 is still the most commonly supported version, and the fact that it's arguably even worse because YAML 1.2 gives different parsing results to YAML 1.1!
I highly recommend reading the article - it's very good.
Agree with the final part of this article that "programmable" configuration languages like Nix and Dhall are the way forward.
I've spent a lot of time writing YAML for Ansible, Cloudformation, k8s, Helm, etc. Some of the issues this article mentions are pitfalls but once you get a bit of experience with it, you know what to look out for.
I've also spent and a lot of time writing Nix expressions, which is much more "joyful" IMO. Seemingly simple features like being able to create a function to reuse the same parameterized configuration makes life much easier.
Add in a layer of type safety and some integration with the 'parent' app (think replacements for CloudFormation's !GetAtt or Ansible's handlers), the ability to perform basic unit tests, then configuration becomes more like writing code which I consider a good thing.
I agree. They make a rod for their own back by having implicit conversion rules which, while well defined, are not well understood. "Explicit is better than implicit" and all that.
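For illustration, a rough sketch of the YAML 1.1 implicit boolean rule behind those surprises (simplified: the real spec enumerates specific mixed-case forms like `On` and `NO` rather than lowercasing):

```python
# Simplified sketch of YAML 1.1 implicit boolean resolution: unquoted
# scalars matching these words become booleans; everything else stays a
# string. This is the rule behind the "Norway problem" (`no` -> False).
YAML11_TRUE = {"y", "yes", "true", "on"}
YAML11_FALSE = {"n", "no", "false", "off"}

def resolve_scalar(token: str):
    lowered = token.lower()
    if lowered in YAML11_TRUE:
        return True
    if lowered in YAML11_FALSE:
        return False
    return token  # not an implicit boolean; keep it as a string

print(resolve_scalar("no"))      # False -- the Norway problem
print(resolve_scalar("Norway"))  # stays "Norway"
```

The asymmetry is the point: whether you wrote a string or a boolean is decided by a lookup table the reader has to memorize, not by the syntax.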
Great idea. I would also suggest adding some syntax to make lists and maps more obvious (it can be a bit unclear in YAML). Maybe {} for maps and [] for lists.
My wish for a dream config language is this: Allow a choice between unambiguously-typed expressions (quoted strings, only true/false, decimal numbers) and explicit type annotations. So this:
regions: $[string]
- no
- se
options: ${bool}
a: yes
b: no
(with possibly a different syntax) would be equivalent to `{"regions": ["no", "se"], "options": {"a": true, "b": false}}` in JSON terms.
It’s really no wonder that it’s hard to create a language that’s supposed to know whether the author intended to write a string or not without it being indicated by the syntax. No other language in the world tries to do this, for exactly the reasons this article points out.
Since everyone seems to be throwing their favorite format into the ring, I will too: EDN [0]
* no enclosing element (i.e., can be streamed)
* maps, lists, even sets
* tags (like "Person". UUIDs and timestamps are built-in tags)
* floating point numbers
* integers
* comments
* UTF-8
* true booleans
* no need to worry about too many or too few commas in the right or wrong place
Implementations in almost every language under the sun [1].
The format is simple enough that it's easy to implement, verify, and test. No strange string interpretation craziness (see YAML and "Norway problem"), no ambiguity between FP and integers (see JSON), comments. And if your editor has rainbow parenthesis support, reading is actually a pleasant experience.
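To make that concrete, a small (invented) EDN document exercising most of those features:

```clojure
;; Hypothetical EDN config -- maps, vectors, sets, and built-in tagged
;; literals; note the real booleans and the absence of comma bookkeeping.
{:service  "billing"
 :replicas 3                               ; integer
 :timeout  2.5                             ; floating point
 :enabled  true                            ; a real boolean, not "on"/"yes"
 :regions  #{"no" "se"}                    ; a set; "no" stays a string
 :deployed #inst "2024-01-15T10:00:00Z"    ; built-in timestamp tag
 :id       #uuid "f81d4fae-7dec-11d0-a765-00a0c91e6bf6"}
```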
I'm not sure I would consider booleans an 'edge case'
Hard to mess up. Robust schema language. Very flexible. Easy to process.
Yeah, it’s verbose, but that’s the trade-off between usability and robustness.
Now get off my lawn!
I got used to XML, tho I never could quite understand XSLT and the desire to program in it. I got used to json, but yaml I just can't bring myself to parse. YAML is 90% stuff you can't guess and just 10% data. And why so many?
k8s via Helm is often templated via Go template strings, which works by creating an unreadable and unhighlightable mess, introducing lots of its own bugs.
Basically, all of the problems identified in the article can be dealt with by one rule - always quote your strings. I agree with the author that we should have a reduced, safe and minimalist subset of YAML, which is basically YAML 1.2, released in 2009.
P.S. Please just stop using PyYAML.
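The quoting rule in practice (illustrative snippet):

```yaml
# Unquoted: a YAML 1.1 parser reads these as boolean false and the number 1.1.
country: no
version: 1.10
# Quoted: unambiguously strings, under any YAML version.
country_quoted: "no"
version_quoted: "1.10"
```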
https://github.com/lightbend/config/blob/main/HOCON.md
We use an extended version of it for our app and the resulting config is pretty clean. You can see an example here:
https://hydraulic.software/blog/8-packaging-electron-apps.ht...
The only downside is that the reference implementation is hardly maintained anymore.
Why?
The issue is that there is sufficient complexity in finding a portable representation for configuration formats that it just kicks the can down the road. On the other hand it means that as soon as you decide what format you are going to support you can quickly implement it. There is more or less an intersection grammar that works across most if not all Lisps, and that is the plist `(:k v :k2 (:k3 v2))`. So I settled on that for my own use.
After all that work I have not dealt with the fact that numbers and chars do not have a portable representation across lisp dialects, which is a key complaint in other threads here. Limited support for let binding constants also seems like a feature that would allow for just enough expressivity to make the format useful without opening up the terror that is `&` and `*` in yaml (cool and useful as it may be).
In summary s-expressions are: 1. missing good parsers in a number of language ecosystems 2. not standard across lisp dialects 3. need additional semantics for binding, multiple expressions, etc. 4. still better than yaml and json
0. https://github.com/tgbugs/sxpyr
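A plist grammar that small really can be read with a few lines of code. A minimal, hypothetical sketch (atoms kept as plain strings; no numbers, escapes, or error handling):

```python
# Minimal sketch of a reader for plist-style s-expressions such as
# (:k v :k2 (:k3 v2)). Keywords (tokens starting with ":") become dict
# keys; every other atom is kept as a bare string.
def tokenize(text):
    return text.replace("(", " ( ").replace(")", " ) ").split()

def parse(tokens):
    tok = tokens.pop(0)
    if tok == "(":
        items = []
        while tokens[0] != ")":
            items.append(parse(tokens))
        tokens.pop(0)  # consume ")"
        # pair up :keyword value entries into a dict
        return {items[i].lstrip(":"): items[i + 1]
                for i in range(0, len(items), 2)}
    return tok

def loads(text):
    return parse(tokenize(text))

print(loads("(:k v :k2 (:k3 v2))"))  # {'k': 'v', 'k2': {'k3': 'v2'}}
```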
Is there a single agreed-upon defined grammar that everyone can use? Preferably one simple enough that like JSON's it is at least capable of being used as a graphic on the home page for the format? https://www.json.org/json-en.html
This is an honest question, because there may well be and I don't know it.
However, I will put this marker down in advance: If multiple people jump up to say "oh, yes, of course, it's right here", and their answers are not 100% compatible with each other, then the answer is no.
The other marker I'll put down is "just use common lisp", I want verification that it really is 100% standardized, no question what any construct means, ever, and I still bet we get people who would rather see Scheme or Clojure, and I bet there's some sort of difference.
Neither of these objections is fatal to the idea. JSON is technically not just "javascript objects", so if someone carved out a defined format from s-expressions, then held it up as a standard, that would be as valid as what Crockford did. But at least as of right now, I'm not aware of anyone having done that standardization work. Replies welcome.
While introducing Kubernetes at our company over the last two years, we have been moving more and more away from YAML with internal Helm charts to a much simpler process: just using HCL and Terraform, and defining Kubernetes resources as Terraform resources.
As a software developer HCL just makes so much more sense than this YAML + Helm + Go templates hell, which feels like C preprocessor hell all over again. Other solutions like kustomize are neat, but I don't see how all of these YAML workarounds should be better than something like HCL with Terraform. HCL feels like a real declarative programming language (with real conditions, variables, a module system and useful built-in functions). YAML feels like another more complex JSON and other tools like Helm or Kustomize try to work around the weaknesses of YAML with some kind of templating system.
YAML looks nice to read in simple demos and in small files, but is just not adequate in the real world (in my personal opinion - I know that YAML is used by a lot of people in production as of today).
Maybe I'm older than you, but I have definitely heard that line.
Mostly because the alternatives were XML, INI, or the myriad of bespoke formats: relayd/apache httpd .conf or iptables, etc.
INI has parsers that operate in different ways and doesn't support hierarchies... so that's not ideal.
JSON and YAML came to the fore around the same time, and JSON's limitations around comments and its picky semantics meant that people did prefer YAML over JSON for human-readable configs.
YAML itself is fine; it has some really awkward warts, and the parsers are usually programmatically unsafe in their implementation (leading to less compatible "safe_load" or other types of loaders)[0]; the issue we actually have with YAML is that we:
A) Template it (jinja, mustache whatever)
B) Put entirely too much stuff into it. (Kubernetes manifests can grow to hundreds of lines really easily)
These problems will affect any configuration file format we choose to use, including TOML (which is comparatively new on the block), because reading templated/enormous files is really difficult.
What I've taken to doing is programmatically generating objects and then serialising them as whatever my software depends on. It might feel ugly to use an entire Turing-complete language to generate objects that are mostly static, but honestly... the ability to breakpoint, test and print the subsections of output is astonishingly nice.
Then I don't care at all what the format is.
[0]: https://www.serendipidata.com/posts/safe-api-design-and-pyya...
The tooling is super mature, it's easy to emit, it's easy to parse, it's easy to validate; it can just be a little hard to read and write by hand (and I mostly blame SOAP for that). Still, basic XML isn't that hard to read or write, thanks to editor support.
I like that you can use anchors and merges. It greatly simplifies complex, repetitive structures. And most of the complaints about YAML can be worked around by string-quoting.
The whitespace can get in the way if you're templating, but then you can also use [1, 2, 3] as a list notation, for example.
In fact, most of the complaints could be resolved by running it through a linter.
I've worked in 4 companies over a period of 10 years, and each had exactly this problem, with YAML, JSON, XML, and properties files (you don't want to see business logic conditionals in a properties text file, where the shapes of the keys command an interpreter to behave dynamically...)
The only time I saw a team do it well was a PHP backend, of all things, where the lead said they'd program all their variations in PHP rather than source them from flat configuration descriptors, and it was amazing: clear, simple and powerful. They had to release the backend at each config change instead of releasing the config change only, but I'm still unsure why exactly that's a problem: the configs are software too, if we're honest with ourselves, and shoe-horning them into a descriptor language isn't gonna make them flat.
The only confusing problem I've run into was the sexagesimal number notation and even that was fairly obvious. Perhaps it's because I tend to overquote strings?
I mean sure, the on/off to boolean mappings are annoying, but they also become very obvious when you're parsing config, because the type validation will fail. If `flush_cache` has an enum `on` but no key `True`, then the type validator will instantly complain about both the missing key and the extra key in the dictionary.
Same with accidental numbers, any type check will show that the parsing failed.
I find JSON for config files to become unreadable quickly because of the non-obvious nesting and the lack of comments. You can pick a JSON extension but then you need to pick one that your tooling will support.
What do you think of https://toml.io ?
YAML is least worst for me, and I don't think I ever hit the problems article is showing because
* I use editor that will highlight stuff like anchors
* I often generate config from CM so it can't have those errors
* Loading into defined struct in statically typed language also makes them impossible.
Both have their place though. YAML came out of Perl, and both are some confluence between awesome and horrific (although YAML wins the horrific crown for sure).
I've had a little bit to do with Ingy - the inventor of yaml, and I've worked closely with some of his collaborators. Ingy is nuts, mostly in a good way, but I wouldn't put him in charge of the architecture, I'd put him in charge of the abyss.
Though, in fairness, I think old Perl did that too. It's super convenient until it isn't.
Rachel also doesn't approve of JSON in high-reliability systems for other reasons: https://rachelbythebay.com/w/2019/07/21/reliability/ and point taken, if you're sending data from your service A to your service B and neither is a web browser, nor are they written in JS, then there's far better formats and you almost need a reason not to use protobuf.
When reading out guest kernel memory (addresses are at the top of 64 bit space) these would silently be rounded to the nearest whole double. It took me a very long time to understand what was going on.
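That silent rounding is inherent to IEEE-754 doubles, which carry a 53-bit significand, so any JSON decoder that maps numbers onto doubles will corrupt 64-bit addresses. A quick check in Python (the address is a made-up high kernel-space value):

```python
# Integers above 2**53 can no longer be represented exactly as doubles,
# so adjacent values silently collapse -- the failure mode described above.
assert float(2**53) == 2**53             # still exact
assert float(2**53 + 1) == float(2**53)  # rounded: the +1 is lost
addr = 0xFFFF_8000_0000_1234             # hypothetical high kernel address
assert int(float(addr)) != addr          # a double round-trip drops the low bits
```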
Python, for example, has several JSON libraries which let you swap out the numeric parser so it yields Decimal objects all the time. It's overkill for most use cases, but essential if you're working with REST APIs in Fintech.
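Even the standard library's `json` module supports this via its `parse_float` hook (the payload below is made up):

```python
import json
from decimal import Decimal

# json.loads takes a parse_float hook, so numeric literals are handed to
# Decimal verbatim instead of being rounded through a binary double.
payload = '{"amount": 0.1, "fee": 0.2}'

as_floats = json.loads(payload)
as_decimals = json.loads(payload, parse_float=Decimal)

print(as_floats["amount"] + as_floats["fee"])      # 0.30000000000000004
print(as_decimals["amount"] + as_decimals["fee"])  # 0.3
```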
Regarding protobuf, the following opinion is obviously insane, and if your org is already using protobuf you should ignore it: protobuf actually seems pretty bad? It has a bunch of vestigial features that people just say not to use. Its integer encoding bloats the encoded size and causes unnecessary dependency chains in the decoder. I would strongly prefer sending simdjson tape between processes and storing simdjson tape at rest, but if my coworkers insisted on doing something normal, maybe I would look into flatbuffers or capnproto.
> Both have their place though. YAML came out of Perl, and both are some confluence between awesome and horrific (although YAML wins the horrific crown for sure).
Weirdly enough, I'm not getting most of those issues in Perl YAML - the "Norway problem", for example.
This meant debug was hell, plus it wasn't always clear if what you were trying to do was even supported / if not why & what needed to be changed. This was because you were now 3 levels of abstraction away from the Python code that was actually executing.
Every time a dev took on a new project they had to jump on a call with architect or right hand man to figure out if what they were trying to do was going to be possible.
It escalated into the architect demanding to know a sprint in advance any task devs were trying to do, in a review session, so he could explain if it was possible or not and try to triage in his DSL..
Did he then go on to design Ansible? It falls into the same trap.
The only way you should be generating a data format using a language's templating system is...
Also, 9 times out of 10, I wished the app designer had just used <app language> or <any common embeddable language> (like Lua) instead of making any kind of DSL (whether that's just a data file pretending to be code or a micro programming language).
You can't actually deploy something unless you can mentally untangle it all; it just sits in front of your infra as a sort of DevOps Coming of Age ritual, where you look wistfully over your shoulder at the old Heroku or Vercel account you grew up with. Simpler times.
YAML is easy to debug (thanks to having comment syntax) because it just deserialises into code. Sometimes it deserialises into code that compiles on the fly mind you which is never a good idea.
On the other hand one time I debugged a really nasty memory leak by dumping many megabytes of YAML then running git diff against the dumps. That was fun. Of course the client used the quick and bad hack rather than the demonstrably correct fix (thanks to the dumps) because they were frightened of their own code.
This completely arbitrary ideological purity has come at the expense of countless wasted hours, headaches, and suboptimal workarounds like using strings as comments, with zero tangible benefit - zero bad things would have ever happened if JSON allowed comments. There is nothing correct about it.
:-) I enjoyed this bit a bit too much, now I want it on my tombstone #lifegoals
An XML document with a well-thought-out domain-specific DTD would solve all these problems; instead, we have something where no sometimes means false (but not always) and 22:22 sometimes means 1342 (but not always!)... because... why not?!?
All that horrible mess, it seems, because people didn't like to have to close tags.
Our industry has the remarkable properties of being almost entirely newbies, a constant churn of green developers, combined with being very bad about passing down generational knowledge. This is why things that are easier to explain win out over things that are technically superior but take longer to get your head around almost every time.
The thing JSON had going for it over XML is that it maps cleanly to most languages object models so you can read the results directly. No writing XQueries or DOMs to read values.
Its only major downfall is lack of comments, which has led people to YAML. (There’s plenty of other things it lacks in comparison to XML, like native schemas, but most of that falls into things newbies don’t know they want)
YAML is in this weird middle place where it’s easy to explain but impossible for a human to master. It appears as simple as JSON to newcomers who adopt it, but the long time users of it find it full of foot guns. People wanted JSON with comments but instead they got the complexity of XML minus the clarity.
Of course some of the hate come from the application where XML was used, more than XML itself, but it also is a deeply flawed language.
Nowadays the main issue would be that it requires complex generator and parser libraries to be any useful (you'll never want to deal with XML parsing/escaping by yourself), yet it's not as efficient as binary formats like protobuf, for instance.
That means that anything you'll want to edit by hand or keep purely textual and readable will be better done in YAML or JSON, and anything beyond that can be done in other ways. The need for a single language trying to awkwardly span the whole spectrum isn't big.
I think it's the same reason why, although HTML is a great standard, lots of us like writing in Markdown and having something convert that to HTML, despite all the problems when you try and push Markdown further than it was intended.
YAML, as I see it, is trying to be to XML configurations what Markdown is to HTML, with the added bonus of an attempt at a tag/reference system to store object graphs that are not trees. Back in the day of XML-based Spring config files, we had <bean> and <ref bean=...> but as far as I know that's implemented on the Spring layer, it's not a generic property of XML, whereas YAML tries to abstract that into the format itself.
Yes, that's probably a good analogy. The big difference though is that if some Markdown fails to parse, nothing really bad happens, while a YAML file that fails to parse can bring a whole system down.
Also, Markdown is a famously ambiguous format; it trades precision for ease of write, and that's fine, mostly.
But in a configuration file, ambiguity is really the opposite of what you want.
I see your point if you need to edit a complex XML with multiple namespaces mixed together, but a plain XML file can be just as readable as JSON.
Some JSON files can be really hard to edit by hand too. At my current workplace I often have to deal with nested JSON files, where a JSON contains values that are also JSON, but encoded and escaped so that it is difficult to edit.
XML is great for systems to read and write, but utterly abhorrent for humans. It's not just having to close tags. It's content vs attribute confusion, namespacing noise (which are also muddled with attributes), and there's a squint factor incurred by the density of information.
But experience has taught me that trying to explain why XML is great and simple to read and write for humans, to someone who thinks differently, is useless. And so, I won't.
Yet, please accept that some people, like me, really liked it and didn't mind the little quirks in light of all it offered.
It just autocompletes itself. I know JSON schema exist, but it never managed to just work that well.
Sure it solved all of the problems by making a data format that both humans and computers found difficult to read.
"Oh I've got a problem. I'll solve it with XML. Now you have two problems" ;)
The things which killed XML seem to me to be related to the old standards culture: the people involved assumed adoption was inevitable and distracted themselves with increasingly arcane thickets of new standards, with the assumption that someone else would spend time on the “boring” work of building professional-quality tools and documentation or cleaning up usability warts. That other 80% of the work never happened and most people who had a choice moved on.
As a thought experiment, imagine if libxml2 had had even a single dedicated developer focused on tracking standards or making usability improvements, instead of training multiple generations of users that XML was slow and hostile to users. Various XML committees’ travel expenses building standards which were never used likely cost more than that. Not leaving XPath frozen around the turn of the century would have helped in so many places.
The other wart I think would have made a surprising difference is the usability disaster around namespaces. So many tool developers forced users to switch between the short namespace:attribute form they used everywhere in the document and the {namespace url}attribute form that resolves to, or forced you to respecify the namespaces on every operation rather than reusing the values the parser had already loaded. Users begrudged that verbosity but they hated it when it meant something silently returned incorrect results because a selector using the document’s own syntax didn’t find the element they could see using those exact values. Absolutely nothing anyone did in the XML world was a better use of time than fixing that would have been since it trained people to think of XML-based tools as a painful, error-prone experience to be avoided — and they did as soon as they could.
XSLT was horrible though. A programming language with XML syntax, no thanks!
* Schemas violate underlying XML rules all the time, and due to variance in XML parsers, it gets a pass but only on specific configurations. There's no one Holy XML Parser. Leading whitespace in attributes? Sure, why not?! Whitespace sensitive element order? Ooh yeah, lots of it! But try and feed it to libxml and the whole thing collapses - without error handling, mind you, but that's not XML's fault.
* Whitespace agnostic means diffs and merges are element aware, and it's nigh-impossible to estimate the upper compute limit of an element aware diff
* There is NO OFFICIAL WHITESPACE SPEC - so if you try and fix it with normalization and lines, it's going to be different everywhere you go. So you're forced to switch on element aware diff / merge in your VCS, which is a pretty big change, and it's one you have to sell other departments on.
* XML breaking 1NF, which, aka means that XML can have XML inside of XML infinite recursion when playing data format = which breaks so very very many things . . it's actually going against the whole concept of a data hierarchy, which is built in to XML at a low level.
* Sort of riding on that, in order to parse XML you have to eat it element by element, and there's no way to tell when an element is going to end or of it recurses N levels. This makes it computationally expensive. The CAD STEP format has some of this disease - it has to be loaded in its entirety to parse, which can be holy hell with a TB file.
* Yeah sure, namespace hell. No one really figured out a way to fix it. S1000D (XML spec) to this day just denies that XML namespaces ever existed, and I can't really fault their WG for doing that. I can fault them for so many other things, but not that.
I.e. having content matching regexps, so you can be sure version numbers are always \d+(\.\d+)* and does not have unexpected letters or spaces in it. That date values are in ISO format.
Very useful for APIs to 3rd parties as it now is very easy to catch most errors and provide an useful error message for them without having to code every check.
If people keep burning their faces off it's not the fault of the flamethrower! If they keep hacking each other's limbs off it's not the fault of the chainsaw! If they keep blowing their houses up it's not the fault of the gas bottles! It's not like we need to accommodate every possible failure mode of the human brain, and make things safe just because the next idiot that comes along is going to cause a catastrophe. It's the idiot's fault that they forget to apply the breaks and take advantage of the failsafes, it's not the responsibility of the system to have the breaks and failsafes where any idiot can find them!
I've heard that kind of excuse 0 times, outside of software circles. But of course I'm exaggerating, because nobody really suffers because of XML other than the people who use it every day. Like this once-junior dev, for example, who was made to create XML by hand by eyeballing an Excel document, just because paying a junior dev salary is easier than getting middle management (and clients) to learn to structure an Excel spreadsheet properly.
Sometimes, just sometimes, you have to design a tool with the way it will actually be used in the real world in mind. And all the rest of the time? Well, all the rest of the time you have to do it that way, too.
The issue isn't that it's bad per se. The issue is that it's cumbersome to write. When you want to set up and tweak a tool, it gets annoying very quickly having to deal with little errors because you misspelled a closing tag. Maybe you want to test enabling a setting? Typing out 4 characters is so much less friction than the 20 or more (including the < > : / characters) you'd need for an XML config.
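To make the friction concrete, here is one hypothetical toggle (the setting name is invented for illustration):

```xml
<!-- enabling a single boolean setting in an XML config -->
<settings>
  <debug>true</debug>
</settings>
```

versus the YAML or TOML equivalent, which is just `debug: true` or `debug = true` on one line.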
* The `<?xml ...` preamble alone lost a percentage of people already, etc.
* Essentially XML was a markup language (a stripped-down SGML); while you can use markup to mark up key-value pairs, it just wasn't designed or optimized for that.
* On the contrary, it did away with the nice SGML features for exactly that: instead of `<tag>value</tag>` one could write `<tag/value/`, and there were self-closing tags (like `<ol><li> ... <li></ol>` in HTML). For config files, particularly deep ones, self-closing tags would have been perfect.
* XML attributes vs. substructure was just an awkward choice to hand to users. I.e. `<myConfig userId="asdf" host="asdfasdf" ...>` vs `<myConfig><userId>asdf</userId><host>asdfasdf</host></myConfig>`.
Moreover, looking at the XSL, XSLT, etc. mania, I get the feeling that a similar pattern was at work as with UML: special interest groups and tooling providers driving the evolution of a standard with their own interests in mind, not the interests of users or developers.
Overall XML could have been something like this `<myConfig / <userId/asdf/ <host/asdfasdf/` if it was SGML.
YAML and JSON succeeded because they had a clean and predictable, no-nonsense mapping between encoding and object-model after decoding. Probably we should all switch to an almost-yaml format that does away with the peculiarities, and the FANG companies would have the momentum to make that happen.
I personally would like for HJSON (https://hjson.github.io) to see more adoption, but that train has passed...
Where XML falls down hard is tool usability. There’s still no standard formatter or good validation tool, and things like namespace support are a constant source of friction.
But then a YAML document written by disciplined and well-thinking people will also not have these problems.
Any of the complex formats work well when used in the most fitting way and go horribly wrong when people try to benefit too much from the complexity. XML was also a nightmare when put in the wrong hands, just as YAML is, and as the next hyped, Turing-complete document syntax will be.
I don't hate XML myself, but I do hate how it's been abused over the years.
It's because so many people would see examples of complex _XML-based formats (WSDL, SOAP, etc.)_ and infer that _XML itself_ is complicated. XML is really not that difficult to understand, and I find it quite amusing that the same people who don't bat an eye at writing HTML complain about how baffling XML is.
Is writing an XML parser difficult? Yes, very much so. But again, that doesn't make XML itself complicated. And before anyone tries to call out this particular comment, keep in mind that writing a fast, correct, and safe JSON parser is no walk in the park either.
Are there many examples of complicated XML-based formats? Yes, but that's just a reflection of the complexity of some particular configuration model, not XML itself.
Additionally, we had to update WSDLs for these services, but the service generating the WSDL used features our server's SOAP library didn't support, so someone had to manually transform the XML via lengthy trial and error to get it to work.
If you need an enum that only allows specific values, you just list them.
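In XSD terms that's an `xs:enumeration` restriction; a hypothetical fragment (the type and values are made up) looks like:

```xml
<!-- Hypothetical enum: only these three strings validate -->
<xs:simpleType name="logLevel">
  <xs:restriction base="xs:string">
    <xs:enumeration value="debug"/>
    <xs:enumeration value="info"/>
    <xs:enumeration value="error"/>
  </xs:restriction>
</xs:simpleType>
```

Anything outside the listed values is rejected at validation time, with a message pointing at the offending element.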
For small flat trees the alternatives are preferable
- Many of these complaints are about YAML 1.1.
- YAML 1.2 was released _14 years ago_.
- The author makes some allusions to 1.2.2, and it requiring a team of people 14 years to make, but, from the yaml.com announcement they link to: “No normative changes from the 1.2.1 revision. YAML 1.2 has not been changed”
I guess my first two comments are undercut by PyYAML using YAML 1.1 (Really?! Python’s had 24 years of the Norway problem?!)
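To show what a YAML 1.1 loader actually does with these scalars, here is a minimal pure-Python sketch of the 1.1 boolean resolution rule (the full set of literals comes from the YAML 1.1 spec; `resolve_scalar` is a made-up helper, not a PyYAML API):

```python
# The unquoted scalars that YAML 1.1 resolves to booleans. This is why
# `countries: [NO, SE]` silently loads as [False, 'SE'] - the Norway problem.
YAML_11_BOOLS = {
    "y": True, "Y": True, "yes": True, "Yes": True, "YES": True,
    "n": False, "N": False, "no": False, "No": False, "NO": False,
    "true": True, "True": True, "TRUE": True,
    "false": False, "False": False, "FALSE": False,
    "on": True, "On": True, "ON": True,
    "off": False, "Off": False, "OFF": False,
}

def resolve_scalar(s: str):
    """Resolve an unquoted scalar the way a YAML 1.1 loader would."""
    return YAML_11_BOOLS.get(s, s)

print(resolve_scalar("NO"))  # Norway's country code vanishes into False
print(resolve_scalar("SE"))  # Sweden survives as a plain string
```

YAML 1.2 shrank the boolean set to `true`/`false` (plus case variants), but a loader stuck on 1.1 still applies the whole table above.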
I highly recommend reading the article - it's very good.
I've spent a lot of time writing YAML for Ansible, Cloudformation, k8s, Helm, etc. Some of the issues this article mentions are pitfalls but once you get a bit of experience with it, you know what to look out for.
I've also spent a lot of time writing Nix expressions, which is much more "joyful" IMO. Seemingly simple features like being able to create a function to reuse the same parameterized configuration make life much easier.
Add in a layer of type safety and some integration with the 'parent' app (think replacements for CloudFormation's !GetAtt or Ansible's handlers), the ability to perform basic unit tests, then configuration becomes more like writing code which I consider a good thing.
It’s really no wonder that it’s hard to create a language that’s supposed to know whether the author intended to write a string or not without it being indicated by the syntax. No other language in the world tries to do this, for exactly the reasons this article points out.
* no enclosing element (i.e., can be streamed)
* maps, lists, even sets
* tags (like "Person". UUIDs and timestamps are built-in tags)
* floating point numbers
* integers
* comments
* UTF-8
* true booleans
* no need to worry about too many or too few commas in the right or wrong place
Implementations in almost every language under the sun [1].
The format is simple enough that it's easy to implement, verify, and test. No strange string interpretation craziness (see YAML and "Norway problem"), no ambiguity between FP and integers (see JSON), comments. And if your editor has rainbow parenthesis support, reading is actually a pleasant experience.
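A small illustrative EDN document (the keys and values here are invented) showing several of the features listed above:

```clojure
;; Maps, sets, vectors, and built-in tagged literals in one EDN value
{:name    "Ada"
 :id      #uuid "f81d4fae-7dec-11d0-a765-00a0c91e6bf6" ; built-in uuid tag
 :joined  #inst "2023-04-01T12:00:00Z"                 ; built-in instant tag
 :roles   #{:admin :dev}                               ; a set
 :scores  [1 2.5 3]                                    ; ints and floats stay distinct
 :active  true}                                        ; a real boolean, never "NO"
```

No trailing-comma worries, and comments are part of the format.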
[0]: https://github.com/edn-format/edn
[1]: https://github.com/edn-format/edn/wiki/Implementations