From my experience, while YAML itself is something one can learn to live with, the true horror starts when people start using text template engines to generate YAML. Like it's done in Helm charts, for example, https://github.com/helm/charts/blob/master/stable/grafana/te... Aren't these "indent" filters beautiful?
I developed Yet Another JSON Templating Language, whose main virtue was that it was extremely simple to use and implement, and it could be easily implemented in JavaScript or any other language supporting JSON.
We had joy, we had fun, we had seasons in the sun, but as I added more and more features and syntax to cover specific requirements and uncommon edge cases, I realized I was on an inevitable death-march towards my cute little program becoming sufficiently complicated to trigger Greenspun's tenth rule.
There is no need for Yet Another JSON Templating Language, because JavaScript is the ultimate JSON templating language. Why, it even supports comments and trailing commas!
Just use the real thing to generate JSON, instead of trying to build yet another ad-hoc, informally-specified, bug-ridden, slow implementation of half of JavaScript.
> the true horror starts when people start using text template engines to generate YAML
I just had a shiver recalling a Kubernetes wrapper wrapper wrapper wrapper at a former job. I think there were at least two layers of mystical YAML generation hell. I couldn't stop it, and it tanked much joy in my work. It was a factor in me moving on.
Surely the right approach needs to be generating the desired data programmatically, rendering back to YAML if needed, rather than building these files with text macros.
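To make that concrete, here is a minimal sketch of "build the data, then serialize it". The `make_service` helper and its schema are made up for illustration, and JSON is used for output only because it ships with Python; a YAML library's dump function could take the same dict.

```python
import json


def make_service(name, port, replicas=1):
    """Build one service definition as plain data (hypothetical schema)."""
    return {
        "name": name,
        "replicas": replicas,
        "ports": [{"containerPort": port}],
    }


# Loops and conditionals operate on data, so there is no indentation to manage.
config = {
    "services": [make_service(n, 8000 + i) for i, n in enumerate(["web", "api"])]
}

# The serializer handles quoting, escaping and nesting.
rendered = json.dumps(config, indent=2)
print(rendered)
```

Since the structure only becomes text at the very last step, an "indent" filter never enters the picture.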
At my old place we developed a small tool that wraps CloudFormation with a templating language (jinja2). This was actually great, as CloudFormation is extremely verbose and often unnecessarily complex. Templating it out and adding custom functions to jinja2 made the cfn templates much easier to understand.
I think it all depends. Most of the time I would agree that you shouldn't template yaml, but sometimes, it's the lesser of two evils.
Templating CFN is really good practice once you hit a certain scale. Say you have 5 DDB tables deployed to multiple regions, and on each of them you want to specify keys, attributes, throughput, and usage alarms, at a minimum. That’s already 30-40 values that need to be specified, depending on table schemas. Add EC2, auto scaling, networking, load balancer, and SQS/SNS, and untemplated CloudFormation becomes really unpleasant to work with.
Some of the values like DDB table attributes are common across all regions, other values like tags are common across all infra in the same region. Some values are a scalar multiple of others, or interpolated from multiple sources. For example, a DDB capacity alarm for a given region is a conjunction of the table name (defined at the application level), a scalar multiple of the table capacity (defined at the regional deployment level), and severity (owned by those that will be on-call).
To add insult to injury, a stack can only have 60 parameters, which you butt up against quickly if you try to naively parameterize your deployment.
Given all these gripes, auto-generating CFN templates was easiest for me. I used a hierarchical config (global > application > region > resource) so the deployment params could be easily manipulated, maintained, and where “exceptions to the rule” would be obvious instead of hidden in a bunch of CFN yaml. To generate CFN templates I used ERB instead of jinja, but to similar effect.
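A rough sketch of what such a hierarchical config could look like, in Python rather than the ERB setup described above. The layer contents, key names, and the `deep_merge` helper are all invented for illustration; the idea is only that more specific layers override more general ones.

```python
from functools import reduce


def deep_merge(base, override):
    """Return a new dict where `override` wins; nested dicts merge recursively."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


# Hypothetical layers, ordered from least to most specific.
global_cfg = {"tags": {"team": "payments"}, "table": {"read_capacity": 5, "alarm_ratio": 0.8}}
application_cfg = {"table": {"name": "orders"}}
region_cfg = {"tags": {"region": "eu-west-1"}, "table": {"read_capacity": 20}}
resource_cfg = {"table": {"alarm_ratio": 0.9}}  # the visible "exception to the rule"

params = reduce(deep_merge, [global_cfg, application_cfg, region_cfg, resource_cfg])

# Derived value: the alarm threshold is a scalar multiple of the table capacity.
alarm_threshold = params["table"]["read_capacity"] * params["table"]["alarm_ratio"]
print(params, alarm_threshold)
```

The merged `params` dict is what would then be fed to whatever renders the final template, so an override lives in exactly one small layer instead of being buried in the generated output.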
A side benefit of this is side-stepping additional vendor lock-in in the form of the weird and archaic CFN operators for math, string concatenation, etc. I don’t have a problem learning them, but it’s one of those things that one person learns, then everyone who comes after them has to re-learn. My shop already uses ruby, so templating in the same language is a no-brainer.
> This was actually great as it CloudFormation is extremely verbose and often unnecessarily complex
I think it's the opposite: the leanest way to deploy AWS resources. Did you write it yourself, in a text editor? I have been doing that for 5 years now. You can omit values if you're fine with the defaults; you only state what needs to be different. Another tip is to use Export and ImportValue to link stacks.
I kept on using JSON, even after all my buddies jumped on YAML. JSON is just more reliable, it's harder to miss syntax errors, and it can be made readable by not using linters and keeping long lines that belong on one line. Also, the brackets are exactly what they are in Python :)
> wraps CloudFormation with a templating language (jinja2)
Not sure if it is a good idea. Everyone's use case is different, though. A well written CFN template is like a rubber stamp, just change the Parameters. The template itself doesn't need to change.
k8s and helm is where I learned to dislike yaml. I now want a compiled and type safe language that generates whatever config a system needs.
I'm pretty much thinking I want Go as a pre-config where I can set variables, loops, and conditionals and that my editor can help with auto-complete. Maybe I can "import github.com/$org/helmconfig" and in the end write one or more files for config.
Some templating languages such as Jsonnet[0] add built-in templating and just enough programmability to cover basic operations like templating and iteration.
I originally felt it was overly complex, but after seeing some of the Go text/template and Ansible Jinja examples in the wild, it actually seems like a good idea.
Perhaps we should more strongly distinguish between “basic” data definition formats and ones that need to be templated. JSON5 for the former and Jsonnet for the latter, for example.
agreed, text templating of yaml (or any structured content) does not make sense. too much context (actual config structure) is lost if plain text is used.
i've collaborated on ytt (https://get-ytt.io) - yaml templating tool. it works directly with yaml structure to bind templating directives. for example setting a value is associated with a specific yaml node so that you dont have to do any manual indenting etc. like you would with plain text templating. defining functions that return yaml structures becomes very easy as well. common problems such as improperly escaped values are gone.
i'm also experimenting with a "strict" mode [1] that raises error for questionable yaml features, for example, using NO to mean false.
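For anyone who hasn't been bitten by this yet, here is a small sketch of what such a check guards against. This is not how ytt implements its strict mode; the regex only lists a subset of the unquoted spellings that YAML 1.1 resolvers commonly read as booleans, and the function name is made up.

```python
import re

# A subset of the unquoted spellings that YAML 1.1 resolvers commonly
# read as booleans instead of strings.
AMBIGUOUS_BOOL = re.compile(r"^(?:yes|Yes|YES|no|No|NO|on|On|ON|off|Off|OFF)$")


def find_ambiguous(scalars):
    """Return the unquoted scalars a strict checker would want to reject."""
    return [s for s in scalars if AMBIGUOUS_BOOL.match(s)]


# Country codes: "NO" (Norway) is the classic value that turns into false.
country_codes = ["SE", "DK", "NO", "FI"]
flagged = find_ambiguous(country_codes)
print(flagged)
```

A strict mode would raise on the flagged values and ask for either true/false or an explicitly quoted string.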
i think that yaml is here to stay (at least for some time) and it's worth investing in making tools that make dealing with yaml and its common uses (templating) easier.
The issue is, I think most people (myself included) enter YAML into their lives as basically a JSON alternative with lighter syntax. Without really realizing, or perhaps without internalizing, the rather ridiculous number of different ways to represent the same thing, the painful subtle syntax differences that lead to entirely different representations, the sometimes difficult to believe number of features that the language has that are seldom used..
It's not just an alternate skin for JSON, and yet that's what most people use it for. Some users also want things like map keys that aren't strings, which is actually pretty useful.
I recall there being CoffeeScript Object Notation as well... perhaps that would've been better for many use cases, all things said.
I've never understood this. JSON is really not that difficult to work with manually. I tend to write my config files as JSON for utilities I write. What is it with peoples' innate aversion to braces?
JSON is serviceable as an intermediate format, machine-generated and machine-consumed.
It is outright bad as a human-operated format. It explicitly lacks comments, it does not allow trailing commas, it lacks namespaces, to name a few pain points.
YAML is much more human-friendly, with all its problems.
The lack of comments is the real problem. When you need to explain why a particular parameter in the config file is set a certain way JSON becomes a real problem.
Seriously, our batch jobs for better or worse have configs with a bunch of parameters that are passed around as json, and while most variable names are intuitive and there is documentation on the wiki, and most often the config can be autogenerated by other tools it would still be better if when I manually open it in the config itself I would easily see the difference between n_run_threads vs n_reg_threads, etc...
They’re fairly useful in applications that use numeric IDs. For example, if I’m using SQL, and I have a table with an AUTOINCREMENT primary key, I’m going to have a lot of numeric IDs. If I want to reference these in a config file of some kind, I don’t want to have to read them as strings and handle the parsing on my end.
Even if you’re of the opinion that IDs shouldn’t be numeric, there are a lot of cases where you’re stuck with integers—on Linux, user IDs, group IDs, and inodes are just a few examples.
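A quick illustration of the string-key limitation, using Python's standard json module. The uid-to-name mapping is made up.

```python
import json

uid_names = {0: "root", 1000: "alice"}  # hypothetical uid -> name mapping

# JSON object keys must be strings, so the encoder converts the integers.
encoded = json.dumps(uid_names)
decoded = json.loads(encoded)

# After a round trip the keys are strings and have to be parsed by hand.
restored = {int(key): value for key, value in decoded.items()}
print(encoded, list(decoded), restored)
```

That last dict comprehension is exactly the "handle the parsing on my end" step that a format with non-string keys would let you skip.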
I was suspicious of YAML from day one, when they announced "Yet Another Markup Language (YAML) 1.0", because it obviously WASN'T a markup language. Who did they think they were fooling?
XML and HTML are markup languages. JSON and YAML are not markup languages. So when they finally realized their mistake, they had to retroactively do an about-face and rename it "YAML Ain’t Markup Language". That didn't inspire my confidence or look to me like they did their research and learned the lessons (and definitions) of other previous markup and non-markup languages, to avoid repeating old mistakes.
If YAML is defined by what it Ain't, instead of what it Is, then why is it so specifically obsessed with not being a Markup Language, when there are so many other more terrible kinds of languages it could focus on not being, like YATL Ain't Templating Language or YAPL Ain't Programming Language?
>YAML (/ˈjæməl/, rhymes with camel) was first proposed by Clark Evans in 2001, who designed it together with Ingy döt Net and Oren Ben-Kiki. Originally YAML was said to mean Yet Another Markup Language, referencing its purpose as a markup language with the yet another construct, but it was then repurposed as YAML Ain't Markup Language, a recursive acronym, to distinguish its purpose as data-oriented, rather than document markup.
>In computer text processing, a markup language is a system for annotating a document in a way that is syntactically distinguishable from the text. The idea and terminology evolved from the "marking up" of paper manuscripts (i.e., the revision instructions by editors), which is traditionally written with a red or blue pencil on authors' manuscripts. In digital media, this "blue pencil instruction text" was replaced by tags, which indicate what the parts of the document are, rather than details of how they might be shown on some display. This lets authors avoid formatting every instance of the same kind of thing redundantly (and possibly inconsistently). It also avoids the specification of fonts and dimensions which may not apply to many users (such as those with varying-size displays, impaired vision and screen-reading software).
The problem is with parsers and how they are implemented or used. YAML actually has a way to specify the type of the data; alternatively, the application is supposed to suggest the desired type. What this take is showing is which types are assumed when they are not specified.
I'll say it: I think YAML is great and a joy to use for configuration files. I can write it even with the dumbest editor, I can write comments, multi-line strings, I can get autocompletion and validation with JSON schema, I can share and reference other values. It allows tools to have config schemas that read like a natural domain specific language, but you already know the syntax. I haven't had problems with it at all.
This was me too - until yesterday, when I made a minor change to one of our YAML config files and everything broke. On investigation it turned out that all of our YAML files had longstanding errors but those errors happened to be valid syntax and also did not cause any bad side effects, so we had been getting away with it by pure luck until I made a change that happened to expose the problem.
That would make me not a fan of the particular parsers/validators I've been using, rather than not a fan of YAML.
The big strike against YAML I see there is that it needs a good conformance test suite and implementations need to be tested against it. But that's not a problem with the format but a fairly easy to fix ecosystem problem.
I agree. As long as you're using a strict parser, I've found YAML to be much nicer for configuration than JSON. I use Python's ruamel.yaml library, and have never had any weird type problems. Once the nesting gets too deep, it can be a pain, but that's the same for JSON.
I have found myself using TOML more and more for configuration, though. It helps a lot with keeping things flat and easy to read. I'll still prefer YAML over JSON for human-writable files, but I'm starting to prefer TOML over YAML.
I've got to say it is the most frustrating config file format I've ever had to write. The only time I have to use it is for Docker Compose, and I am constantly fighting vim on indentation and trying to make sense of confusing errors about "unexpected block start." Do you have any suggested vimrc for YAML?
That's really close to [RFC 7464](https://tools.ietf.org/html/rfc7464), JSON Text Sequences. It uses U+001E RECORD SEPARATOR. The `jq` tool supports those if you pass a flag.
Having to close contexts is a VERY good 'sanity check' to see if something is malformed or not.
If appending is necessary make the parser handle multiple copies of the namespace and merge them upon output. Unknown keys and sections should also always be copied from input to output (this is how you embed comments).
S-Expressions are quite simple, there are some parsers floating around in well-known projects, although I'm not sure they're SAX-style: https://leon.bottou.org/projects/minilisp
I also wonder if you need a text format, or if SQLite or systemd's journal API would work.
I love proto, but the text format was an afterthought. The binary format is rigorously defined, portable, extensible and optimized. The text format was reverse engineered from the C++ implementation after the fact, when folks found textproto useful. Unfortunately there are discrepancies between languages around the corner cases of the text format, and that's the sad world we live in. Avoid letting textproto be part of your user-exposed interface.
TOML would be great, if not for an annoying obscure detail in the specification that makes it hard to use for my typical use cases (scientific computation) [1]. Moreover, I find it quite unintuitive how you are supposed to specify arrays of tables [2]: this kind of thing is much easier in JSON (which is the format I am currently using, although it is far from perfect).
Try just using line-delimited JSON objects (http://jsonlines.org/). It ticks all of your boxes, especially 3: "jq -s '.cmd' fish_history | histogram".
Neither YAML nor Protobufs are quite as easy as that.
All in all it's ridiculously simple, easy to parse in a variety of languages and each row is a single line that's simple to iteratively parse without loading the whole thing into memory.
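For reference, reading such a file one record at a time takes only a few lines. This is a sketch: the `iter_records` name and the sample records (with a `cmd` field, as in the jq example) are made up.

```python
import io
import json


def iter_records(stream):
    """Yield one parsed object per non-empty line, without loading the whole file."""
    for line_number, line in enumerate(stream, start=1):
        line = line.strip()
        if not line:
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            # Report which line was malformed; the other lines stay usable.
            raise ValueError(f"line {line_number}: {exc}") from exc


# io.StringIO stands in for an open file handle here.
sample = io.StringIO('{"cmd": "ls", "when": 1}\n{"cmd": "git status", "when": 2}\n')
commands = [record["cmd"] for record in iter_records(sample)]
print(commands)
```

Appending is just writing one more line, and a corrupt line only affects that one record.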
this seems like it makes json useful for logging, but not too useful as configuration. For instance, it doesn't support commenting, and it seems like every line needs to have all its children compressed onto one line?
Tcl with control structure commands disabled and infix assignment for convenience. Jim Tcl is a lightweight implementation if the main line isn't workable.
I've used YAML as the format for a config file, and I certainly regret that choice. Trying to explain to someone that doesn't know YAML how to edit it without setting them up for failure is quite annoying. There are too many non-obvious ways to screw up, like forgetting the space after the colon or of course bad indentation.
YAML is easier to read and write. That's the benefit. It's also always going to be smaller than anything JSON or XML. Maybe it's not as correct, maybe some people don't like it, I don't really mind it. I don't see it really going anywhere soon either considering Kubernetes and the lack of alternatives in widespread usage.
I've never had someone that needed extensive help understanding YAML and that's besides reviewing work for people just coming up to speed. Find me an IDE or editor that doesn't have YAML support. Also, YAML supports comments so if you have pitfalls people need to know about you can document them inline.
Your argument is people who don't know things might screw stuff up. Well Yeah! This applies to everything.
Your editor makes a world of difference here. Since you shouldn't be writing brace-language code without indents anyways, the biggest issue remaining is mixing tabs and spaces. Gedit makes this a big pain with its default config (it doesn't even auto-indent) but Atom and IDLE handle it well.
The main headaches are due to people either wanting to copy and paste code from various sites, or wanting to write really deeply nested code.
If you're writing well structured, original code in Python, it's generally cleaner and easier than other languages because the syntax avoids ambiguities that other languages have.
The difference in my experience is that once you know what's wrong with your whitespaces in Python, you're out of the woods. The interpreter is your friend from that point onward. YAML parsers, on the other hand, give you these really strange errors that are pretty difficult to understand, and it doesn't end with whitespaces.
There are quite a few comments saying they don't like python even from 10+ year users.
A language becomes popular largely through its library ecosystem and the resources around it, not just how the language looks. I think Google embracing it played a good role in acquiring mindshare.
YAML is so bad for human writing. Every time I write ansible tasks, I get confused with indentation and how to do arrays etc. JSON and YAML are frankly a generation behind compared to TOML or JSON5.
I’m not keen on how so many tools and services opt for YAML by default, either. Both JSON and YAML are a nightmare to handle once you’ve got 3000 line files and several layers of nesting.
CI would be a lot nicer to use if it didn’t rely on a single YAML file to work. And if you want to switch, suddenly you have a build step to convert back to YAML.
As an ansible user, I hate YAML and its broken parsers with a passion, but the security objection does not make much sense. It does apply verbatim to any parser of anything if the implementation decides that a given label means "eval this content right away". I fail to see how this can be a fault of the DDL rather than the parser's.
The reason this is a fault of the DDL and not the parser is that the DDL spec decides that it has label that evaluates a command. The parser then has two options, either implement it or not conform to the spec (and essentially implementing a different DDL). For programming languages it makes sense to have an eval label/command. For configuration/serialization DDLs I think it's a terrible choice.
And terrible it is indeed, but I cannot find it specified - the strings eval, exec, command, statement do not even occur in the official specs (shallow doc perusal, I know)
> As an ansible user, I hate YAML and its broken parsers with a passion
Could you elaborate on this? I use Ansible daily and I've never had a problem with YAML once I took some time to understand it. What do you mean by broken parsers? I'm assuming that's something Ansible specific you are referring to.
I intensely dislike yaml's whitespace-based syntax because whitespace is white, and it gives very little visual context, especially in long, nested documents. Editors that expand/collapse branches do help some, but are no match for highlighting matching pairs of braces in other saner formats/languages (I am also not a fan of syntactic whitespace in python, if you get my drift.)
And ansible's parser is broken, in more ways than I can remember (haven't been writing playbooks and stuff for a couple of months now). If you like pointless pain, try embedding ":" in task names for a demo (or one of several other "meta" characters: the colon is just the one that tends to recur most).
I will give a passing mention to the smug, vague error message "You have an error at position (somewhere in the middle of the file) It seems that you are missing... (something) we may be wrong (they almost always are) but it appears it begins in position (some position close to the first line)" that sets off a hunt for the missing brace/colon/space/whatever and makes me want to do stuff to the person who devised it.
This compounds with the confusion brought on by the weaving of yaml's and jinja2's syntaxes and ansible's own flakiness in deciding what is evaluated when - which decides when and if a variable does indeed change, and when yes means "yes" rather than true or 1, but not '1' or "true" (try prompting the user for a boolean variable, and find yourself writing if ( (switch == "true") or (switch == "1")) in short order).
Pity that ansible is so damn convenient, or I would have ditched it a long time ago for anything - bash included (OK, maybe not bash).
My vote is yes. Most configuration doesn’t need anything more sophisticated than key-value pairs, perhaps with namespaces. INI can manage that and TOML is basically a better-specified INI.
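As an illustration of how far plain sections plus key-value pairs get you, here is a sketch using Python's standard configparser module. The section and key names are made up.

```python
import configparser

# Hypothetical config: sections act as namespaces for flat key-value pairs.
INI_TEXT = """
[server]
host = 127.0.0.1
port = 8080

[logging]
level = info
"""

parser = configparser.ConfigParser()
parser.read_string(INI_TEXT)

# INI values are strings; typed getters do the conversion explicitly.
port = parser.getint("server", "port")
level = parser.get("logging", "level")
print(parser.sections(), port, level)
```

The catch is that every value is a string until you ask for something else, which is part of what TOML tightens up by specifying types.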
I can't tell if I've spent too much time on HN or if I came to this conclusion on my own, but TOML is my language of choice for configuration now. It's flexible in the right ways and sectioning of config is so important.
.ini, followed by TOML, followed by an identical implementation of some other app's config format.
The biggest problem with config formats is they mislead users into thinking they understand the format. The user tries to edit it by hand, and chaos ensues. So only formats that are stupidly simple, or whose warts are already familiar and well documented, are good choices.
Apache had a great configuration format. Nothing else used it (that I knew of) but you could in theory implement "Apache configs" and then people'd just have to look up how to write those, which there's lots of examples of.
JSON and YAML and XML are data formats; they should only be written by machines, and read by humans. Same with protocols like HTTP, Telnet, FTP... You're not supposed to write it yourself, but it's readable to make troubleshooting easier.
Data formats are nice for expressing nested data structures, but then they don't (usually) support logical expressions; at that point you need a template/macro/programming language, and at that point you're writing code, which will need to be tested, and at that point you should just write modules and use a config format to give them arguments. Every complex tool goes through the same evolution.
If you care about your users, write a tool to generate configs based on a wizard. Good CLI tools do this, and it really makes life better. (It's also a great way to document all your config features in code, and test them)
If possible, prefer what tools in your vicinity use. My team uses Kubernetes and Concourse extensively, which both use YAML, so I tend to stick with YAML since people are already familiar with it.
(More recently, I've come around to prefer plain environment variables for configuration, but that only works nicely when the amount of configuration is fairly limited, say 20 values instead of 1000 values.)
In the Scala world, HOCON is very nice. It’s a format designed explicitly for config files, and has a lot of niceties (like you can append files together and they merge correctly, so you don’t have to end up with giant config files).
I agree with HOCON being nice based on personal usage but I haven't seen an in depth analysis of it. This is the canonical parser for JVM based languages — https://github.com/lightbend/config, are there many other implementations that are widely used?
I think it's horses for courses. JSON I guess is the best for interchange, i.e. machine to machine, but I never want to edit it by hand. XML is relatively easy to read but can be quite painful to edit raw, though it can be quite easy to develop a structured editor for it; I’d favour it for document persistence. YAML is fine for configuration files, but I would be careful about how I apply it and would always provide it as a heavily documented templated config file. YAML when used correctly is by far the easiest to edit in the clear, with a plain text editor. With that said, I would try to get away with basic namespaced properties files first before I’d go that far ...
ini if needs are crazy simple, YAML if you need a structure like JSON's but with something any human ever needs to interact with. JSON if humans aren't in the loop.
TOML, in my opinion, is like a weird mishmash of JSON, ini, and bashisms. Though I have worked with it a lot less than the other formats, so YMMV.
The main issue I had with TOML is how much more syntactically noisy it is. Equivalent files with 2-3 levels of nesting usually become at least 50% longer than equivalent YAML.
This is a different use case, I think. This example is defining content, not configuration. In this case the content is user stories. I agree for creating sequences of documents/content in this way, YAML often is nicer. But for configuration, TOML is designed to specify it in a simple and flat way, and that can be very helpful.
I have some projects where I'm frequently writing and modifying content that resembles the example here, and I use YAML there and plan to keep using YAML. For most other things, I'm just doing configuration, so I use TOML. No reason you need to stick to one or the other.
There's no white knight here, they all suck in some way. Personally I've had decent success with yaml as simple configuration, but I would never use it as an interchange format. If you know its caveats and you're targeting one language so you can become familiar with the parser, it's serviceable.
I say just use JSON. Everyone knows it already and it's good enough. Use a parser in your app that allows comments and trailing commas like vscode does.
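If your language's stock parser is strict, a small pre-processing pass gets you most of the way. The sketch below is an assumption about one way to do it, not how vscode's parser works: it strips // and /* */ comments and trailing commas outside of string literals, then hands the result to the normal json module. The function name and sample keys are made up.

```python
import json


def strip_json_extras(text):
    """Remove comments and trailing commas, leaving string literals untouched."""
    out = []
    i, n = 0, len(text)
    in_string = False
    while i < n:
        ch = text[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < n:
                out.append(text[i + 1])  # keep the escaped character as-is
                i += 1
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
            out.append(ch)
        elif text.startswith("//", i):
            while i < n and text[i] != "\n":
                i += 1
            continue  # the newline itself is kept on the next iteration
        elif text.startswith("/*", i):
            end = text.find("*/", i + 2)
            i = n if end == -1 else end + 2
            continue
        elif ch in "}]":
            # Drop a comma separated from this closer only by whitespace.
            j = len(out) - 1
            while j >= 0 and out[j].isspace():
                j -= 1
            if j >= 0 and out[j] == ",":
                del out[j]
            out.append(ch)
        else:
            out.append(ch)
        i += 1
    return "".join(out)


sample = """{
  // worker threads for the main run
  "n_run_threads": 4, /* tuned by hand */
  "url": "http://example.com/a,b",
  "tags": ["x", "y",],
}"""
config = json.loads(strip_json_extras(sample))
print(config)
```

Note the URL survives even though it contains "//", because the pass tracks whether it is inside a string.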
I use JSON in the end. I prefer to write TOML, then parse that into JSON. This seems to strike a nice balance between human/machine write/read. It's simple enough to reason about TOML, even if it gets verbose. If I have to write YAML, after 2 layers I usually write it as JSON and include the JSON in the 2nd level of key.
My cursory survey of config / serialisation formats concluded that nothing is close to being good.
It's overly verbose and hard-to-understand XML, no-comments JSON, the horrors of YAML, or some okay format that doesn't have parsers for the languages (plural) you are using on your project.
For an in-house, Python-only project, my way to go is to create a "config.py". Then I declare a bunch of module variables that can be overridden by environment variables as a bonus.
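Something along these lines, as a sketch. The `env` helper, the `APP_` prefix, and the variable names are all made up for illustration.

```python
# config.py
import os


def env(name, default, cast=str):
    """Return the environment override for `name` if set, else the default."""
    raw = os.environ.get(name)
    return default if raw is None else cast(raw)


def as_bool(text):
    """Interpret common truthy spellings; everything else is False."""
    return text.strip().lower() in ("1", "true", "yes")


# Module-level settings with their defaults; comments can explain each one.
N_RUN_THREADS = env("APP_N_RUN_THREADS", 4, int)
DATA_DIR = env("APP_DATA_DIR", "/var/lib/app")
DEBUG = env("APP_DEBUG", False, as_bool)
```

Other modules then just do `import config` and read `config.N_RUN_THREADS`, and the defaults are documented right where they are defined.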
https://en.wikipedia.org/wiki/Greenspun%27s_tenth_rule
oh god why
Surely using an encoder on an object/structure hierarchy (like people do with encoding/json) is the way to go?
On the other hand, the quality of the yaml libraries in Go wasn't great, last time I had to choose a configuration file format.
https://github.com/cloudtools/troposphere
The basic type checking done was quite helpful, and avoided some of the dumb errors that we had run into when we attempted to do everything by hand.
[1] https://github.com/k14s/ytt/blob/master/docs/strict.md
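For anyone wondering what a "NO means false" check could look like, here is a rough sketch. It is not ytt's implementation; it only tests a plain scalar against the spellings that the YAML 1.1 bool type accepts and flags everything other than true/false. The function names are made up.

```python
import re

# Plain (unquoted) scalars that the YAML 1.1 bool type resolves to booleans.
YAML11_BOOL = re.compile(
    r"^(?:y|Y|yes|Yes|YES|n|N|no|No|NO"
    r"|true|True|TRUE|false|False|FALSE"
    r"|on|On|ON|off|Off|OFF)$"
)

# Spellings this sketch treats as unambiguous (compared case-insensitively).
UNAMBIGUOUS = {"true", "false"}


def questionable_bool(scalar):
    """True if an unquoted scalar would become a boolean under YAML 1.1
    without being spelled true/false."""
    return bool(YAML11_BOOL.match(scalar)) and scalar.lower() not in UNAMBIGUOUS


def find_questionable(scalars):
    """Return the scalars a strict mode of this kind would reject."""
    return [s for s in scalars if questionable_bool(s)]
```

With this, find_questionable(["NO", "true", "on", "42"]) picks out "NO" and "on", which is the classic country-code surprise (Norway's "NO" turning into false).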
It's not just an alternate skin for JSON, and yet that's what most people use it for. Some users also want things like map keys that aren't strings, which is actually pretty useful.
I recall there being CoffeeScript Object Notation as well... perhaps that would've been better for many use cases, all things said.
It is outright bad as a human-operated format. It explicitly lacks comments, it does not allow trailing commas, it lacks namespaces, to name a few pain points.
YAML is much more human-friendly, with all its problems.
Seriously, our batch jobs, for better or worse, have configs with a bunch of parameters that are passed around as JSON. Most variable names are intuitive, there is documentation on the wiki, and most often the config can be autogenerated by other tools, but it would still be better if, when I manually open the config, I could easily see in the config itself the difference between n_run_threads vs n_reg_threads, etc...
I made it largely because I saw a disconnect with what YAML was, and what people - including me - thought it was (which is what it should be).
Don't agree with non-string map keys though... they're a complication I never saw a use for.
Even if you’re of the opinion that IDs shouldn’t be numeric, there are a lot of cases where you’re stuck with integers—on Linux, user IDs, group IDs, and inodes are just a few examples.
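A quick illustration of why integer keys matter in practice, using Python's standard json module: JSON object keys are always strings, so a map keyed by user ID does not survive a round trip unchanged. The UIDs and names below are made up.

```python
import json

# A map keyed by numeric user ID, as you might get from /etc/passwd.
uid_to_name = {0: "root", 1000: "alice"}

# json.dumps silently converts the integer keys to strings.
encoded = json.dumps(uid_to_name)

# After decoding, the keys are strings, so integer lookups miss.
decoded = json.loads(encoded)

# Every consumer has to remember to convert the keys back by hand.
restored = {int(key): value for key, value in decoded.items()}
```

Here decoded has the keys "0" and "1000" rather than 0 and 1000, so decoded.get(1000) returns None even though the data is "there".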
https://yaml.org/spec/history/2001-08-01.html
XML and HTML are markup languages. JSON and YAML are not markup languages. So when they finally realized their mistake, they had to retroactively do an about-face and rename it "YAML Ain’t Markup Language". That didn't inspire my confidence or look to me like they did their research and learned the lessons (and definitions) of other previous markup and non-markup languages, to avoid repeating old mistakes.
If YAML is defined by what it Ain't, instead of what it Is, then why is it so specifically obsessed with not being a Markup Language, when there are so many other more terrible kinds of languages it could focus on not being, like YATL Ain't Templating Language or YAPL Ain't Programming Language?
https://en.wikipedia.org/wiki/YAML#History_and_name
>YAML (/ˈjæməl/, rhymes with camel) was first proposed by Clark Evans in 2001, who designed it together with Ingy döt Net and Oren Ben-Kiki. Originally YAML was said to mean Yet Another Markup Language, referencing its purpose as a markup language with the yet another construct, but it was then repurposed as YAML Ain't Markup Language, a recursive acronym, to distinguish its purpose as data-oriented, rather than document markup.
https://en.wikipedia.org/wiki/Markup_language
>In computer text processing, a markup language is a system for annotating a document in a way that is syntactically distinguishable from the text. The idea and terminology evolved from the "marking up" of paper manuscripts (i.e., the revision instructions by editors), which is traditionally written with a red or blue pencil on authors' manuscripts. In digital media, this "blue pencil instruction text" was replaced by tags, which indicate what the parts of the document are, rather than details of how they might be shown on some display. This lets authors avoid formatting every instance of the same kind of thing redundantly (and possibly inconsistently). It also avoids the specification of fonts and dimensions which may not apply to many users (such as those with varying-size displays, impaired vision and screen-reading software).
YAML is bad.
Every YAML parser is a custom YAML parser.
https://matrix.yaml.io/valid.html
So now no longer a YAML fan...
The big strike against YAML I see there is that it needs a good conformance test suite and implementations need to be tested against it. But that's not a problem with the format but a fairly easy to fix ecosystem problem.
I have found myself using TOML more and more for configuration, though. It helps a lot with keeping things flat and easy to read. I'll still prefer YAML over JSON for human-writable files, but I'm starting to prefer TOML over YAML.
Boxes to check:
1. Self describing format
2. SAX-style parser available to C++
3. Easy for users to understand and ad-hoc parse using command-line tools
4. No document closing necessary, so appending is trivial
YAML looks pretty good:
protobuf is also an option, though I am unsure of how well its text serialization is supported.
Any suggestions?
Here's a proposal: use a Tree Language.
I created a demo for you called "Fished": https://github.com/breck7/fished.
Took me just a few minutes but already get type check, autocomplete, syntax highlighting, and more.
Tree Notation is early, and there will be kinks until the community is bigger, but I think it may be useful for you.
http://treenotation.org/designer/#grammar%0A%20fishedNode%0A...
Ps. Thanks for (all the) fish, it's my daily driver shell and keeps me that much more sane c.f. the alternatives.
Having to close contexts is a VERY good 'sanity check' to see if something is malformed or not.
If appending is necessary make the parser handle multiple copies of the namespace and merge them upon output. Unknown keys and sections should also always be copied from input to output (this is how you embed comments).
To clarify the requirement, history could be a JSON array of objects:
To append an entry to this file and keep it valid, one must locate the closing square bracket and overwrite it. That work is what I hope to avoid.
Better than nothing I guess, but I'd say just use a syntax that supports comments.
I also wonder if you need a text format, or if SQLite or systemd's journal API would work.
[1] https://github.com/toml-lang/toml/issues/356
[2] https://github.com/toml-lang/toml#user-content-array-of-tabl...
It fulfills all of the requirements. There are several available C++ TOML parsers, including one from Boost.
Try just using line-delimited JSON objects (http://jsonlines.org/). It ticks all of your boxes, especially 3: "jq '.cmd' fish_history | histogram".
Neither YAML nor Protobufs are quite as easy as that.
All in all it's ridiculously simple, easy to parse in a variety of languages and each row is a single line that's simple to iteratively parse without loading the whole thing into memory.
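Here is roughly what that looks like for the history use case, as a sketch with only the standard library. The "cmd" field comes from the jq example above; the "when" field, the file name, and the function names are assumptions.

```python
import json


def append_entry(path, entry):
    """Append one record as a single line; nothing already in the file is rewritten."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")


def iter_entries(path):
    """Yield records one line at a time, without loading the whole file into memory."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # tolerate blank lines
                yield json.loads(line)


if __name__ == "__main__":
    append_entry("history.jsonl", {"cmd": "ls -la", "when": 1565000000})
    for entry in iter_entries("history.jsonl"):
        print(entry["cmd"])
```

Appending is just opening the file in "a" mode, so there is no closing bracket to find and overwrite, which was the original objection to a JSON array.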
I've never had someone that needed extensive help understanding YAML and that's besides reviewing work for people just coming up to speed. Find me an IDE or editor that doesn't have YAML support. Also, YAML supports comments so if you have pitfalls people need to know about you can document them inline.
Your argument is people who don't know things might screw stuff up. Well Yeah! This applies to everything.
You may be surprised to find that there’s significant disagreement on that point.
Quite the opposite.
I like to format my code nicely anyways (or rather, mostly my editor does it for me because I’ve asked it to do so).
I indent with two spaces usually, regardless of language. And have my editors configured to insert two spaces when I press tab.
JavaScript, Rust, Python, C. Same difference, in terms of how I use whitespace.
If you're writing well structured, original code in Python, it's generally cleaner and easier than other languages because the syntax avoids ambiguities that other languages have.
A language becomes popular largely through its library ecosystem and the resources around it, not just how the language looks. I think Google embracing it played a good role in gaining mindshare.
https://news.ycombinator.com/item?id=20672051
CI would be a lot nicer to use if it didn’t rely on a single YAML file to work. And if you want to switch, suddenly you have a build step to convert back to YAML.
This is simply wrong. There is nothing in the spec stating that.
Could you elaborate on this? I use Ansible daily and I've never had a problem with YAML once I took some time to understand it. What do you mean by broken parsers? I'm assuming that's something Ansible specific you are referring to.
And Ansible's parser is broken, in more ways than I can remember (I haven't been writing playbooks and stuff for a couple of months now). If you like pointless pain, try embedding ":" in task names for a demo (or one of several other "meta" characters: the colon is just the one that tends to recur most).
I will give a passing mention to the smug, vague error message "You have an error at position (somewhere in the middle of the file) It seems that you are missing... (something) we may be wrong (they almost always are) but it appears it begins in position (some position close to the first line)" that sets off a hunt for the missing brace/colon/space/whatever and makes me want to do stuff to the person who devised it.
This compounds with the confusion brought on by the weaving of YAML's and Jinja2's syntaxes and Ansible's own flakiness in deciding what is evaluated when - which decides when and if a variable does indeed change, and when yes means "yes" rather than true or 1, but not '1' or "true" (try prompting the user for a boolean variable, and find yourself writing if ( (switch == "true") or (switch == "1")) in short order).
Pity that Ansible is so damn convenient, or I would have ditched it a long time ago for anything - bash included (OK, maybe not bash).
Is it TOML as the author seems to prefer at the end?
The biggest problem with config formats is they mislead users into thinking they understand the format. The user tries to edit it by hand, and chaos ensues. So only formats that are stupidly simple, or whose warts are already familiar and well documented, are good choices.
Apache had a great configuration format. Nothing else used it (that I knew of) but you could in theory implement "Apache configs" and then people'd just have to look up how to write those, which there's lots of examples of.
JSON and YAML and XML are data formats; they should only be written by machines, and read by humans. Same with protocols like HTTP, Telnet, FTP... You're not supposed to write it yourself, but it's readable to make troubleshooting easier.
Data formats are nice for expressing nested data structures, but then they don't (usually) support logical expressions; at that point you need a template/macro/programming language, and at that point you're writing code, which will need to be tested, and at that point you should just write modules and use a config format to give them arguments. Every complex tool goes through the same evolution.
If you care about your users, write a tool to generate configs based on a wizard. Good CLI tools do this, and it really makes life better. (It's also a great way to document all your config features in code, and test them)
(More recently, I've come around to prefer plain environment variables for configuration, but that only works nicely when the amount of configuration is fairly limited, say 20 values instead of 1000 values.)
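For the small-config case, the environment-variable approach can be as little as this. It is only a sketch: the APP_ prefix, the setting names, and the defaults are invented for the example.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    n_run_threads: int
    log_level: str
    debug: bool


def load_settings(env=None):
    """Read settings from environment variables, falling back to defaults.

    `env` defaults to os.environ; passing a plain dict makes this easy to test.
    """
    env = os.environ if env is None else env
    return Settings(
        n_run_threads=int(env.get("APP_N_RUN_THREADS", "4")),
        log_level=env.get("APP_LOG_LEVEL", "info"),
        # Environment values are always strings, so booleans need an explicit rule.
        debug=env.get("APP_DEBUG", "false").lower() in {"1", "true", "yes"},
    )
```

Every value arrives as a string and has to be converted explicitly, which is tolerable for 20 settings and exactly the part that stops scaling at 1000.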
For my own use, I do prefer TOML.
TOML, in my opinion, is like a weird mishmash of JSON, ini, and bashisms. Though I have worked with it a lot less than the other formats, so YMMV.
More here : https://hitchdev.com/strictyaml/why-not/toml/
I have some projects where I'm frequently writing and modifying content that resembles the example here, and I use YAML there and plan to keep using YAML. For most other things, I'm just doing configuration, so I use TOML. No reason you need to stick to one or the other.
https://dhall-lang.org/
https://json5.org/
Aside from the lack of comments, the other major thing that can sometimes make JSON a bad config choice is the lack of multi-line strings.
It's overly verbose and hard-to-understand XML, no-comments JSON, the horrors of YAML, or some okay format that doesn't have parsers for the languages (plural) you are using on your project.