Readit News
phlakaton commented on XML is a cheap DSL   unplannedobsolescence.com... · Posted by u/y1n0
da_chicken · 4 hours ago
> 1. I think attributes absolutely should exist. They're great for describing metadata related to the tag: e.g. element ID, language, datatype, source annotation, namespacing. They add little in complexity.

No, they're barely adequate for those purposes. And you could (and if you have an XSD you probably should) still replace them with elements. If you argue that you can't, then you're arguing that JSON does not function, because JSON has no attributes at all. You can just inline metadata alongside data. That works just fine. That's the thing about metadata: it's data!

You don't need attributes. Having worked in information systems for 25 years now, I can say they are the most heavily, heavily, heavily misused feature of XML, and they are essentially always wrong.

Because when someone represents data like this:

  <Person>  
    <ID>90034</ID>  
    <FirstName>Anthony</FirstName>  
    <MiddleName />
    <LastName>Perkins</LastName>  
    <Site>4302</Site>  
  </Person>  
You can write an XSD with the full set of rules for schema validation.
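For instance, a minimal XSD sketch for that record (sketch only; treating the middle name as optional via `minOccurs` is my assumption):

```xml
<!-- Sketch: one plausible XSD for the element-based Person record. -->
<xs:element name="Person">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="ID" type="xs:integer"/>
      <xs:element name="FirstName" type="xs:string"/>
      <xs:element name="MiddleName" type="xs:string" minOccurs="0"/>
      <xs:element name="LastName" type="xs:string"/>
      <xs:element name="Site" type="xs:integer"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
```

Elements can carry complex types, enforce order via `xs:sequence`, and repeat via `maxOccurs`.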

On the other hand, if you do this:

  <Person ID="90034"  
    FirstName="Anthony"  
    MiddleName=""
    LastName="Perkins"  
    Site="4302" />
Well, now you're a bit stuck. You can make the XSD check basic data types, and that's it. You can never use complex types. If you ever need multiple values, you'll have to make your attribute a delimited string. You can't enforce order. You're limiting your ability to extend or advance things.
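For contrast, a sketch of how far an XSD can go with the attribute version: each attribute gets a simple type, and that's the end of it.

```xml
<!-- Sketch: XSD attributes are limited to simple types. -->
<xs:element name="Person">
  <xs:complexType>
    <xs:attribute name="ID" type="xs:integer"/>
    <xs:attribute name="FirstName" type="xs:string"/>
    <xs:attribute name="MiddleName" type="xs:string"/>
    <xs:attribute name="LastName" type="xs:string"/>
    <xs:attribute name="Site" type="xs:integer"/>
    <!-- No nested structure, no enforced order, no repeated values. -->
  </xs:complexType>
</xs:element>
```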

That's the problem with XML. It's so flexible it lets developers be stupid, while also claiming strictness and correctness as goals.

> 2. The point of a close tag with a name is to make it unambiguous what it's trying to close off.

Sure, but since closing tags in the proper order are mandatory anyway, the name isn't actually adding any information. The only thing it's doing is introducing opportunities for trivial syntax errors.

Because the truth is that this is 100% unambiguous in XML, because XML changed SGML's rules:

  <Person>  
    <ID>90034</>  
    <FirstName>Anthony</>  
    <MiddleName />
    <LastName>Perkins</>  
    <Site>4302</>  
  </>  
The reason SGML had a problem with the generic close tag was that SGML didn't require a closing tag at all. That was a problem. It also didn't have `<tag />`. It let you say `<tag1><tag2>...</tag1>` or `<tag1><tag2>...</>`.

Named closing tags had more of a point when we were actually writing XML by hand and didn't have text editors that could find the open and close tags for you, but that problem is solved. We now have syntax highlighting and hierarchical code folding in any text editor, never mind dedicated XML editors.
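To illustrate, here's a toy stack-based parser (a hypothetical sketch, not real XML) showing that a generic `</>` is unambiguous once every element must be closed in order:

```python
import re

def parse(src):
    """Toy parser for an XML-like syntax in which a bare </> closes the
    most recently opened element. Hypothetical sketch, not real XML."""
    tokens = re.findall(r"<[^>]+>|[^<]+", src)
    stack = [("ROOT", [])]
    for tok in tokens:
        if tok.startswith("</"):
            # Named or generic close tag: either way it can only close the
            # element on top of the stack, so the name adds no information.
            tag, children = stack.pop()
            stack[-1][1].append((tag, children))
        elif tok.endswith("/>"):
            # Self-closing element, e.g. <MiddleName />
            stack[-1][1].append((tok[1:-2].strip(), []))
        elif tok.startswith("<"):
            stack.append((tok[1:-1], []))
        elif tok.strip():
            stack[-1][1].append(tok.strip())
    return stack[0][1]
```

The same pop happens whether the close tag is `</ID>` or `</>`.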

> 3. As far as schema support, it seems to me that JSON Schema is well-established and perfectly cromulent

Then my guess is that you have worked exclusively in the tech industry for customers that are also exclusively in the tech industry. If you have worked in any other business with any other group of organizations, you would know that the rest of the world is absolute chaos. I think I've seen 3 examples of a published JSON Schema, and hundreds of APIs that don't publish one.

> 4. As far as tabular data, neither XML nor JSON were built for efficient tabular data representation, so it shouldn't be a surprise that they're clunky at this. Use the right tool for the job.

No, I think you're looking at what the format was intended to do 25 years ago and claiming it should never be extended or improved. You're ignoring what it's actually being used for.

Unless you're going to make data queries return large tabular data sets to the user interface as more or less SQLite or DuckDB databases so the browser can freely manipulate them for the user... you're kind of stuck with XML or JSON or CSV. All of which suck for different reasons.

phlakaton · an hour ago
1. I don't disagree that attributes have been abused – so have elements – but you yourself identified the right way to use them. Yes, you can inline metadata as elements, but that also leads to a document that's harder to use in some cases. So long as you use attributes judiciously, it's fine. In actual text markup cases, they're indispensable, as HTML illustrates.

2. As far as JSON Schema, you're wrong on all accounts – wrong that I haven't seen Some Stuff, wrong that JSON Schema doesn't get used (see Swagger/OpenAPI), and wrong that XML Schema doesn't also get underutilized when a group of developers gets lackadaisical.

3. As far as what historical use has been, I'm less interested in exhuming historical practice than simply observing which of the many use cases over the last 20 years worked well (and still work) and which didn't. The answer isn't that none of them worked, and it certainly isn't that XML users had a better bead on how to use it 20 years ago – it went through a massive hype curve just like a lot of techs do.

4. Regarding tabular data exchange, I stand by my statement. Use XML or JSON if you must, and sometimes you must, but there are better tools for the job.

phlakaton commented on XML is a cheap DSL   unplannedobsolescence.com... · Posted by u/y1n0
da_chicken · 6 hours ago
I've said it before, but I maintain that XML has only two real problems:

1. Attributes should not exist. They make the document suddenly have two dimensions instead of one, which significantly increases complexity. Anything that could be an attribute should actually be a child element.

2. There should be one close tag: `</>`, which closes the most recently opened element. Named close tags burn a significant amount of space on useless syntax. Other than that and the self-closing `<tag />` (which itself is less useful without attributes), there isn't much that you need. Maybe a document close tag like `<///>`.

You'll notice that, yes, JSON solves both of those things. That's part of why it's so popular. The other part is just that a lot more effort was put into maximizing the performance of JavaScript than into shredding XML, and XSLT, the intended solution to this problem, is infamous at this point.

The problem of comments is kind of a non-issue in practice, IMO. You can just add a `"_COMMENT"` key or similar. Sure, yes, it will get parsed. But you shouldn't have so many comments that they cause a genuine performance issue.
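The workaround looks something like this (the key name is just a convention, not any standard):

```json
{
  "_COMMENT": "rates below are annual percentages",
  "rate": 4.25
}
```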

However, JSON still has two problems:

1. Schema support. You can't validate a file before deserializing it in your application. JSON Schema does exist, but its support is still thin, IMX.

2. Many serializers are pretty bad with tabular data, and nearly all of them are bad with tabular data by default. So sometimes it's a data serialization format that's bad at serializing bulk data. Yeah, XML is worse at this. Yeah, you can use the `"colNames": ["id", ...], "rows": [ [1,...],[2,...] ]` method or go columnar with `"id": [1,2,...], "name": [...], "createDate": [...]`, but you had better be sure both ends can support that format.
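As a sketch (field names are made up), producing those two layouts from ordinary row-oriented records is mechanical; the hard part is agreeing that both ends speak them:

```python
# Sketch: the two tabular JSON layouts mentioned above, built from
# ordinary row-oriented records. Field names are hypothetical.
records = [
    {"id": 1, "name": "a", "createDate": "2024-01-01"},
    {"id": 2, "name": "b", "createDate": "2024-01-02"},
]

def to_header_rows(recs):
    """The {"colNames": [...], "rows": [[...], ...]} layout."""
    cols = list(recs[0])
    return {"colNames": cols, "rows": [[r[c] for c in cols] for r in recs]}

def to_columnar(recs):
    """The column-per-key layout: {"id": [...], "name": [...], ...}."""
    cols = list(recs[0])
    return {c: [r[c] for r in recs] for c in cols}
```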

It seems like there are attempts to resolve both of those issues: OpenAPI 3.1 incorporates JSON Schema, and the most popular JSON parsers seem to be adding tabular data support. I guess we'll see.

phlakaton · 5 hours ago
I disagree on several points here:

1. I think attributes absolutely should exist. They're great for describing metadata related to the tag: e.g. element ID, language, datatype, source annotation, namespacing. They add little in complexity.

2. The point of a close tag with a name is to make it unambiguous what it's trying to close off.

It sounds to me like what you want is not a better XML, but just s-exprs. Which is fine, but not quite solving the same problem.

3. As far as schema support, it seems to me that JSON Schema is well-established and perfectly cromulent – so much so that YAML authors are trying to use it to validate their stuff (the poor bastards) – and XML schema validation, while robust, is a complex and fragmented landscape around DTD, XSD, RELAX-NG, and Schematron. So although XML might have the edge, it's a more nuanced picture than XML proponents are claiming.

4. As far as tabular data, neither XML nor JSON were built for efficient tabular data representation, so it shouldn't be a surprise that they're clunky at this. Use the right tool for the job.
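On the JSON Schema point, for what it's worth, a minimal schema for the Person record from upthread is short and readable (sketch only; which fields are required is my guess):

```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "required": ["ID", "FirstName", "LastName"],
  "properties": {
    "ID": {"type": "integer"},
    "FirstName": {"type": "string"},
    "MiddleName": {"type": "string"},
    "LastName": {"type": "string"},
    "Site": {"type": "integer"}
  }
}
```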

phlakaton commented on XML is a cheap DSL   unplannedobsolescence.com... · Posted by u/y1n0
ACCount37 · 11 hours ago
That's my point. By the time you hit "until it doesn't", you're already doing JSON, and were for a while.

Also, is "parse well if there's a missing bracket" even a desirable property? If you get files with mangled syntax, something has already gone horribly wrong. And, chances are, there is no way to parse them that would be correct.

phlakaton · 8 hours ago
By "parses well" in that case I mean "can identify where the error is, and maybe even infer the missing closing tag if desirable;" i.e. error reporting and recovery.

If you've ever debugged a JSON parse error where the location of the error was the very end of a large document, and you're not sure where the missing bracket was, you'll know what I mean. (S-exprs have similar problems, BTW; LISPers rely on their editors so as not to come to grief, and things still sometimes go pear-shaped.)
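A minimal demo of that failure mode with Python's stdlib parser: drop the final closing brace from a nested document, and the reported error offset is the very end of the input, nowhere near where the brace went missing.

```python
import json

def error_position(text):
    """Return the character offset a JSON parse error points at, or None."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return err.pos
    return None

broken = '{"a": {"b": 1}'   # outer object never closed
# The error lands at the end of the string, not at the unclosed brace.
print(error_position(broken), len(broken))
```

In a large document, "char 3,481,902" at the end tells you almost nothing about where the problem is.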

phlakaton commented on XML is a cheap DSL   unplannedobsolescence.com... · Posted by u/y1n0
foltik · 9 hours ago
Because (imo) the goal should be to minimize overall complexity.

Pulling in XML and all of its additional complexity just to get a (debatably) cleaner way to express tagged unions doesn’t seem like a great tradeoff.

I also don’t buy the degenerate argument. XML is arguably worse here since you have to decide between attributes, child nodes, and text content for every piece of data.

phlakaton · 9 hours ago
Depends on the application, I suppose. For OP's application, pulling in XML is no trouble and gives you a much better solution for typed unions.

To get better than XML, I think you're looking at something closer to a Haskell- or LISP-embedded DSL, with obvious trade-offs when it comes to developer ecosystems and interoperability.

phlakaton commented on XML is a cheap DSL   unplannedobsolescence.com... · Posted by u/y1n0
twic · 12 hours ago
FWIW you can do a better job with the JSON structure than in the article:

    {"GreaterOf": [
        {"Value": [0, "Dollar"]},
        {"Subtract": [
            {"Dependency": ["/totalTentativeTax"]},
            {"Dependency": ["/totalNonRefundableCredits"]}
        ]}
    ]}
Basically, a node is an object with one entry, whose key is the type and whose value is an array. It's a rather S-expressiony approach. If you really don't like using arrays for all the contents, you could always use more normal values at the leaves:

    {"GreaterOf": [
        {"Value": {"value": 0, "kind": "Dollar"}},
        {"Subtract": {
            "minuend": {"Dependency": "/totalTentativeTax"},
            "subtrahend": {"Dependency": "/totalNonRefundableCredits"}
        }}
    ]}
It has the nice property that you're always guaranteed to see the type before any of the contents, even if object keys get reordered, so you can do streaming decoding without having to buffer arbitrary amounts of JSON. Probably not important when parsing a tax code, but can be useful for big datasets.
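As a sketch of how this encoding is consumed, here's a toy evaluator for the first form above (the node kinds come from the example; the `deps` mapping and all values are hypothetical):

```python
def eval_node(node, deps):
    """Toy evaluator for the one-entry-object encoding above. Each node
    is an object with exactly one key naming the node type."""
    (kind, body), = node.items()      # exactly one entry per node
    if kind == "Value":
        amount, _unit = body          # e.g. [0, "Dollar"]
        return amount
    if kind == "Dependency":
        return deps[body[0]]          # look up a path like /totalTentativeTax
    if kind == "Subtract":
        minuend, subtrahend = (eval_node(child, deps) for child in body)
        return minuend - subtrahend
    if kind == "GreaterOf":
        return max(eval_node(child, deps) for child in body)
    raise ValueError(f"unknown node kind: {kind}")
```

Because the type is the single key, a streaming decoder can dispatch as soon as it reads it.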

phlakaton · 10 hours ago
Aesthetically, I consider such JSON structures degenerate. It's akin to building an ECMAScript app where every class and structure is only allowed to have one member.

If you want tagged data, why not just pick a representation that does that?

phlakaton commented on XML is a cheap DSL   unplannedobsolescence.com... · Posted by u/y1n0
SoftTalker · 12 hours ago
I used XML and XPath a lot in the early 2000s when they were popular, and I never wrote or learned about schema validation. It's totally optional, and I never found a need for it.

It's probably helpful for "standard data interchange between separate parties" use cases; in what I was doing, I totally controlled the production and the interpretation of the XML.

phlakaton · 10 hours ago
For this application, where you might have a lot of authors and apps working with the rule data, I think schema-based validation at some level is going to be a must if you don't want to end in sorrow.
phlakaton commented on XML is a cheap DSL   unplannedobsolescence.com... · Posted by u/y1n0
moron4hire · 12 hours ago
I consider CSV to be a signal of an unserious organization. The kind of place that uses thousand-line Excel files with VBA macros instead of just buying a real CRM already. The kind of place that thinks junior developers are cheaper than senior developers. The kind of place where the managers browbeat you into working overtime by arguing from a single personal perspective that "this is just how business is done, son."

People will blithely parrot, "it's a poor workman who blames his tools." But I think the saying, as I've always heard it used (to suggest that someone who complains is just bad at their job), is a backwards sentiment. Experts in their respective fields don't refrain from complaining about their tools because they're internalizing failure as their own fault. They don't complain because they insist on using only the best tools, and thus have nothing to complain about.

phlakaton · 10 hours ago
LOL, I chose a Google Sheet and CSV for my current project, and I'm very serious about it. It's a short-term solution, and it fits my needs perfectly.
phlakaton commented on XML is a cheap DSL   unplannedobsolescence.com... · Posted by u/y1n0
conartist6 · 13 hours ago
Just gonna drop this here : ) https://docs.bablr.org/guides/cstml

CSTML is my attempt to fix all these issues with XML and revive the idea of HTML as a specific subset of a general data language.

As you mention, one of the major lessons from the success of JSON was to keep the syntax stupid-simple -- easy to parse, easy to handle. Namespaces were probably the feature that got the most rework.

In theory it could also revive the ability we had with XHTML/XSLT to describe a document in a minimal, fully-semantic DSL, only generating the HTML tag structure as needed for presentation.

phlakaton · 10 hours ago
I unfortunately disagree that your syntax is "stupid-simple." But it highlights an impedance mismatch between XML users and JSON users.

JSON treats text as one of several equally-supported datatypes, and quotes all strings. Great if your data is heavily structured, and text is short and mixed with other types of data. Awful if your data is text.

XML and other SGML apps put the text first and foremost. Anything that's not text needs to be tagged, maybe with an attribute to indicate the intended type. It's annoying to express lots of structured, short-valued data. But it's simple and easy for text markup where the text predominates.
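A made-up mixed-content example shows the difference:

```xml
<!-- Hypothetical markup: the text predominates, structure is tagged inline. -->
<p>Call <phone kind="mobile">555-0100</phone> after 5pm,
or see <a href="https://example.com/contact">the contact page</a>.</p>
```

The JSON rendering of the same sentence would be an array of quoted string fragments interleaved with objects, which is exactly the awkwardness described above.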

CSTML at first glance seems to fall into the JSON camp. Quoting every string literal makes plenty of sense in JSON, but not in the HTML/text-markup world you seem to want to play in.

phlakaton commented on XML is a cheap DSL   unplannedobsolescence.com... · Posted by u/y1n0
conartist6 · 11 hours ago
A lot of people dislike the decision not to include comments in JSON, but I think that, while shocking, it was and is totally correct.

In a programming language it's usually free to have comments because the comment is erased before the program runs; we usually render comments in grey text because they can't change the meaning of the program.

In a data language you have no such luxury. There's no comment erasure happening between the producer and the consumer, so comments are dangerous: they would without doubt evolve into a system of annotations -- an additional layer of communication that would not be standardized at all, and would then grow into a wild west of nonstandard features and compatibility workarounds.

phlakaton · 10 hours ago
I don't dislike the decision at all, FWIW! For data interchange it's totally reasonable. But it does make JSON ill-suited for a bunch of applications that JSON has been forcefully and unfortunately applied to.
phlakaton commented on XML is a cheap DSL   unplannedobsolescence.com... · Posted by u/y1n0
1a527dd5 · 12 hours ago
The trouble with XML has never been XML itself.

It was also about how easy it was to generate great XML.

Because it is complicated and nobody really agrees on how to properly represent an idea or concept, you have to deal with varying output between producers.

I personally love well-formed XML, but the std dev is huge.

Things like JSON have a much tighter std dev.

The best XML I've seen is generated by hashdeep/md5deep. That's how XML should be.

Financial institutions are basically run on XML, but we do a tonne of work with them and my god their "XML" makes you pray and weep for a swift end.

phlakaton · 12 hours ago
Maybe rather: how easy it was to generate rotten XML. I feel you there.

The XML community, though, embraced the problem of different outputs between different producers, and assumed you'd want to enable interoperability in a Web-sized community where strict patterns to XML were infeasible. Hence all the work on namespaces, validation, transformation, search, and the Semantic Web, so that you could still get stuff done even when communities couldn't agree on their output.
