JodieBenitez · 2 years ago
Lots of comments here about XML vs. JSON... but there are areas where these two don't collide. I'm thinking about text/document encoding (real annotated text, things like books, etc).

Even though XML is still king here (see TEI and other norms), some of its limitations are a problem. Consider the following text:

    Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
Now say you want to qualify a part of it:

    Lorem ipsum <sometag>dolor sit amet</sometag>, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
Now say you want to qualify another part, but it overlaps with the previous part:

    Lorem ipsum <sometag>dolor sit <someothertag>amet</sometag>, consectetur</someothertag> adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
Of course, this is illegal XML... so we have to do dirty hacks like this:

    Lorem ipsum <start someid="part1"/>dolor sit <start someid="part2"/>amet<end someid="part1"/>, consectetur<end someid="part2"/> adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
Which means rather inefficient queries afterwards :-/
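
To make the query cost concrete, here is a minimal Python sketch of recovering the text covered by each milestone pair. It assumes the fragment is wrapped in a hypothetical `<p>` element so it parses as a document, and that the `start`/`end` markers are all direct children of that wrapper; real TEI-style documents are usually more deeply nested.

```python
import xml.etree.ElementTree as ET

# Hypothetical <p> wrapper added so the fragment is a well-formed document.
SAMPLE = (
    '<p>Lorem ipsum <start someid="part1"/>dolor sit '
    '<start someid="part2"/>amet<end someid="part1"/>, consectetur'
    '<end someid="part2"/> adipiscing elit.</p>'
)


def milestone_spans(xml_text):
    """Return {someid: covered text} for flat start/end milestone markers."""
    root = ET.fromstring(xml_text)
    open_spans = {}  # someid -> list of text pieces seen while the span is open
    finished = {}    # someid -> joined text once the matching <end/> is seen

    def feed(piece):
        # Every currently open span receives the next piece of text.
        if piece:
            for parts in open_spans.values():
                parts.append(piece)

    feed(root.text)  # text before the first marker; no span is open yet
    for marker in root:
        span_id = marker.get("someid")
        if marker.tag == "start":
            open_spans[span_id] = []
        elif marker.tag == "end":
            finished[span_id] = "".join(open_spans.pop(span_id))
        feed(marker.tail)  # text that follows this marker
    return finished


print(milestone_spans(SAMPLE))
```

Every lookup is a linear walk with bookkeeping for open spans, which is roughly the inefficiency being described: no single element holds the annotated text.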

meepmorp · 2 years ago
A strategy I've seen for dealing with XML's inability to handle overlapping tags is to treat the tagging as an annotation layer on top of the node with the data:

  <doc>
  <data type="text">
    This is some sample text.
  </data>
  <annotations>
    <tag1 start="1" end="3" comment="foo"/>
    <tag2 start="2" end="4" type="bar" />
  </annotations>
  </doc>
The start and end are usually byte offsets from the start of the text content in the data node. It still sucks, but at least you could apply the same general strategy to more than just text data - I've seen it used with audio/video where the offsets are treated as time offsets into the media.

JodieBenitez · 2 years ago
Good idea. You would have to edit your annotation layer in case the text data changes though.
yencabulator · 2 years ago
Now you've lost human editability/readability and could just as well encode that in a non-XML format.
euroderf · 2 years ago
It seems klutzy and yet fully in the spirit of XML.
j-pb · 2 years ago
I would argue that the inline way of annotating things in XML is actually ok-ish if one absolutely needs human edit-ability, but otherwise bad design.

  {text: "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.",
   annotations: [{tag: "sometag", ranges: [{from: 12, to: 26}]},
                 {tag: "someothertag", ranges: [{from: 22, to: 39}]}]}
Note that this also removes the limitation that annotations have to be consecutive.
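
As a rough illustration of how such a stand-off structure could be consumed, here is a small Python sketch. The offsets are assumed to be 0-based, end-exclusive character positions, the text is shortened, and `annotated_text` is a hypothetical helper name rather than part of any library.

```python
import json

# Shortened sample; offsets are assumed 0-based and end-exclusive.
DOC = json.loads("""
{
  "text": "Lorem ipsum dolor sit amet, consectetur adipiscing elit",
  "annotations": [
    {"tag": "sometag", "ranges": [{"from": 12, "to": 26}]},
    {"tag": "someothertag", "ranges": [{"from": 22, "to": 39}]}
  ]
}
""")


def annotated_text(doc):
    """Map each tag to the list of text slices its ranges cover."""
    text = doc["text"]
    return {
        ann["tag"]: [text[r["from"]:r["to"]] for r in ann["ranges"]]
        for ann in doc["annotations"]
    }


print(annotated_text(DOC))
```

Because each tag carries a list of ranges, overlapping and non-contiguous annotations need no special handling.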

mdaniel · 2 years ago
crafty, but for your consideration: that places the burden upon every library author to be "accounting accurate" to any edits, and the only way anyone would know that it's not correct is to visually inspect the output text

also, as I get older I have a deeper and deeper appreciation that "offset" and "text" are words that are fraught with peril
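
One concrete way "offset" goes wrong: character offsets and byte offsets stop agreeing as soon as the text contains non-ASCII characters. The sketch below assumes UTF-8 and Python's code-point indexing; `char_to_byte_offset` is a hypothetical helper for illustration.

```python
# "naive cafe" with two accented letters, written with escapes so the
# code points are unambiguous (precomposed forms, one code point each).
TEXT = "na\u00efve caf\u00e9"
WORD = "caf\u00e9"

char_start = TEXT.index(WORD)       # offset counted in code points
char_end = char_start + len(WORD)


def char_to_byte_offset(text, char_offset, encoding="utf-8"):
    """Convert a code-point offset into a byte offset for the given encoding."""
    return len(text[:char_offset].encode(encoding))


byte_start = char_to_byte_offset(TEXT, char_start)
byte_end = char_to_byte_offset(TEXT, char_end)

raw = TEXT.encode("utf-8")
print(char_start, char_end, byte_start, byte_end)
# Slicing the bytes with byte offsets recovers the word.
print(raw[byte_start:byte_end].decode("utf-8"))
```

Slicing the encoded bytes with the character offsets instead would select a different span, and in less lucky cases it would cut a multi-byte sequence in half.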

samwillis · 2 years ago
You are absolutely right that XML is better for document structures.

My current theory is that Yjs [0] is the new JSON+XML. It gives you both JSON and XML types in one nested structure, all with conflict free merging via incremental updates.

Also, you note the issue with XML and overlapping inline markup. Yjs has an answer for that with its text type: you can apply attributes (for styling or anything else) via arbitrary ranges, and they can overlap.

Obviously I'm being a little hyperbolic suggesting it will replace JSON; the beauty of JSON is its simplicity. But for many systems, building on Yjs or similar CRDT-based serialisation systems is the future.

Maybe what we need is a YjsSchema...

[0] https://github.com/yjs/yjs/

thejohnconway · 2 years ago
Yjs isn’t a document structure is it? It seems to be a library for collaborative editing, but I’m not seeing something suitable for marking up a document, or am I missing something obvious?

dwaite · 2 years ago
This is actually one of the things processing instructions are useful for - but you would need to define the data within the PI, since they don't have attributes.
thomasfromcdnjs · 2 years ago
JSON Resume uses a defined schema. (listed on schemastore.org)

It has made writing resumes with Copilot super powerful.

nbbaier · 2 years ago
Do you have an example of how you've done this?
sleepytree · 2 years ago
Can you share your general process for that? Trying to do more AI for this type of thing.
ChrisArchitect · 2 years ago
(2020)?

Some previous discussion: https://news.ycombinator.com/item?id=23988269

seanp2k2 · 2 years ago
JSON is the version of XML we deserve.
devjab · 2 years ago
Nobody deserves XML! In all seriousness I get the idea behind XML and I have used a couple of SOAP services which were absolutely brilliant, but as someone who has spent a decade “linking” data from various sources in non-tech enterprise… Well… let’s just say that I’m being kind if I say that 5% of the services which used XML were doing it in a way that was nice to work with.

Which is why JSON’s simplicity is such a win for our industry. Because it’s always easy to handle. Sure you can build it pretty terrible, but you’re not going to do this: <y value=x> and then later do <y>x</y> which I’m not convinced you didn’t do in XML because you’re chaotic evil. And you’re not going to run into an issue where some Adobe Lifecycle schema doesn’t work with .Net because of reasons I never really found out because why wouldn’t an XML schema work in .Net? Anyway, I’m sure the intentions behind XML were quite brilliant but in my anecdotal experience it just doesn’t work for the Thursday afternoon programmer mindset and JSON does.

bryanrasmussen · 2 years ago
>In all seriousness I get the idea behind XML

followed by

>and I have used a couple of SOAP services which were absolutely brilliant

makes me doubt the first part of the statement.

If I were to guess what it means is you understand the point of SOAP, and also understand the limitations and problems especially as it relates to uses of XSD in SOAP and the various stack of Web Service specs, but you probably have not had much experience with non-XSD based validation of XML, you do not have any experience with document formats as opposed to data formats, you probably are not familiar with larger international standards like UBL, and not familiar with XML formats that are not so much data or document oriented - SVG, XSL-FO (which admittedly sucks more than is reasonable), GraphML and so forth...

A lot of the commenters here are standing up for the value of XML, and I'm not actually doing that with this comment: there are a lot of benefits to using JSON, especially when you are using JavaScript all over the place. But saying XML sucks because XSD and SOAP suck indicates a potential lack of knowledge about the whole subject (perhaps only caused by infelicitous phrasing).

crabbone · 2 years ago
A catchy but meaningless phrase. JSON is a dumpster on fire. Probably in even more ways than XML is. Maybe you deserve it... I feel like I'm being punished by the stupid people who make me use it in a way similar to the sham court hearings from Planet of the Apes.
VoodooJuJu · 2 years ago
>I feel like I'm being punished by the stupid people who make me use it

What's the use-case and what alternative would you prefer?

andyjohnson0 · 2 years ago
Whenever two or more are gathered together, they shall argue about JSON vs XML.

Personally I like the simplicity of JSON and also the expressive power of XML. But then I tend to only use each for the task it was primarily intended: application data-on-the-wire in JSON and "documents" in XML. It seems like a lot of the recurrent discussion around these technologies happens when they're pushed to do things outside their comfort zone. And I wonder if some of this is down to siloing of developer knowledge.

There was a comment on HN a few days ago (not by me, and I can't find it now) to the effect that web development has historically attracted self-taught developers or those who have come to it by routes like bootcamps. It went on to say that they perhaps consequently lack some knowledge of existing techniques and solutions, and therefore tend to recreate solutions that may already exist (and not always well). And this drives the well-known churn in webdev tech: of which bolting schemas onto JSON is arguably an example.

I wonder what people think of this? Personally I think it has some merit, but that the "churn" has also generated (along with much wheel-reinvention) some great innovations. And I say that as someone who works mainly on back-end stuff.

Thoughts?

vonwoodson · 2 years ago
I'd extend this "X developers are mostly self-taught" to all of computer development. They say, "Every developer Of a Certain Age's first programming language was BASIC" and my experience of (eventually) getting a CS degree is that there is the expectation of students to already know how to do the thing that they are trying to teach; a certain level of "self taught" is expected. To that end, I can see how in The Age of Teh Internets the standard of self taught has moved on from BASIC to HTML/CSS/JS (or Unity or whatever sparked the young mind's attention). --- What I'm not certain of is that "self taught" means that work will be duplicated because the self taught developer doesn't know the technology that exists. I think that someone who is extremely online will very likely be more abreast of what technologies exist. I think that a formal education is better at establishing what the fundamentals underlying a programming method or paradigm are, but not necessarily at exposing new programmers to what the state-of-the-art is.
guideamigo · 2 years ago
I wish it had comments. And for that reason, I prefer yaml.
WirelessGigabit · 2 years ago
I prefer JSON's strictness. A Boolean cannot be confused for a string.

In yaml:

    country: no
Now your country is Boolean(false)

Now, I still prefer yaml overall.

Also, I hate that GitHub actions don't support anchors.

kondro · 2 years ago
There’s always JSON5
hackerbrother · 2 years ago
YAML is a fine implementation of JSON with comments.
jhoechtl · 2 years ago
Is there an on-premise alternative? Not necessarily speaking of schemastore.org on prem but a service comparable in spirit.
unilynx · 2 years ago
The 'service' is basically hosting this file: https://www.schemastore.org/api/json/catalog.json - you could host that locally and point your software to it, modifying the other URLs where needed

It's a pity the catalog format doesn't support an 'import' or relative URLs for schemas - would have made local extensions a bit easier.
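
A rough Python sketch of the "modify the other URLs" step, assuming the catalog is a JSON object with a `schemas` array whose entries carry a `url` field. The sample entry, the internal hostname, and the `localize_catalog` helper are all hypothetical, and entries that point at other hosts would need to be mirrored separately.

```python
from urllib.parse import urlsplit

# Hypothetical, trimmed-down catalog with a single illustrative entry.
CATALOG = {
    "version": 1,
    "schemas": [
        {
            "name": "package.json",
            "description": "NPM configuration file",
            "fileMatch": ["package.json"],
            "url": "https://json.schemastore.org/package.json",
        }
    ],
}


def localize_catalog(catalog, local_base):
    """Return a copy of the catalog whose schema URLs point at local_base."""
    base = local_base.rstrip("/")
    localized = dict(catalog)
    localized["schemas"] = [
        # Keep the path of the original URL, swap scheme and host.
        {**entry, "url": base + urlsplit(entry["url"]).path}
        for entry in catalog["schemas"]
    ]
    return localized


local = localize_catalog(CATALOG, "http://schemas.internal.example/")
print(local["schemas"][0]["url"])
```

The original catalog is left untouched, so the rewritten copy can be regenerated whenever the upstream file changes.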

relequestual · 2 years ago
We (JSON Schema) did a case study/interview with the guy behind it https://www.youtube.com/watch?v=-yYTxLZZk58&list=PLHVhS4Tj1Y...
osigurdson · 2 years ago
It is interesting that people love json (now with schema), but hate XML while loving HTML at the same time. It is all pretty boring and largely the same imo.
zdragnar · 2 years ago
The absolute worst bit of XML is the confused implementations. What should be an attribute on a tag, and what should go between tags? Even worse, nothing is sanely typed without an xsd. Different systems will treat the following differently:

    <some>true</some>
versus

    <some>1</some>
Some systems require the token "true", others will only treat 1 as the boolean true.

For example, MS claims that for exchange ASD boolean values must be integer 1 or 0 [0], but then links to a W3C spec that allows for the tokens true and false [1]

At least with JSON and HTML, you don't need a separate definition file for basic, primitive data types.

[0] https://learn.microsoft.com/en-us/openspecs/exchange_server_...

[1] https://www.w3.org/TR/2004/REC-xmlschema-2-20041028/#boolean
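
For reference, the W3C datatype linked in [1] lists four legal literals for boolean: true, false, 1 and 0. A tolerant reader could accept all four, along the lines of this Python sketch (`parse_xsd_boolean` is a hypothetical helper, not a library function):

```python
def parse_xsd_boolean(lexical):
    """Parse an xs:boolean literal, accepting true/false/1/0 only."""
    # xs:boolean collapses surrounding XML whitespace before matching.
    value = lexical.strip(" \t\r\n")
    if value in ("true", "1"):
        return True
    if value in ("false", "0"):
        return False
    raise ValueError(f"not a valid xs:boolean literal: {lexical!r}")


print(parse_xsd_boolean("true"), parse_xsd_boolean("1"))
print(parse_xsd_boolean("false"), parse_xsd_boolean("0"))
```

Literals such as "TRUE" or "yes" are rejected here, since they are outside that lexical space.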

pbourke · 2 years ago
XML is only concerned with whether a document is well-formed, not its conformity to a given schema. Schemas like XSD, DTD, etc can be plugged in later. Many systems just have an ad hoc schema.

> At least with JSON and HTML, you don't need a separate definition file for basic, primitive data types.

Unless I’m missing your meaning, this seems like an apples-to-oranges comparison. HTML is not a general-purpose format like JSON. It’s a very complicated document format that is validated with reference to an external spec.

I think XML is a great fit for a document format that can become arbitrarily complex yet still easy to author and validate. It’s obviously a really poor fit for a wire transport protocol.

berkes · 2 years ago
How is your example any better in JSON? { some: true, someOther: 1, another: "true" }
dwaite · 2 years ago
> The absolute worst bit of XML is the confused implementations. What should be an attribute on a tag, and what should go between tags?

XML is a language for marking up text. SVG uses attributes for all vector data, because the vector points are not meant to be presented to a user as raw data.

If I embed an SVG into an XHTML document and the browser does not understand SVG, the text within the graphic is still presented to the user.

> Even worse, nothing is sanely typed without an xsd. Different systems will treat the following differently:

This is not a responsibility of XML, which deals in a common well-formed markup format for various document formats.

It sounds like you are dealing with a tool that has defined an XML-based data interchange format, and that they may have inconsistent tooling for their format.

peoplefromibiza · 2 years ago
> you don't need a separate definition file for basic, primitive data types.

unless you need something different from JavaScript primitive data types.

For example integers.

Or null means nothing to you.

Or you want a faithful representation of input

   Welcome to Node.js v20.5.1.
   Type ".help" for more information.
   > JSON.stringify(undefined)
   undefined

   > JSON.stringify([undefined])
   '[null]'
but then

   jq "." <<< "[null]"     
   [
     null
   ]
   
   jq "." <<< "undefined"
   parse error: Invalid numeric literal at line 2, column 0

osigurdson · 2 years ago
>> What should be an attribute on a tag, and what should go between tags?

Are you ok with <a href="..">link</a>?

That was kind of my original point, people are fine with html but don't like XML. I think the real reason people don't like XML is it reminds them of Steve Ballmer.

euroderf · 2 years ago
> What should be an attribute on a tag, and what should go between tags?

I think a good rule of thumb is that attributes are for key/value pairs that are probably not user-visible and definitely not directly user-editable.

Carried to a logical conclusion, this would simplify the auto-creation of form GUIs.

slaymaker1907 · 2 years ago
JSON is a much better serialization format since XML was designed as a document format. For example, there is no standardized way to serialize a string with a null character even if you escape it (this is allowed in many programming languages). JSON just says do “\0” and calls it a day. I’m not sure if it’s better for users, but it’s certainly easier to work with as a dev.

HTML isn’t trying to serialize abstract data and is doing what XML does best in being a document/GUI format. It doesn’t matter all that much that it can’t represent null characters in a standard way because it isn’t a printable character.

peoplefromibiza · 2 years ago
> JSON just says do “\0” and calls it a day

nope

   jq "." <<< "\0"
   parse error: Invalid numeric literal at line 2, column 0

   jq "." <<< '{"name": "\0"}'
   parse error: Invalid escape at line 1, column 13
maybe you mean null, which has a lot of different issues though.

   jq "." <<< "null"          
   null

   jq "." <<< '{"name": null}'
   {
     "name": null
   }
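
For what it's worth, JSON does have a way to write a NUL character inside a string, just not as "\0": the escape is "\u0000". A short Python sketch using only the standard library shows this, and also that an XML 1.0 parser rejects the equivalent character reference (the helper names are hypothetical):

```python
import json
import xml.etree.ElementTree as ET

# Serializing a NUL character produces the six-character escape \u0000.
encoded = json.dumps({"name": "\x00"})
print(encoded)  # {"name": "\u0000"}
decoded = json.loads(encoded)["name"]


def json_rejects(text):
    """True if the JSON parser refuses the given document."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return True
    return False


def xml_rejects(text):
    """True if the XML parser refuses the given document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return True
    return False


print(json_rejects(r'{"name": "\0"}'))    # "\0" is not a valid JSON escape
print(xml_rejects("<name>&#0;</name>"))   # NUL is not a legal XML 1.0 character
```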

crabbone · 2 years ago
Try using UTF-8 encoding for XML, and your problems with zero byte encoding will go away.

Your understanding of "easier" is oversimplified to the point that it's wrong. It's easier to do the wrong thing in JSON, it's harder to do the right thing in JSON (compared to XML).

JSON is a poorly thought-out format. Its problems become progressively more difficult to deal with the more you expect of your program.

fauigerzigerk · 2 years ago
JSON is far simpler because it has no namespaces and no entities.

But I think complexity is always 90% culture. It's pretty arbitrary what kind of culture grows around a particular technology.

crabbone · 2 years ago
There are many ways in which something can be simple. I believe that the most relevant metric for simplicity of something like JSON isn't the number of language elements it has (this would mean that, eg. Brainfuck is simpler than JavaScript), but the amount of work necessary to produce a correct program. JSON is an endless pit of various degrees of difficulties when it comes to writing real-world programs. It's far from simple in that latter case.

I.e. learning about namespaces would take a programmer a couple of hours, including a foosball match and a coffee break, but working around JSON's bad decisions when it comes to number serialization or sequence serialization will probably take days in the best case, with the side effect that this work will most likely have to be done on an existing product after a customer complained about corrupting or losing their data...
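
One well-known instance of the number problem: the JSON grammar allows integers of any size, but parsers that store every number as an IEEE 754 double (JavaScript's JSON.parse being the usual example) cannot represent all of them. The Python sketch below simulates such a parser with the standard `parse_int` hook; the specific value is just an illustration.

```python
import json

BIG = "9007199254740993"  # 2**53 + 1, a perfectly valid JSON number

# Python's default: arbitrary-precision int, the value survives intact.
as_int = json.loads(BIG)

# Simulating a parser that maps all numbers to doubles.
as_double = json.loads(BIG, parse_int=float)

print(as_int)          # exact
print(int(as_double))  # rounded to the nearest representable double
print(as_int == int(as_double))
```

A common workaround is to transmit such identifiers as strings, which is a convention layered on top of JSON rather than anything the format itself provides.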

jillesvangurp · 2 years ago
Neither did XML originally. XML schema was sort of bolted on via some conventions of defining a schema in the root element. The XML 1.0 spec doesn't mention those. XML Schema is a separate standard that came later. Likewise namespaces are a separate specification as well and not part of the XML specification.

The XML specification does have Document Type Definitions (DTD), which were sort of inherited from SGML. This is an optional declaration with its own syntax that defines a DTD. I don't think they were that widely used. XML Schema started out as an attempt to redefine those in XML.

The nice thing with XML Schema was that you could usually ignore them and just use them as documentation of stuff that you might find in a document. Typically, schema urls wouldn't even resolve and throw a 404 instead. More often than not actually. My go-to tool was xpath in those days. Just ignore the schema and cherry pick what comes back using xpath. Usually not that hard.

The culture around JSON is that it emerged out of dynamic language communities (JavaScript, Ruby, Python, etc.) with a long tradition of not annotating things with types and a natural aversion against using schemas when they are not needed. Also, they had the benefit of hindsight and weren't looking to rebuild the web services specs on top of JSON but were actively trying to get away from that.

HdS84 · 2 years ago
Yeah, culture is a big one. See dotnet vs java. The latter picked up many of C#'s features over the years but is still much more verbose, e.g. because its developers still abhor var
geysersam · 2 years ago
It's very easy to understand why people prefer JSON. 95% of developers know exactly what JSON is without ever having read anything technical about it. It's obvious.

XML on the other hand... Who here can say they actually know anything substantial about XML besides the syntax? My guess is <10%.

HdS84 · 2 years ago
XML suffers from too many options and useless bells and whistles. E.g. the attribute vs Parameter topic is a source of confusion, without adding much value, especially if the source and target are object oriented and/ or a relational db. What's the point?

Then there are namespaces, sure there are probably lots of places where you need to use them. But I never encountered a place where they are really needed, but because they are the default you need to work with them or your queries do not work. Super confusing for beginners and annoying as heck.
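
The "queries do not work" trap is easy to reproduce. In this Python sketch, the namespace URI `urn:example:feed` and the prefix `f` are made up for illustration; the point is only that a default namespace silently changes every element's name.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with a default namespace on the root element.
DOC = '<feed xmlns="urn:example:feed"><title>Hello</title></feed>'
root = ET.fromstring(DOC)

# The unqualified name does not match: the element is really
# {urn:example:feed}title.
plain = root.find("title")

# Mapping a prefix of our choosing to the namespace URI makes it match.
NS = {"f": "urn:example:feed"}
qualified = root.find("f:title", NS)

# Python 3.8+ also accepts a namespace wildcard.
wildcard = root.find("{*}title")

print(plain)
print(qualified.text)
print(wildcard.text)
```

The prefix used in the query is local to the query; only the URI has to match the document.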

crabbone · 2 years ago
Why is "how hard it is for beginners to understand a concept without reading a reference" a useful metric for measuring anything? So what if it's hard? -- Spend an hour with the reference document, and your problems will go away.

In the days when XML was popular I was more active in several Web forums that helped novice users with particular technologies (and that included XML). Not a single confusion about XML namespaces came from someone who had read the reference. Quoting the reference was also a very efficient way to clear up the confusion.

Bottom line: it's not a problem worth mentioning. In the grand scheme of things, the hour you'd have to spend reading the specification is a drop in the bucket compared to all the time you'd spend working with XML. It's a fixed-size effort that you have to make once. Compare this to the bad "number" serialization that you have to deal with in JSON every time you write a new program that handles JSON.

Devasta · 2 years ago
I manage a team of reporting analysts who look at XSLT transforms all day. None of them have programming backgrounds and they have never found XML namespaces to be a problem.
throwawaymaths · 2 years ago
Which is better xml design for a pure data payload (not textual content)?

    <foo>something</foo>
Or

    <foo value="something"/>

When you get back with a coherent universal argument, we'll revisit the json vs xml question.

Devasta · 2 years ago
If these are the sort of questions that are tripping you up, just use <foo>something</foo> for everything.
lenkite · 2 years ago
Isn't this very easy ? Short, Succinct and a Simple No-frills string ? Put it in the attribute. Big, Long and Arbitrary Length Data ? Put it in the content of the element. Now gimme money.
revskill · 2 years ago
Look, with JSX you can put anything as props.

JSON is just pure data.

neverrroot · 2 years ago
A repository of over 700 JSON schemas for various file types. Quite useful.
lolive · 2 years ago
Oh my god. This Semantic Web stuff is going live!