My personal homepage in the 2000s was a simple XML document, automatically transformed with XSLT. IE supported that via the <?xml-stylesheet?> processing instruction. When you tried to view its source, you'd only see weird XML markup. Surprised a few, I'm sure, as this predates Firebug and similar DOM inspectors :)
One of the first places I worked ended up creating quite a nice framework where components would ask for data that was all put into a single SHAPE query, and the view layer was just XSLT (which is a perfectly nice functional language once you understand it). It was pretty productive.
I wrote an XSLT stylesheet that would turn any page into a pretty-printed source view of the HTML just by including it via the <?xml-stylesheet?> processing instruction. That was during the years I worked at a publishing company that stored, generated and edited everything via a combination of XML, XSLT, XPath, XInclude and XQuery. Now that I'm working somewhere else that generates documents, I do miss the flexibility of that combination... unfortunately it's hard to sell anything XML-related to people who've never seen it used effectively.
A major pain point of using XPath in isolation (not embedded in XSLT or something else) is those damn namespace bindings. They're ugly enough as xmlns: pseudo-attributes; in the absence of XML you'd have to use freaking XPointer bindings (as in "xmlns(bla=uri)//bla:xpathexpr/bla:following[@here]"), interpret HTML as XHTML with implicit namespaces, and similar tricks I thought HTML5 had left behind for good. Bit surprised htmx falls into the XML nostalgia trap.
A common source of frustration is that XPath itself doesn't provide a means of binding a namespace prefix to a namespace; it relies on mechanisms provided by the hosting language or library. After helping scores of devs with such problems on Stack Overflow, I took the time to write a canonical answer addressing this issue: https://stackoverflow.com/a/40796315/290085
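As a sketch of what that "hosting language mechanism" looks like in browser JavaScript (document and prefix invented for illustration), the resolver argument to document.evaluate() is exactly that mechanism:

```js
// Minimal sketch: XPath 1.0 has no syntax for declaring namespaces, so the
// host (here the DOM's document.evaluate) supplies the prefix-to-URI binding.
const doc = new DOMParser().parseFromString(
  '<feed xmlns="http://www.w3.org/2005/Atom"><title>demo</title></feed>',
  "application/xml"
);
// The resolver function *is* the host-language mechanism.
const resolver = (prefix) =>
  prefix === "atom" ? "http://www.w3.org/2005/Atom" : null;
const title = doc.evaluate(
  "//atom:title", // would fail with an "unbound prefix" error sans resolver
  doc, resolver, XPathResult.FIRST_ORDERED_NODE_TYPE, null
).singleNodeValue;
console.log(title.textContent); // "demo"
```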
> Bit surprised htmx falls into the XML nostalgia trap.
As the article states, the only attribute that can't be searched for via CSS selectors is the new "hx-on".
And my initial thought about this (which I'm open to reconsider) is that you probably shouldn't go looking for elements based on their event model. Show me every <div> with a certain value in its class attribute? Yes, of course. But find me every <div> with a certain behavior on mouseUp? This isn't something you'd do in an OO language either: you'd look for objects with a certain property value, not a specific event handler, which would likely be private anyway.
I'd also note that, for the same reason, standard HTML as well as React etc. have the same limitation as HTMX here. The article notes that with HTMX you can't use CSS selectors to find "hx-on(something)" attributes; the same handicap prevents you from searching for any element with an "on(something)" attribute, i.e. onClick, onMouseUp, onKeyPress, etc.
Maybe I haven't thought through this entirely but I don't see any problem here.
(edit: to be clear, of course you can CSS-select a specific attribute like onclick, and you can also select a specific HTMX attribute like hx-on:click. What the article notes you cannot do is search via wildcards or "starts-with" over attribute names in CSS, e.g. hx-on:*, but you can't do on* either)
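A quick browser-JS sketch of that asymmetry (hx-on comes from the article; the rest is illustrative):

```js
// CSS can match a *known* attribute name (escaping the colon)...
document.querySelectorAll("[onclick], [hx-on\\:click]");
// ...but CSS has no pattern syntax over attribute *names*:
// selectors like [on*] or [hx-on:*] simply don't exist.
// XPath, by contrast, can predicate on the attribute name itself:
const hits = document.evaluate(
  '//*[@*[starts-with(name(), "hx-on")]]', // any element with any hx-on… attr
  document, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null
);
console.log(hits.snapshotLength);
```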
Namespaces always seemed like the ugliest part of the XML ecosystem - I know why they were required, but they were always horrible to work with.
The worst thing is not the existence of namespaces, but that they are relatively often implemented incorrectly.
As an example of things that sometimes break:
* namespace redefinition: you have xmlns:ns1 on a parent node and xmlns:ns1 with a different value on a child node (perfectly valid XML; see the sketch after this list).
* the same namespace prefix on sibling nodes (happens often with streaming): multiple child nodes each carry an xmlns:ns1 with some value, which is only in scope for that node.
* some processors expect namespace declarations only on a specific node (I think it was MS Navision, about a decade ago, that failed when the declaration wasn't on a specific node but on a child node).
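On the first bullet, a small sketch (made-up URNs) of why the redefinition is legal and how a conformant parser resolves it:

```js
// The prefix ns1 is legitimately rebound on the child element; each element
// resolves against the nearest in-scope declaration.
const doc = new DOMParser().parseFromString(
  '<ns1:root xmlns:ns1="urn:a"><ns1:child xmlns:ns1="urn:b"/></ns1:root>',
  "application/xml"
);
console.log(doc.documentElement.namespaceURI);                   // "urn:a"
console.log(doc.documentElement.firstElementChild.namespaceURI); // "urn:b"
```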
I just can't fathom why they chose to use URLs, protocol and all, to define namespaces. By all means use the DNS, but bar.foo.example.com would have been such a better choice than http://example.com/foo/bar!
My biggest issue with XML namespaces is how they interact with attribute names. I've worked with XML plenty, but I still have no idea how that really works; every time, I had to Google it or find out by trial and error.
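For what it's worth, the usual trap, as a small sketch (made-up namespace): a default xmlns applies to elements but never to unprefixed attributes, which end up in no namespace at all.

```js
const root = new DOMParser().parseFromString(
  '<root xmlns="urn:a" xmlns:p="urn:a" id="x" p:id="y"/>',
  "application/xml"
).documentElement;
// The element picks up the default namespace...
console.log(root.namespaceURI);                  // "urn:a"
// ...but the unprefixed attribute is in NO namespace:
console.log(root.getAttributeNS(null, "id"));    // "x"
// Only the explicitly prefixed attribute lives in urn:a:
console.log(root.getAttributeNS("urn:a", "id")); // "y"
```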
My opinion is that XML would be much better without attributes. A lot of things would be simpler. Attributes are nice, but they add a whole other dimension of problems everywhere.
XQuery, for me, was and remains a core tool for dealing with XML specifications of surreal complexity that verge on madness. BaseX is the "Microsoft Access" of XQuery applications, while eXist is more like a full framework, with package management, deployment, and that sort of thing. Other query languages might be more cutting-edge, but they either 1) have a lot of stuff I don't need, or, more likely, 2) require a more permissive InfoSec setup than I am typically allowed. "Docker and any other form of virtualization are not permitted on ANY company network regardless of circumstances." Well, ok then.
Generally the next stop after XQuery, for me, is text mining, either in R+Python or in Orange ML. If a miner doesn't cut it, then LLM shenanigans.
Also, XPath? It's pretty great. XQuery? Does the job. XSLT? Ok, so NOW that's the feeling of a panic attack. I've been doing XSLT for literal decades, and I still don't know what I'm doing when wrenching on a giant pile of FOP-generating funhouse madness. When I am tagged into a data transformation job, I always stress that XQuery is the right tool, rather than a confounding nested directory of XSLT using different parsers and different passes like a figure-8 interstate off-ramp. For FOP, though, there's really just one game in town. Having said that, me and a bunch of others are doing our damnedest to show that what you're trying to do with XSLT/FO can be done way, way easier with CSS and Paged Media (via Paged.js or Vivliostyle or any of the other zillion PMM implementations). The downside is you have to wrench some CSS yourself, but honestly, that's probably going to be easier than wrenching on DocBook-XSL or the DITA-OT or one of the MIL-STD XSL packages.
Glad to hear that someone else thinks of XSLT the way I do. I had to write some to convert DocBook XML to LaTeX (building on dblatex, but adding some specializations), and besides being verbose (as another commenter here says), I found it virtually impossible to debug. I'd much rather write in Prolog.
The problem with XSLT is how incredibly verbose it is, but maybe that's just the problem with XML. jq is to JSON as XSLT/XPath is to XML, which shows you can have pithiness.
"XSLT? Ok, so NOW that's the feeling of a panic attack."
Right, so here's the secret decoder ring of XSLT: Underneath all the complexity, it isn't doing ANYTHING you can't do in your language of choice armed with an XPath library. And it is often incapable of doing even some rather simple things you can do in your language of choice armed with an XPath library.
XSLT is just a terrible programming language. That's all it is. All of the magic is in the XPath part; once you've selected the nodes you're working with, XSLT is a horrifyingly awful way of manipulating them into doing what you want them to do.
XSLT is the intersection of the worst parts of declarative programming with the worst parts of functional programming, wrapped up in one of the worst ways of serializing a programming language. What confuses some people even to this day is that they see "declarative programming" and "functional" and even "standardized serialization" and accidentally credit XSLT with the benefits of those approaches; then, when XSLT doesn't work, they blame themselves for failing declarative functional programming in such a wonderful serialization format. They're wrong. It's XSLT failing them. Its origin is people who thought that if they just created something declarative and functional and serialized through XML, they were guaranteed to be producing something good; those things are just so Platonically good on their own that they couldn't possibly produce something useless and broken, so it wasn't necessary to analyze the resulting abomination to see whether it actually fulfilled the goals, because it fulfilled them by definition, by virtue of being declarative and functional and in the bestest serialization ever.
Perhaps there is a declarative, functional, XSLT-inspired language that could be written that would be as good as the people bedazzled by the buzzwords think XSLT is. (Though there's no world where such a language is helped by serializing into XML; serializing a language for manipulating XML into XML is actually the worst choice possible, because of the nested encoding you inevitably produce!) In the meantime, though, you don't really need to wait around for someone to produce it, because it turns out your favorite general-purpose language equipped with an XPath library is already 90%+ of the way there.
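As a hedged browser-JS sketch of that last point (toy document, invented data): XPath does the selection, and plain code replaces the XSLT templating.

```js
// Toy input document; XPath picks the nodes, plain JS does the "templating".
const src = new DOMParser().parseFromString(
  '<books><book year="2001">XSLT 2.0</book><book year="2020">jq</book></books>',
  "application/xml"
);
const snap = src.evaluate('//book[@year >= 2010]', src, null,
  XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
const items = [];
for (let i = 0; i < snap.snapshotLength; i++) {
  items.push(`<li>${snap.snapshotItem(i).textContent}</li>`);
}
console.log(`<ul>${items.join("")}</ul>`); // <ul><li>jq</li></ul>
```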
It's also the only widespread format that can deal with both rich text documents (à la HTML) and complex, structured data (what JSON is good for). It's golden when you need to add tons of complex annotations to a text document.
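A tiny sketch of that dual nature (element names invented for illustration): the same fragment reads as prose and queries as data.

```js
const doc = new DOMParser().parseFromString(
  '<para>It draws <quantity unit="GW">1.21</quantity> of power.</para>',
  "application/xml"
);
// Read it as rich text:
console.log(doc.documentElement.textContent); // "It draws 1.21 of power."
// Query it as structured data:
console.log(doc.evaluate('string(//quantity/@unit)', doc, null,
  XPathResult.STRING_TYPE, null).stringValue); // "GW"
```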
I had managed to skip it until two years ago, when I had to use it to parse the OpenVAS API output. I wish I'd never had to break that streak; I hated every second of my life working with XPath.
These are still really useful in Jest/Playwright tests when you don't have an easy CSS selector or ID/class to choose from. XPath is super powerful, especially for dynamic pages/apps where the DOM isn't necessarily predictable but the relative positions of items (like a card in a list) are.
XPath is also good if you think of it in adversarial terms (i.e. QA). I don't care what your div is - I care what text is on the page, or whether something is saying what I expect under a title. That's where I got familiar with it (writing automation frameworks around Selenium/Appium for FF/Chrome/iOS/Android) and it did a great job, regardless of the drivers' performance. When your eyes parse a webpage, you don't really care whether you're at a div or a paragraph or anything else - it's the software's job (the dev's job) to think of that for you. Is it nested inside something with this accessibility marker? What do I start to read first? How many levels deep do I have to look? And so on and so forth.
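In Playwright-ish terms (a hedged sketch; the page content and selectors are made up), that adversarial style reads like this:

```js
// Click the button that *says* Save, however the devs happened to nest it:
await page.click('xpath=//button[normalize-space()="Save"]');
// The list item whose heading mentions "Invoices", then the link inside it:
const link = page.locator('xpath=//li[.//h3[contains(., "Invoices")]]//a');
await link.click();
```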
I'm looking forward to those new things in htmx, because it seems like it extends what we know of HTML. I can't yet quite say whether it is going to stick (by being integrated into IDEs, working with frameworks easily, becoming the go-to way to do this and that, etc.). Time will tell, but it's cool to see new proposals to improve the status quo out there.
Using XPath in a test on a dynamic and unpredictable DOM is a painful road to flaky, brittle tests. If you're writing tests for a system you have no control over, maybe you have no choice; but if you do, I'd recommend changing the output to be something robust and testable.
Interesting! What are the limits of `evaluate()` in browsers? I see it is available in 95% of users' browsers [1], but is it consistent in its implementation of XPath?
How does its performance compare to `querySelector[All]`?
Might be interesting to see if JS libraries that do a lot of DOM searching could get some perf gains. Maybe they already utilize evaluate?
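There's no single answer to the perf question, but it's easy to measure on a given page; a rough, hedged micro-benchmark sketch (numbers vary a lot by browser and DOM size):

```js
// Both calls return materialized result sets, so it's a fair-ish comparison.
function timeIt(label, fn, n = 1000) {
  const t0 = performance.now();
  for (let i = 0; i < n; i++) fn();
  console.log(label, (performance.now() - t0).toFixed(1), "ms");
}
timeIt("querySelectorAll", () => document.querySelectorAll("div p"));
timeIt("XPath evaluate", () => document.evaluate("//div//p", document,
  null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null));
```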
CSS is designed to be very fast, and because of that design choice it's less expressive. XPath is not going to beat CSS in speed, but it allows you to move through the document in any direction (up and down in the DOM tree of elements, forward/back among siblings in the DOM tree, and forward and backward in document appearance order).
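A sketch of those axes in browser JS (class names and elements invented for illustration):

```js
// Upward: every <section> that contains a "note" paragraph.
const sections = document.evaluate(
  '//p[contains(@class, "note")]/ancestor::section',
  document, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
// Backward among siblings: the nearest <p> preceding each <h2>.
const intros = document.evaluate(
  "//h2/preceding-sibling::p[1]",
  document, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
```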
We use XPath at work from CSS with custom plugins, and use XPath in JavaScript for targeting elements that otherwise wouldn't be straightforward to select with CSS.
Another cool thing about XPath is that it's aware of the text content of elements! //li[contains(.,"example")] would target all <li> elements whose text content contains "example".
FWIW CSS now has “:has”, which provides a form of general-purpose predication, although AFAIK it still doesn’t have a :contains (that was proposed back around CSS3, but I don’t think it got accepted).
A huge part of XPath’s power, though, and something you AFAIK can’t do in browsers, is extensibility.
For instance, selecting an element on the basis of a class is absolute hell in XPath 1.0 (and not great in 2.0 either; XPath 3.1’s `contains-token` finally made it not hell). But server-side you don’t care, because pretty much all implementations allow installing your own functions, so you can add your own `contains-token` or even `has-class` predicate and be on your way.
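For reference, this is the XPath 1.0 idiom being called hell (runnable in browsers, which only ship XPath 1.0; "card" is an invented class name):

```js
// Match a class token the hard way: pad with spaces so "card" doesn't also
// match "cardigan" or "wildcard".
const expr =
  "//*[contains(concat(' ', normalize-space(@class), ' '), ' card ')]";
const cards = document.evaluate(expr, document, null,
  XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
// XPath 3.1 (server-side only): //*[contains-token(@class, 'card')]
```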
I actually liked the XML + XSLT + XPath combo a lot, and kept using them with my projects at Microsoft. My personal website still uses them:
1/ To transform its Atom feed to pretty-looking HTML, if you access it from a browser: https://paul.fragara.com/feed.xml
2/ Similarly, to embed the feed dynamically on the homepage, using JavaScript XSLTProcessor (https://developer.mozilla.org/en-US/docs/Web/API/XSLTProcess...)
This works on all major browsers (and even on IE!). The XSL sheet is here: https://gitlab.com/PaulCapron/paul.fragara.com/-/blob/master...
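For the curious, the XSLTProcessor dance in 2/ boils down to roughly this sketch; the URLs here are placeholders (the real sheet is linked above), and IE needed a different code path since it predates XSLTProcessor:

```js
// Run inside an async function (or a module, for top-level await).
const load = async (url) =>
  new DOMParser().parseFromString(
    await (await fetch(url)).text(), "application/xml");
const [feed, sheet] = await Promise.all([load("feed.xml"), load("feed.xsl")]);
const proc = new XSLTProcessor();   // IE predates XSLTProcessor and needed
proc.importStylesheet(sheet);       // a separate MSXML code path
document.body.append(proc.transformToFragment(feed, document));
```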
In another context, I’ve leveraged XSLT 2.0, this time in a build process, to slightly transform (X)HTML pages before publishing (canonicalize URLs, embed JS & CSS directly in the page, etc.): https://github.com/PaulCapron/pwa2uwp/blob/master/postprod.x...
Thankfully it should be possible to emulate the whole thing with a JS snippet, so old websites would probably work with minimal modifications.
I really liked it.
How does XPath deal with XML namespaces?
https://stackoverflow.com/a/40796315/290085
Have you ever seen tag name clashing in an XML document?
https://github.com/whatwg/dom/issues/67
For those not familiar with the promise design controversy:
http://brianmckenna.org/blog/category_theory_promisesaplus
https://github.com/promises-aplus/constructor-spec/issues/24
https://github.com/promises-aplus/promises-spec/issues/94
EDIT: Thanks, sibling!
Sure, but it's more powerful, and thus more able to handle situations, than depending on CSS selectors.
Also consider Robula+ https://tsigalko18.github.io/assets/pdf/2016-Leotta-JSEP.pdf
https://tsigalko18.github.io/assets/pdf/2014-Leotta-ISSREW.p...
[1] https://caniuse.com/?search=evaluate