My personal homepage in the 2000s was a simple XML document, automatically transformed with XSLT. IE supported that via the <?xml-stylesheet?> processing instruction. When you tried to view its source, you'd only see weird XML markup. Surprised a few, I'm sure, as this predates Firebug and similar DOM inspectors :)
One of the first places I worked ended up creating quite a nice framework where components would ask for data that was all put into a single SHAPE query, and the view layer was just XSLT (which is a perfectly nice functional language once you understand it). It was pretty productive.
I wrote an XSLT stylesheet that would turn any page into a pretty-printed source view of the HTML just by including it via the <?xml-stylesheet?> processing instruction. That was during the years I worked at a publishing company that stored, generated and edited everything via a combination of XML, XSLT, XPath, XInclude and XQuery. Now that I'm working somewhere else that generates documents, I do miss the flexibility of that combination... unfortunately it's hard to sell anything XML-related to people who've never seen it used effectively.
A major pain point of using XPath in isolation (not embedded in XSLT or something else) is those damn namespace bindings. They're ugly enough as xmlns: pseudo-attributes; in the absence of XML you'd have to use freaking XPointer bindings (as in "xmlns(bla=uri)//bla:xpathexpr/bla:following[@here]"), interpret HTML as XHTML with implicit namespaces, and similar tricks I thought HTML5 had left behind for good. Bit surprised htmx falls into the XML nostalgia trap.
A common source of frustration is that XPath itself doesn't provide a means of binding a namespace prefix to a namespace; it relies on mechanisms provided by the hosting language or library. After helping scores of devs with such problems on Stack Overflow, I took the time to write a canonical answer addressing this issue: https://stackoverflow.com/a/40796315/290085
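As a sketch of what that "hosting language mechanism" looks like in browser JavaScript (document and prefix invented for illustration), the resolver argument to document.evaluate() is exactly that mechanism:

```js
// Minimal sketch: XPath 1.0 has no syntax for declaring namespaces, so the
// host (here the DOM's document.evaluate) supplies the prefix-to-URI binding.
const doc = new DOMParser().parseFromString(
  '<feed xmlns="http://www.w3.org/2005/Atom"><title>demo</title></feed>',
  "application/xml"
);
// The resolver function *is* the host-language mechanism.
const resolver = (prefix) =>
  prefix === "atom" ? "http://www.w3.org/2005/Atom" : null;
const title = doc.evaluate(
  "//atom:title", // would fail with an "unbound prefix" error sans resolver
  doc, resolver, XPathResult.FIRST_ORDERED_NODE_TYPE, null
).singleNodeValue;
console.log(title.textContent); // "demo"
```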
> Bit surprised htmx falls into the XML nostalgia trap.
As the article states, the only attribute that can't be searched for via CSS selectors is the new "hx-on".
And my initial thought about this (which I'm open to reconsider) is that you probably shouldn't go looking for elements based on their event model. Show me every <div> with a certain value in its class attribute? Yes, of course. But find me every <div> with a certain behavior on mouseUp? This isn't something you'd do in an OO language either: you'd look for objects with a certain property value, not a specific event handler, which would likely be private anyway.
I'd also note that, for the same reason, standard HTML as well as React etc. have the same limitation as HTMX here. The article notes that with HTMX you can't use CSS selectors to find "hx-on(something)" attributes; the same handicap prevents you from searching for any element with an "on(something)" attribute, i.e. onClick, onMouseUp, onKeyPress, etc.
Maybe I haven't thought through this entirely but I don't see any problem here.
(edit: to be clear, of course you can CSS-select a specific attribute like onclick, and you can also select a specific HTMX attribute like hx-on:click. What the article notes you cannot do is search via wildcards or "starts-with" over attribute names in CSS, e.g. hx-on:*, but you can't do on* either)
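A quick browser-JS sketch of that asymmetry (hx-on comes from the article; the rest is illustrative):

```js
// CSS can match a *known* attribute name (escaping the colon)...
document.querySelectorAll("[onclick], [hx-on\\:click]");
// ...but CSS has no pattern syntax over attribute *names*:
// selectors like [on*] or [hx-on:*] simply don't exist.
// XPath, by contrast, can predicate on the attribute name itself:
const hits = document.evaluate(
  '//*[@*[starts-with(name(), "hx-on")]]', // any element with any hx-on… attr
  document, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null
);
console.log(hits.snapshotLength);
```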
Namespaces always seemed like the ugliest part of the XML ecosystem - I know why they were required, but they were always horrible to work with.
The worst thing is not the existence of namespaces, but that they are relatively often implemented incorrectly.
As an example of things that sometimes break:
* namespace redefinition: you have xmlns:ns1 on a parent node and xmlns:ns1 with a different value on a child node (perfectly valid XML; see the sketch after this list).
* the same namespace prefix on sibling nodes (happens often with streaming): multiple child nodes each carry an xmlns:ns1 with some value, which is only in scope for that node.
* some processors expect namespace declarations only on a specific node (I think it was MS Navision, about a decade ago, that failed when the declaration wasn't on a specific node but on a child node).
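On the first bullet, a small sketch (made-up URNs) of why the redefinition is legal and how a conformant parser resolves it:

```js
// The prefix ns1 is legitimately rebound on the child element; each element
// resolves against the nearest in-scope declaration.
const doc = new DOMParser().parseFromString(
  '<ns1:root xmlns:ns1="urn:a"><ns1:child xmlns:ns1="urn:b"/></ns1:root>',
  "application/xml"
);
console.log(doc.documentElement.namespaceURI);                   // "urn:a"
console.log(doc.documentElement.firstElementChild.namespaceURI); // "urn:b"
```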
I just can't fathom why they chose to use URLs, protocol and all, to define namespaces. By all means use the DNS, but bar.foo.example.com would have been such a better choice than http://example.com/foo/bar!
My biggest issue with XML namespaces is how they interact with attribute names. I've worked with XML plenty, but I still have no idea how that really works; every time, I had to Google it or find out by trial and error.
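For what it's worth, the usual trap, as a small sketch (made-up namespace): a default xmlns applies to elements but never to unprefixed attributes, which end up in no namespace at all.

```js
const root = new DOMParser().parseFromString(
  '<root xmlns="urn:a" xmlns:p="urn:a" id="x" p:id="y"/>',
  "application/xml"
).documentElement;
// The element picks up the default namespace...
console.log(root.namespaceURI);                  // "urn:a"
// ...but the unprefixed attribute is in NO namespace:
console.log(root.getAttributeNS(null, "id"));    // "x"
// Only the explicitly prefixed attribute lives in urn:a:
console.log(root.getAttributeNS("urn:a", "id")); // "y"
```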
My opinion is that XML would be much better without attributes. A lot of things would be simpler. Attributes are nice, but they add a whole other dimension of problems everywhere.
XQuery, for me, was and remains a core tool for dealing with XML specifications of surreal complexity that verge on madness. BaseX is the "Microsoft Access" of XQuery applications, while eXist is more like a full framework, with package management, deployment, and that sort of thing. Other query languages might be more cutting-edge, but they either 1) have a lot of stuff I don't need, or, more likely, 2) require a more permissive InfoSec setup than I am typically allowed. "Docker and any other form of virtualization are not permitted on ANY company network regardless of circumstances." Well, ok then.
Generally the next stop after XQuery, for me, is text mining, either in R+Python or in Orange ML. If a miner doesn't cut it, then LLM shenanigans.
Also, XPath? It's pretty great. XQuery? Does the job. XSLT? Ok, so NOW that's the feeling of a panic attack. I've been doing XSLT for literal decades, and I still don't know what I'm doing when wrenching on a giant pile of FOP-generating funhouse madness. When I am tagged into a data transformation job, I always stress that XQuery is the right tool, rather than a confounding nested directory of XSLT using different parsers and different passes like a figure-8 interstate off-ramp. For FOP, though, there's really just one game in town. Having said that, me and a bunch of others are doing our damnedest to show that what you're trying to do with XSLT/FO can be done way, way easier with CSS and Paged Media (via Paged.js or Vivliostyle or any of the other zillion PMM implementations). The downside is you have to wrench some CSS yourself, but honestly, that's probably going to be easier than wrenching on DocBook-XSL or the DITA-OT or one of the MIL-STD XSL packages.
Glad to hear that someone else thinks of XSLT the way I do. I had to write some to convert DocBook XML to LaTeX (building on dblatex, but adding some specializations), and besides being verbose (as another commenter here says), I found it virtually impossible to debug. I'd much rather write in Prolog.
The problem with XSLT is how incredibly verbose it is, but maybe that's just the problem with XML. jq is to JSON as XSLT/XPath is to XML, which shows you can have pithiness.
"XSLT? Ok, so NOW that's the feeling of a panic attack."
Right, so here's the secret decoder ring of XSLT: Underneath all the complexity, it isn't doing ANYTHING you can't do in your language of choice armed with an XPath library. And it is often incapable of doing even some rather simple things you can do in your language of choice armed with an XPath library.
XSLT is just a terrible programming language. That's all it is. All of the magic is in the XPath part; once you've selected the nodes you're working with, XSLT is a horrifyingly awful way of manipulating them into doing what you want them to do.
XSLT is the intersection of the worst parts of declarative programming with the worst parts of functional programming, wrapped up in one of the worst ways of serializing a programming language. What confuses some people even to this day is that they see "declarative programming" and "functional" and even "standardized serialization" and accidentally credit XSLT with the benefits of those approaches; then, when XSLT doesn't work, they blame themselves for failing declarative functional programming in such a wonderful serialization format. They're wrong. It's XSLT failing them. Its origin is people who thought that if they just created something declarative and functional and serialized through XML, they were guaranteed to be producing something good; those things are just so Platonically good on their own that they couldn't possibly produce something useless and broken, so it wasn't necessary to analyze the resulting abomination to see whether it actually fulfilled the goals, because it fulfilled them by definition, by virtue of being declarative and functional and in the bestest serialization ever.
Perhaps there is a declarative, functional, XSLT-inspired language that could be written that would be as good as the people bedazzled by the buzzwords think XSLT is. (Though there's no world where such a language is helped by serializing into XML; serializing a language for manipulating XML into XML is actually the worst choice possible, because of the nested encoding you inevitably produce!) In the meantime, though, you don't really need to wait around for someone to produce it, because it turns out your favorite general-purpose language equipped with an XPath library is already 90%+ of the way there.
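As a hedged browser-JS sketch of that last point (toy document, invented data): XPath does the selection, and plain code replaces the XSLT templating.

```js
// Toy input document; XPath picks the nodes, plain JS does the "templating".
const src = new DOMParser().parseFromString(
  '<books><book year="2001">XSLT 2.0</book><book year="2020">jq</book></books>',
  "application/xml"
);
const snap = src.evaluate('//book[@year >= 2010]', src, null,
  XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
const items = [];
for (let i = 0; i < snap.snapshotLength; i++) {
  items.push(`<li>${snap.snapshotItem(i).textContent}</li>`);
}
console.log(`<ul>${items.join("")}</ul>`); // <ul><li>jq</li></ul>
```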
It's also the only widespread format that can deal with both rich text documents (à la HTML) and complex, structured data (what JSON is good for). It's golden when you need to add tons of complex annotations to a text document.
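A tiny sketch of that dual nature (element names invented for illustration): the same fragment reads as prose and queries as data.

```js
const doc = new DOMParser().parseFromString(
  '<para>It draws <quantity unit="GW">1.21</quantity> of power.</para>',
  "application/xml"
);
// Read it as rich text:
console.log(doc.documentElement.textContent); // "It draws 1.21 of power."
// Query it as structured data:
console.log(doc.evaluate('string(//quantity/@unit)', doc, null,
  XPathResult.STRING_TYPE, null).stringValue); // "GW"
```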
I had managed to skip it until two years ago, when I had to use it to parse the OpenVAS API output. I wish I'd never had to break that streak; I hated every second of my life working with XPath.
These are still really useful in Jest/Playwright tests when you don't have an easy CSS selector or ID/class to choose from. XPath is super powerful, especially for dynamic pages/apps where the DOM isn't necessarily predictable but the relative positions of items (like a card in a list) are.
XPath is also good if you think of it in adversarial terms (i.e. QA). I don't care what your div is - I care what text is on the page, or whether something is saying what I expect under a title. That's where I got familiar with it (writing automation frameworks around Selenium/Appium for FF/Chrome/iOS/Android) and it did a great job, regardless of the drivers' performance. When your eyes parse a webpage, you don't really care whether you're at a div or a paragraph or anything else - it's the software's job (the dev's job) to think of that for you. Is it nested inside something with this accessibility marker? What do I start to read first? How many levels deep do I have to look? And so on and so forth.
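In Playwright-ish terms (a hedged sketch; the page content and selectors are made up), that adversarial style reads like this:

```js
// Click the button that *says* Save, however the devs happened to nest it:
await page.click('xpath=//button[normalize-space()="Save"]');
// The list item whose heading mentions "Invoices", then the link inside it:
const link = page.locator('xpath=//li[.//h3[contains(., "Invoices")]]//a');
await link.click();
```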
I'm looking forward to those new things in htmx, because it seems like it extends what we know of HTML. I can't yet quite say whether it is going to stick (by being integrated into IDEs, working with frameworks easily, becoming the go-to way to do this and that, etc.). Time will tell, but it's cool to see new proposals to improve the status quo out there.
Using XPath in a test on a dynamic and unpredictable DOM is a painful road to flaky, brittle tests. If you're writing tests for a system you have no control over, maybe you have no choice; but if you do, I'd recommend changing the output to be something robust and testable.
Interesting! What are the limits of `evaluate()` in browsers? I see it is available in 95% of users' browsers [1], but is it consistent in its implementation of XPath?
How does its performance compare to `querySelector[All]`?
Might be interesting to see if JS libraries that do a lot of DOM searching could get some perf gains. Maybe they already utilize evaluate?
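There's no single answer to the perf question, but it's easy to measure on a given page; a rough, hedged micro-benchmark sketch (numbers vary a lot by browser and DOM size):

```js
// Both calls return materialized result sets, so it's a fair-ish comparison.
function timeIt(label, fn, n = 1000) {
  const t0 = performance.now();
  for (let i = 0; i < n; i++) fn();
  console.log(label, (performance.now() - t0).toFixed(1), "ms");
}
timeIt("querySelectorAll", () => document.querySelectorAll("div p"));
timeIt("XPath evaluate", () => document.evaluate("//div//p", document,
  null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null));
```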
CSS is designed to be very fast, and because of that design choice it's less expressive. XPath is not going to beat CSS in speed, but it allows you to move through the document in any direction (up and down in the DOM tree of elements, forward/back among siblings in the DOM tree, and forward and backward in document appearance order).
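A sketch of those axes in browser JS (class names and elements invented for illustration):

```js
// Upward: every <section> that contains a "note" paragraph.
const sections = document.evaluate(
  '//p[contains(@class, "note")]/ancestor::section',
  document, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
// Backward among siblings: the nearest <p> preceding each <h2>.
const intros = document.evaluate(
  "//h2/preceding-sibling::p[1]",
  document, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
```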
We use XPath at work from CSS with custom plugins, and use XPath in JavaScript for targeting elements that otherwise wouldn't be straightforward to select with CSS.
Another cool thing about XPath is that it's aware of the text content of elements! //li[contains(.,"example")] would target all <li> elements whose text content contains "example".
FWIW CSS now has “:has”, which provides a form of general-purpose predication, although AFAIK it still doesn’t have a :contains (that was proposed back around CSS3, but I don’t think it got accepted).
A huge part of XPath’s power, though, and something you AFAIK can’t do in browsers, is extensibility.
For instance, selecting an element on the basis of a class is absolute hell in XPath 1.0 (and not great in 2.0 either; XPath 3.1’s `contains-token` finally made it not hell). But server-side you don’t care, because pretty much all implementations allow installing your own functions, so you can add your own `contains-token` or even `has-class` predicate and be on your way.
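For reference, this is the XPath 1.0 idiom being called hell (runnable in browsers, which only ship XPath 1.0; "card" is an invented class name):

```js
// Match a class token the hard way: pad with spaces so "card" doesn't also
// match "cardigan" or "wildcard".
const expr =
  "//*[contains(concat(' ', normalize-space(@class), ' '), ' card ')]";
const cards = document.evaluate(expr, document, null,
  XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
// XPath 3.1 (server-side only): //*[contains-token(@class, 'card')]
```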
I actually liked the XML + XSLT + XPath combo a lot, and kept using them with my projects at Microsoft. My personal website still uses them:
1/ To transform its Atom feed to pretty-looking HTML, if you access it from a browser: https://paul.fragara.com/feed.xml
2/ Similarly, to embed the feed dynamically on the homepage, using JavaScript XSLTProcessor (https://developer.mozilla.org/en-US/docs/Web/API/XSLTProcess...)
This works on all major browsers (and even on IE!). The XSL sheet is here: https://gitlab.com/PaulCapron/paul.fragara.com/-/blob/master...
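For the curious, the XSLTProcessor dance in 2/ boils down to roughly this sketch; the URLs here are placeholders (the real sheet is linked above), and IE needed a different code path since it predates XSLTProcessor:

```js
// Run inside an async function (or a module, for top-level await).
const load = async (url) =>
  new DOMParser().parseFromString(
    await (await fetch(url)).text(), "application/xml");
const [feed, sheet] = await Promise.all([load("feed.xml"), load("feed.xsl")]);
const proc = new XSLTProcessor();   // IE predates XSLTProcessor and needed
proc.importStylesheet(sheet);       // a separate MSXML code path
document.body.append(proc.transformToFragment(feed, document));
```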
In another context, I’ve leveraged XSLT 2.0, this time in a build process, to slightly transform (X)HTML pages before publishing (canonicalize URLs, embed JS & CSS directly in the page, etc.): https://github.com/PaulCapron/pwa2uwp/blob/master/postprod.x...
Thankfully it should be possible to emulate the whole thing with a JS snippet, so old websites would probably work with minimal modifications.
I really liked it.
How does XPath deal with XML namespaces?
https://stackoverflow.com/a/40796315/290085
Have you ever seen tag name clashing in an XML document?
https://github.com/whatwg/dom/issues/67
For those not familiar with the promise design controversy:
http://brianmckenna.org/blog/category_theory_promisesaplus
https://github.com/promises-aplus/constructor-spec/issues/24
https://github.com/promises-aplus/promises-spec/issues/94
EDIT: Thanks, sibling!
Sure, but it's more powerful, and thus more able to handle situations, than depending on CSS selectors.
Also consider Robula+ https://tsigalko18.github.io/assets/pdf/2016-Leotta-JSEP.pdf
https://tsigalko18.github.io/assets/pdf/2014-Leotta-ISSREW.p...
[1] https://caniuse.com/?search=evaluate