WhatWG: Proposal – Update XPath to at least v2.0

lucideer · 5 years ago

Most impactful part of this imo is:

> Chrome is not interested in this. The XML parts of our pipeline are in maintenance mode and we would love to eventually deprecate and remove them

> -- https://github.com/whatwg/dom/issues/903#issuecomment-707748...

Given Chrome's status as the new IE6 in terms of market share and outsized influence over the technological direction of the web, there's a real risk of moves like this being unilateral.

On the other hand, the last two comments re: libxml being their primary concern do give some hope.

pete_b · 5 years ago

Agreed. I find it rather alarming that an influential member of the Chrome team is talking like this. All our e2e test suites are built around Chrome and XPath because of its expanded abilities over CSS

sergeykish · 5 years ago

Could you please share your use case? Lets drive adoption.

Spivak · 5 years ago

I really don't understand this opinion. Like, I mean, I get the frustration from the perspective of a web developer that wants to use a particular feature but are different browsers not allowed to be different?

Like if Firefox didn't want to implement WebUSB, Safari didn't want to implement WebPush, or Lynx didn't want to implement Canvas is that outrageous?

lucideer · 5 years ago

> are different browsers not allowed to be different?

The web (and open standardisation in general) has pioneered an ecosystem where the primary differentiation between browser is in user-facing UX & features (and ancillary factors such as performance, etc.), rather than developer-facing web-tech support.

This is quite different to a lot of other commercial "competitive" spaces as it substitutes vendor lock-in on patents & trade-secrets for actual innovation in the user-facing space. It's not all rosy: competing browsers still stray from this on the regular, but the ideal is one of the primary selling points of the web as a platform.

Browsers differentiating themselves on user features while maintaining cross-competitor consistency on web standards is the dream that differentiates the web, so seeing its erosion is something to call out.

> Like if Firefox didn't want to implement WebUSB, Safari didn't want to implement WebPush, or Lynx didn't want to implement Canvas is that outrageous?

What's particularly different here is that this isn't about the addition of a feature. The ticket opened is about adding XPath2 support but the quoted line is about removing existing XML support.

This may sound a bit like I'm supporting Microsoft's old "don't break the web" adage, but the big difference here is MS was reluctant to remove features competitors didn't have for fear of breaking IE-only websites (that had relied on them due to IE's dominance). This is about Chrome removing standardised features that browsers, servers, and applications of all varieties have supported interoperably for decades.

gnagatomo · 5 years ago

If I remember correctly, Mozilla didn't want to support video DRM but ended up adding it to Firefox[0] in fear of losing marketshare because Netflix required DRM video playback[1].

Today's browsers are just trying to keep up with whatever Chrome decides to adopt.

[0]: https://blog.mozilla.org/blog/2015/05/12/update-on-digital-r... [1]: https://www.engadget.com/2014-05-14-mozilla-bends-on-drm.htm...

barumi · 5 years ago

> (...) but are different browsers not allowed to be different?

Allowed? Sure, why not?

Desirable? Hell, no.

Think about it. If you are developing a web application and you need ensure it runs on all supported platforms then you either:

a) use standard APIs that are provided by all platforms,

b) use platform-specific APIs and watch the number of platform-specific tests to grow exponentially along with development and maintenance effort,

c) drop platforms.

Suffice to say, option a) is far more desirable.

toyg · 5 years ago

Chrome is not "a different browser", it's the dominant browser. Google worked hard to achieve this state of things, and they now have a clear responsibility in terms of steering web standards. With great market share comes great responsibility.

dmitriid · 5 years ago

> it does seem that about 1-2% of page views end up using XPath

And Chrome is not interested.

And yet, when they release a standard all other browsers object to, they justify that... because it's used by 0.8% page loads (exclusively on Google properties, implemented exclusively by Google devs) [1]

And yet, when other browsers consider standards harmful, [2] Chrome just ships them [3]

[1] https://twitter.com/justboriss/status/1220428902071447552

[2] https://mozilla.github.io/standards-positions/

[3] https://www.chromestatus.com/features

spankalee · 5 years ago

You continue to grind your axe, but they only thing I was saying there was that backwards should be considered when potentially changing an API because the feature is used in the wild.

But sure, try to weaponize anything related to web components at every opportunity.

bryanrasmussen · 5 years ago

I've decided to put this here rather than the WhatWG proposal as focusing on the Chrome statement overly much starts to see a derail.

When a Chrome representative says " The XML parts of our pipeline are in maintenance mode and we would love to eventually deprecate and remove them" I am not particularly surprised, if I were them I think I would like to get rid of these technologies too, based just on my feeling for how much they are used any more (assuming Chrome has stats and these stats bear our my feeling of low usage).

This of course also makes me sad in that I have quite a bit of experience with these technologies and their remove establishes their irrelevance in the present day ( or argues strongly for their increasing irrelevance, too strong a word choice might invite complaint). Believe me I would love if XPath was improved because maybe I might see ads for developers with advanced XPath again and I could increase my rates.

However I don't think it has ever been stated before that this is what Chrome would like. As such I think it falls under the rubric "Things everybody knows but nobody says", which generally nobody says these things because nobody wants to go through the onerous work of dealing with the implications. But as it has been said I start to wonder what those implications are/would be.

SVG has already been mentioned in the linked thread, I don't know if that is actually a problem because I don't know if SVG has been implemented using libxml in Chrome, I could totally see a point not to implement SVG with that - but then again I could see that if you have libxml and you need to implement the SVG DOM, maybe you use libxml to do it - so does getting rid of libxml impact SVG in Chrome?

Obviously applications working with RSS and other associated feed formats in the browser would probably stop to work. Of course people could write applications for these formats on the server, but it certainly seems a setback for RSS.

The same thing applies for RDF, and linked data applications running client side. Not many I know but hey, a nail in the coffin as it were.

MathML - which has never been implemented by Chrome has client side implementations, for example https://pshihn.github.io/math-ml/ I wonder if these would continue working, I would guess probably not.

What about XHTML, is anything there part of the Chrome XML stack - for example DTDs?

I can think of a few other things, but this seems like a reasonable start to think of what the implications would be.

brian_herman__ · 5 years ago

I'd like to eventually remove and deprecate google chrome. But unfortunately it is the 900lb gorilla in the room.

bawolff · 5 years ago

Everyone is mad at the chrome person, but honestly, all they are saying is they don't want the feature bloat of extended support for a super complex standard, that isn't very popular despite existing for 21 years and involves a library they want to deprecate.

Seems like a very reasonable no to me. You don't get good software by saying yes to every feature idea.

shawnz · 5 years ago

If the "no" argument here was what you are saying, that they don't want the feature bloat of XPath, then that would be reasonable. But they are actually arguing that they don't want the feature bloat of XML capability, and XPath doesn't need to have any dependency on XML capability.

So they haven't really addressed the request itself, and they are being extremely dismissive about any suggestion that they might be interpreting things in an unfair light. I think that is why people are frustrated.

I don't even want this feature and I am frustrated just by reading the linked thread.

masklinn · 5 years ago

> XPath doesn't need to have any dependency on XML capability.

That's an important point, programmatically applying XPath to HTML can be super convenient, while basic CSS selectors are superior, XPath is way better for non-trivial selections, and because CSS was designed in a rather ad-hoc manner it "scales" very badly as new features get grafted on.

lucideer · 5 years ago

> a super complex standard

This... is actually subjective.

XML is a relatively simple standard; the complexity is emergent rather than inherent to its definition.

Take for example an oft-cited security issue with xml: xxe. This results from xml entity referencing supporting filesystem access. But there's nothing inherently "complex" about that from a language/syntax definition perspective, filesystem access is just an inherent danger regardless of complexity.

That's not to say XML is as simple as it could be (everything has its caveats and edge-cases: null-default attribute namespaces is a weird one that comes to mind), but in general "strict" and limited language syntaxes tend to be much less complex than lax syntaxes: e.g. HTML or YAML, which have endless depths of gotchas with ambiguous or unintuitive parsing behaviours.

> that isn't very popular

Ha!

sergeykish · 5 years ago

> The XML parts of our pipeline are in maintenance mode and we would love to eventually deprecate and remove them, or ...

It sounds scary. I hope he meant core changes, not API

> ... or at least replace them with something that generates less security bugs. Increasing the capabilities of XML in the browser runs counter to that goal.

> By "XML parts of our pipeline" I mean "everything implemented using libxml and libxslt".

I have one example. Have you known HTML parser is faster than XML [1]? Yes, awfully bloated HTML parser [2] is faster.

[1] https://bugzilla.mozilla.org/show_bug.cgi?id=1481080

[2] https://html.spec.whatwg.org/multipage/parsing.html

dmitriid · 5 years ago

> Everyone is mad at the chrome person, but honestly, all they are saying

All they are saying is that Chrome team is hypocritical to the extreme. See https://news.ycombinator.com/item?id=24766554

rhacker · 5 years ago

Wasn't the whole point of WHATG to not focus on the browser / specific implementations, but on the standards? I mean it was a ploy to dethrone IE. Just because IE has been dethroned, doesn't mean the purpose of the group should stagnate. I think at this point they feel justified because they defunded Mozilla. The FEDS absolutely need to break up Google.

oefrha · 5 years ago

W3C focused on design by committee, de jure standards. WHATWG focuses on standardizing de facto standards. W3C’s HTML effort failed and now its HTML5 spec redirects to WHATWG’s HTML5 spec. Maybe I misunderstood your comment, but sounds like you got it backwards.

bawolff · 5 years ago

On the contrary, i think the point of whatwg was to focus on reality and make descriptive standards instead of making prescriptive standards like the w3c that nobody implemented.

Pushing xpath (or anything else) despite vendors not wanting it is a step in the opposite direction.

barumi · 5 years ago

What's the point of a standard if no one adopts it?

shock · 5 years ago

> I can tell this is not going to be a productive conversation, as folks are intent on playing word games to try and pretend Chrome has a different stance than we do. As such, I won't be participating in this thread further. I think I've made our position clear. --user domenic (from Google/Chrome, I presume)

So, a productive conversation is one in which people agree with the position of the Chrome team :-/

oefrha · 5 years ago

No, “we don’t support X because Y”, “Y can be interpreted to mean Z and Z does not conflict with X so surely you actually support X” is not a productive conversation.

shawnz · 5 years ago

That's not an accurate summary of the argument. XPath doesn't need to have any dependencies on the technologies they are trying to deprecate, like the Chrome team member is implying. So when they said "Y" they really did mean "Z", and the difference is relevant to the point.

scottfr · 5 years ago

While we are on the topic of XPath improvements, I would love to see a built-in XPath syntax to pierce the shadow DOM of Web components.

It's an important need for the automation and testing use cases. Without it, targeting an element within a web component simply cannot be done solely with a single selector.

spankalee · 5 years ago

Several vendors have objected to that when it's been brought up because it breaks encapsulation.

You can write such a utility in JavaScript in just a few lines though.

sam_lowry_ · 5 years ago

XPath 2.0 is a complex language able to crash you computer or mine bitcoin.

Keep XPath 1.0.

formerly_proven · 5 years ago

I'm probably missing something, but XPath 2.0 doesn't strike me as trivially Turing complete. Loops are bounded (either over range expressions or a set of nodes) and it can't define functions, so it doesn't have recursion, so evaluating any XPath 2.0 expression always halts, so XPath 2.0 can't be Turing complete.

chalst · 5 years ago

You don't need Turing completeness to mine.

phkahler · 5 years ago

>> Keep XPath 1.0.

But let's backport a few features from v2 and call it v1.1 it'll be just like all the OpenGL versions.

masklinn · 5 years ago

The non-garbage parts of XPath 2.0 are in the additional expanded function library, which largely come from exslt.

You don't really need to change the version because the language itself doesn't change.

bryanrasmussen · 5 years ago

I do think having a query language with its own for and if semantics in it would be adding unnecessary complexity to the browser tech stack - but hey - imagine the big bucks I could be pulling in as a consultant if recruiters started having to get guys with 10+ years experience with XPath 2.0 and JavaScript!

Won't someone please think of my financial needs!

admax88q · 5 years ago

It would have made sense to use XPath for CSS selectors, or at least make the CSS selectors a syntax compatible subset if you wanted to them piecewise in functionality like they currently have done.

bryanrasmussen · 5 years ago

Well, I believe that Håkon Wium Lie would never have let that happen, but hey could be misreading his stance on things.

masklinn · 5 years ago

The main reason why it wouldn't have happened is CSS selectors predate XPath by a few years. CSS was first proposed in 1994 and the CSS1 spec was released in 1996, I don't know when XPath was originally proposed but the first public draft was in late 1998 and the release was in 1999.

CSS 2 actually predates XPath 1.0.

XPath would also have needed more work to replace CSS selectors, aside from being a bigger performance concern (through being more capable and not working in a strictly top-down manner, meaning you can easily get very inefficient selectors) it lacks facilities which are quite critical to CSS selectors like the shortcut id and class selectors as well as priority.

In fact talking about class selectors, those are absolute hell to replicate in XPath 1.0 if you don't have extensions to lean on. To replicate the humble `.foo` you need something along the lines of

    //*[contains(concat(' ', normalize-space(@class), ' '), ' foo ')]

And don't miss the spaces around the name of the class you're looking for, they're quite critical. Good fucking luck if you need to combine multiple classes.

exslt/xpath 2.0 have `tokenize` which make it much more convenient although IIRC the way it's used is weird, I think it's

    //*[tokenize(@class) = 'foo']

because "=" on a nodeset is really a containment operation? Not sure. There's also `matches` but that's error-prone because classes tend to be caterpillar-separated, and your friendly neighborhood `\b` will match those so you need to mess around with `(^|\s+)` bullshit instead.

And finally I believe xpath 3.1 has a straightforward "contains-token" which does what the CSS "~=" operator does.

XPath 3.1 was released in 2017. "~=" was part of CSS2 (CSS1 didn't have "arbitrary" attribute selection, only classes and ids).

bawolff · 5 years ago

Well im glad they didn't

CSS is a beautifly intuitive query language. XPath is an ugly non-intuitive syntax. I suspect that's part of the reason that it didn't take off (ux of apis matter)

vbezhenar · 5 years ago

At this point I think that those APIs should be implemented with high-performance JS or Wasm and browsers should provide just enough API entry points to allow for efficient implementation.

chrismorgan · 5 years ago

You could already do this by trawling the DOM and constructing whatever data structures you desire, and updating them by watching for modifications to the DOM with a MutationObserver. But performance would be somewhere between poor and execrable, and memory usage would skyrocket. Note also that this is doing batch updating, so you’ll be querying a potentially out-of-date DOM this way. I state confidently that there will never be a good way of doing this, because it fundamentally requires putting something that is unavoidably slow and memory-heavy onto the critical path (especially if you want it to operate synchronously rather than in asynchronous batch mode). No efficient implementation will ever be possible, given the design of the web.

coding123 · 5 years ago

The chrome guy is being a punk.