I am extremely glad to see RDF being worked on and improved. It's an extremely powerful, general data model. It gets a bad rap because of its unfortunate association with the XML syntax and the fact that many of the reference implementations were written at the peak of overdesigned OO cruft.
This type of semantically precise yet flexible data model is going to become increasingly important as a bridge between highly structured data in traditional databases and unstructured information processing using LLMs. GPT does a surprisingly good job of converting between unstructured data and RDF, and my hope is that LLMs can provide some of the key components in building an actual semantic web, which has remained elusive for so long (for many good reasons).
XML isn't great for every use case, but it's really become the Nickelback of formats. Let's be real, it's pretty brilliant for some things, and I think RDF is a good example of where it shines.
Right, but they have a close association, as OP says, and RDF is wrongly dismissed for reasons similar to XML's (too complex, too much ceremony, too difficult, not enough value).
Turtle solves all the XML cruft problems and is very readable. RDF is completely independent of its serialization format, which doesn't have to be XML; I'm surprised people still associate it with XML.
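For anyone who hasn't seen it side by side, here is roughly the same tiny graph in Turtle and in RDF/XML (the names and namespaces are just illustrative):

    # Turtle
    @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    @prefix ex:   <http://example.org/> .

    ex:alice a foaf:Person ;
        foaf:name  "Alice" ;
        foaf:knows ex:bob .

    <!-- RDF/XML, same three triples -->
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:foaf="http://xmlns.com/foaf/0.1/">
      <foaf:Person rdf:about="http://example.org/alice">
        <foaf:name>Alice</foaf:name>
        <foaf:knows rdf:resource="http://example.org/bob"/>
      </foaf:Person>
    </rdf:RDF>

Same data model, very different ergonomics.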
Apparently one of the new features of RDF 1.2 is "quoted triples", where triples can be used as the subject or object of other triples. But unfortunately it seems that, at least for now, the Turtle representation doesn't support this yet?
Also, the "what's new in SPARQL 1.2" section is rather empty.
Feels like the linked documents are at a very early stage of drafting.
That's one of the reasons for having an official 1.2 standard... so that all the formats (including Turtle) will incorporate it in a compatible, correct way.
I am glad to see this as well. I decided to use RDF for my personal project because it is well specified, has many implementations, and has a human-readable syntax. In the end, it is just data, but I wanted to make it as accessible as possible. Does this mean that RDF is always the right choice? No, but it worked for my use case. I wish there were more choices in the open-source triplestore space with good OWL 2 support, but my project works with what is out there, and if someone wants to transform it into something else, that is entirely possible to do.
My impression is that the trade-off when choosing RDF vs. a property graph for modeling graph data is this: RDF gives you maximal schema flexibility and the ability to break the data model down to the smallest atomic structures, because literally everything is a node that is either an IRI (as a unique identifier) or a primitive. A property graph gives you the convenience of more complex nodes and edges with some structure built in, where you can collapse some fields down into properties that describe individual nodes and edges. In RDF you have to build all of that yourself out of triples, which can lead to fairly large structures for relatively common tasks like referencing edges or reifying statements.
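To make the reification point concrete, this is roughly what attaching a single attribute to one edge looks like with classic RDF reification (all names invented); this verbosity is exactly what the quoted triples discussed elsewhere in the thread are meant to reduce:

    @prefix ex:  <http://example.org/> .
    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

    # The edge itself:
    ex:alice ex:knows ex:bob .

    # A separate node describing that edge, so "since 2020" can hang off it:
    ex:stmt1 a rdf:Statement ;
        rdf:subject   ex:alice ;
        rdf:predicate ex:knows ;
        rdf:object    ex:bob ;
        ex:since      "2020" .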
In the nicest possible way, and from a position of ignorance of the "Semantic Web": is anyone actually doing anything with these technologies outside of academia?
RDF and related technologies are heavily used in healthcare and scientific fields, as well as industry.
The "semantic web" or "open linked data" concepts never really took off the way people had hoped, but there's till a ton of utility in the underlying standards so you'll tend to find it wherever you need complex, flexible schemas that with good interoperability between different entities.
Defense industry, as part of ARTT (Acquisition Requirements for Training Transformation), which is an incredibly overdue effort to merge specs. It's also being used to draft MBSE schemas for SysML; SysML has undefined overlap with the many, many other architecture tools, and it's going to be the main player for MBSE (maybe... there's some fighting about that).
These so-called "semantic web" technologies seem to come into their own when there's large scale organizations interfacing without a common reference frame. Like one org that does a spec from a programmer standpoint, and another org does one from a formal linguistics standpoint, then they have to integrate. For example, the USDoD Logistics steering group makes a spec for parts data from their requirements based on MTTF, cost, sparing, shelved space. USN makes a spec for parts data based on burn rate, transport, fuel type. It goes on and on like this, repeat a few dozen times, and you have a dump truck full of specs doing the same thing. See where I'm going here? They're speccing out the same thing from their own ivory towers, and - here's the kicker for those trying to LLM their way out of the situation - none of them are going to show their data to anyone else. The only thing that's exposed is the semantics. ARTT/CredEng is - or was, I am not sure if the program OR CredReg is still healthy - trying to solve this by unifying the semantics.
Ultimately someone's got to come along and give all these people a kick in the pants, one way or the other. You can't just float a boat around the ocean with no missiles, not these days.
It's pretty common in healthcare data, or at least the kind that deals with breadth of patient data. When trying to build knowledge about a disease by looking at a lot of patients, it's rare to get much useful info from a single source. Re-associating that multi-source data lends itself to a graph. If the company has been around for a little while, even if the customer-facing products don't use a graph database, at some point somebody has certainly tried it. (And once somebody has tried it, it lives forever in some part of the organization.)
Short answer is no. I spent many years (>10) listening to people explain how semantic technologies would transform academic publishing and make research more useful. It failed, in my opinion, because nobody valued it enough to pay to have it done correctly, academic papers have a shelf life, and academic papers often contain inaccurate information. Academics publish because it is required, not because they have useful info to communicate. In most cases it is the least pleasant part of research.
I have seen many more useful tools come out of LLMs in the short time they have been available than in the entire 10 years I spent working with academics using RDF, ontologies, etc. RDF is too difficult to use and has inadequate tooling. LLMs are only going to get better.
Yes, pretty much all knowledge management in biology is built upon technologies that came from the semantic web community, and biology is certainly not just academia.
There's not much application for knowledge graphs in e.g. a CRUD app of customer names and addresses, but it turns out there are an unlimited number of things you can describe about e.g. a protein, and you can't just design one schema because you don't know how it's going to be queried.
See: https://bioregistry.io for countless examples of public datasets used everywhere from academia to "big pharma".
Adobe developed the XMP metadata format, which embeds RDF packets inside almost any kind of file. It is heavily used in Adobe products and by others as well.
I'm using it in a personal project. I wanted something extensible, and ad hoc, and RDF is certainly that. But it's also typed, which is nice. I can add my own types (I may do that, not sure yet).
I am not well versed in the other RDF technologies. I haven't paid any attention to ontologies, or OWL or any of that stuff. I just use raw RDF, and defined my own vocabularies for everything, including structure. For example, I have my own type property. RDF also has one, but I just made my own. I have my own structure system to mostly bring order to how things are displayed, or created, etc. I am pretty much as far from the semantic web as you can get.
Since everything is "just a triple", sharing data is easy. So it'll be straightforward to import and export artifacts out of my system and share them with others.
And I get SPARQL "for free", so even after new data structures are added, they're still queried like the first class ones the tool already knows about. SPARQL is pretty neat.
At the moment, I have a mostly complete RDF CRUD tool, with some first class interface prototypes (by first class I mean I have forms and UI specifically for those data types, rather than a generic resource form), and really like working with it. My DB has about 3.5M triples in it.
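Not my actual vocabulary, but to illustrate the "SPARQL for free" point: once a new data type exists as triples under some home-grown vocabulary, it's queryable exactly like everything else, along these lines:

    PREFIX my: <http://example.org/myvocab#>

    SELECT ?item ?label
    WHERE {
      ?item  my:type  my:Recipe ;
             my:label ?label .
    }
    ORDER BY ?label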
Which database do you use to store your RDF data (one that supports SPARQL), and how does it perform?
> I just use raw RDF, and defined my own vocabularies for everything, including structure.
I think this is the best approach. The ontology part was more of a hindrance for me back in the 2010s when I was experimenting with semantic web technologies (using DBpedia as a source of my data), and I tried really hard to avoid going off the beaten path (no matter how flaky it seemed) as a junior-level developer.
FWIW, RDF does enable a lot of research into useful things, so even though those cases are "academic", they aren't without practical outcomes, as the subtext of the question implies.
RDF is great for annotating protein interactions, for example.
The schema.org markup that goes into websites for SEO and smart snippets in search engines is all RDF, usually as JSON-LD. Millions of people use RDF every day without even knowing it.
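For reference, this is the kind of JSON-LD block search engines look for; the values here are made up, but the @context/@type pattern is what makes it RDF under the hood:

    <script type="application/ld+json">
    {
      "@context": "https://schema.org",
      "@type": "Article",
      "headline": "RDF 1.2 is coming",
      "author": { "@type": "Person", "name": "Jane Doe" },
      "datePublished": "2024-01-15"
    }
    </script>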
It's not nothing; I even did a consulting gig with it years ago. Also, some non-profits such as Wikidata are putting it to good use, I guess. But not everything benefits from representation as a graph; for example, statistical data. Then there's always the unanswered question of who's going to publish data without economic benefit when the money is, at best, in attention/eyeballs, or in selling individual queries where the backend tech isn't material or even visible. Do we even want to expose more machine-readable information in the age of ChatGPT?
TBL's vision for knowledge graphs is even older than the web. But should it be W3C's job to invent new tech? Does W3C's track record, legal and financial standing invite further standardization work? Their HTML and SVG charters have basically ceased working and W3C's last (final?) HTML recommendation is based on WHATWG HTML Review Draft January, 2020 [1].
To expand on this:
A lot of this is based on DCAT [1] (which is an RDF vocabulary) and, for Europe, the extension DCAT-AP [2], which is then further extended by country-specific standards [3].
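A minimal, made-up DCAT description of a dataset looks something like this:

    @prefix dcat: <http://www.w3.org/ns/dcat#> .
    @prefix dct:  <http://purl.org/dc/terms/> .

    <https://data.example.org/dataset/air-quality> a dcat:Dataset ;
        dct:title "Air quality measurements" ;
        dct:publisher <https://data.example.org/org/environment-agency> ;
        dcat:distribution [
            a dcat:Distribution ;
            dct:format "CSV" ;
            dcat:downloadURL <https://data.example.org/files/air-quality.csv>
        ] .

DCAT-AP and the national profiles mostly constrain which of these properties are mandatory and which controlled vocabularies the values have to come from.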
If you're paying a boutique consultancy fat stacks to fix your megaCo's absolute hairball data integration, the probability of RDF approaches 1 as the price goes to ∞.
Trivial (as in ‘cat graphx graphy’) merging of complex data graphs is just too powerful.
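That merge works because every statement is self-contained and identity is global (IRIs), so two N-Triples files produced by different systems can literally be concatenated into one valid graph; a made-up example:

    # graphx.nt (from system A)
    <http://example.org/alice> <http://xmlns.com/foaf/0.1/name> "Alice" .

    # graphy.nt (from system B)
    <http://example.org/alice> <http://example.org/worksFor> <http://example.org/acme> .

    # cat graphx.nt graphy.nt > merged.nt  -- still one well-formed graph,
    # and both facts now hang off the same node.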
Not only via boutique consultancies. Almost every megaCo has some ontology team working for them. Sadly, almost all of them also seem to be stuck in a perpetual conceptual phase with very little actual impact on the business.
Yes, there are a lot of projects using RDF-adjacent technologies and graph databases. But most of them are not very hyped up, so you won't know about them unless you know where to look. Amazon has a graph database: https://en.wikipedia.org/wiki/Amazon_Neptune which suggests somebody is using it (besides which, I know for a fact people use it).
Google's authorisation system Zanzibar (https://research.google/pubs/pub48190/) is not explicitly using SPARQL, but it's a good example of the sort of thing that a graph-based data model can do. I _think_ Zanzibar can be implemented with SPARQL: we implemented something very very close (and more powerful in some ways) with it
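Not claiming this is how Zanzibar works internally, but a hedged sketch of the flavor: store the relation tuples as triples and let a SPARQL 1.1 property path do the transitive group expansion (names invented):

    PREFIX acl: <http://example.org/acl#>

    # "Can alice view doc1?"  Follows any chain of group memberships,
    # including the zero-length case where alice is listed directly.
    ASK {
      <http://example.org/doc1>  acl:viewer  ?principal .
      ?principal  acl:member*  <http://example.org/user/alice> .
    }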
Accenture just invested in Stardog, a leading knowledge graph platform, which is based on these W3C standards. Can't get much less "academic" than Accenture.
UNESCO does some stuff with it in places (Ocean InfoHub, which I did some work on). The EU likes DCAT for data syndication, which is generally serialized as RDF. They're also promoting linked data, but I don't really see how it's truly useful. It mainly seems like a way to turn a bulk download into 100k API requests.
I wonder if RDF could actually be used to implement "trigger warnings". List triggering tags in your browser, and then the triggering content can be tagged and blocked by your browser. No censorship involved, just responsible disclosure.
You’re seeking a technical solution to a social problem. There have been a number of attempts at very similar things; they always fail because practically no one is interested in putting in the effort required, if they even know about it.
> no one is interested in putting in the effort required, if they even know about it
There is a sizeable web subculture, in various flavors, focused on an accessible, sustainable, semantic, open-standards web.
When you browse sites, read blogs, etc. that typically load fast, have a nerdy feel to them, and prominently display RSS and some of the more niche open web stuff, then you might be visiting the site of someone who would implement such a feature in the blink of an eye.
https://www.w3.org/2007/02/turtle/primer/
Audio (etc) plugin format that uses .ttl:
https://lv2plug.in/
https://gitlab.com/lv2/lv2/-/tree/master/lv2/core.lv2
https://gitlab.com/lv2/lv2/-/tree/master/schemas.lv2
https://drobilla.net/software/sord https://drobilla.net/software/serd
drobilla has mentioned that, if there was a C JSON-LD lib, that might be enough to warrant an LV3
For more: https://github.com/lv2/lv2/wiki
P.S. not many people know that LV2 does modular synth style CV https://linuxmusicians.com/viewtopic.php?t=20701&p=112242
https://www.w3.org/TR/n-triples/
https://www.w3.org/TR/n-quads/
which can end up as the easiest formats to work with sometimes.
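Mostly because N-Triples is just one fully spelled-out triple per line, so ordinary line-oriented tools (grep, sort, split, cat) work on it directly; for example:

    <http://example.org/alice> <http://xmlns.com/foaf/0.1/name> "Alice" .
    <http://example.org/alice> <http://xmlns.com/foaf/0.1/knows> <http://example.org/bob> .
    <http://example.org/bob> <http://xmlns.com/foaf/0.1/name> "Bob" .

N-Quads is the same idea with a fourth term naming the graph each triple belongs to.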
There are tools like 'rapper' and 'serd' to convert to and from the various formats.
https://librdf.org/raptor/rapper.html
https://drobilla.net/software/serd.html
The "quoted triples" technique is otherwise known as "RDF-star" Here's an article that explains the motivations and alternatives pretty comprehensively: https://www.ontotext.com/knowledgehub/fundamentals/what-is-r....
This primer is a really good introduction to understanding RDF and SPARQL. Both seem extremely powerful.
If you are interested, my project is here: https://github.com/cyocum/irish-gen and a few posts about it are here https://cyocum.github.io/.
The "semantic web" or "open linked data" concepts never really took off the way people had hoped, but there's till a ton of utility in the underlying standards so you'll tend to find it wherever you need complex, flexible schemas that with good interoperability between different entities.
These so-called "semantic web" technologies seem to come into their own when there's large scale organizations interfacing without a common reference frame. Like one org that does a spec from a programmer standpoint, and another org does one from a formal linguistics standpoint, then they have to integrate. For example, the USDoD Logistics steering group makes a spec for parts data from their requirements based on MTTF, cost, sparing, shelved space. USN makes a spec for parts data based on burn rate, transport, fuel type. It goes on and on like this, repeat a few dozen times, and you have a dump truck full of specs doing the same thing. See where I'm going here? They're speccing out the same thing from their own ivory towers, and - here's the kicker for those trying to LLM their way out of the situation - none of them are going to show their data to anyone else. The only thing that's exposed is the semantics. ARTT/CredEng is - or was, I am not sure if the program OR CredReg is still healthy - trying to solve this by unifying the semantics.
Ultimately someone's got to come along and give all these people a kick in the pants, one way or the other. You can't just float a boat around the ocean with no missiles, not these days.
https://data.europa.eu/data/sparql?locale=en#
I did struggle to find what I wanted. It's a labyrinth of metadata, and I was looking for the structured regulation text itself. In the end I stuck with good old-fashioned XHTML scraping.
https://ontology2.com/essays/LookingForMetadataInAllTheWrong...
https://developers.google.com/search/docs/appearance/structu...
[1]: https://sgmljs.net/blog/blog2303.html
[1] https://www.w3.org/TR/vocab-dcat-3/
[2] https://joinup.ec.europa.eu/collection/semantic-interoperabi...
[3] e.g. https://www.dcat-ap.de/ or https://docs.dataportal.se/dcat/en/
I was also told by some of my colleagues that they were earning a lot consulting in the medical world with semantic data.
https://newsroom.accenture.com/news/accenture-invests-in-sta...