I am extremely glad to see RDF being worked on and improved. It's an extremely powerful, general data model. It gets a bad rap because of its unfortunate association with the XML syntax and the fact that many of the reference implementations were written at the peak of overdesigned OO cruft.
This type of semantically precise yet flexible data model is going to become increasingly important as a bridge between highly structured data in traditional databases and unstructured information processing using LLMs. GPT does a surprisingly good job of converting between unstructured data and RDF, and my hope is that LLMs can provide some of the key components in building an actual semantic web, which has remained elusive for so long (for many good reasons).
XML isn't great for every use case, but it's really become the Nickelback of formats. Let's be real, it's pretty brilliant for some things, and I think RDF is a good example of where it shines.
Right, but they have a close association, as OP says, and RDF is wrongly dismissed for reasons similar to XML's (too complex, too much ceremony, too difficult, not enough value).
Turtle solves all the XML cruft problems and is very readable. RDF is completely independent of its serialization format, which doesn't have to be XML; I'm surprised people still associate it with XML.
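For anyone who hasn't seen it side by side, here is roughly the same tiny graph in Turtle and in RDF/XML (the names and namespaces are just illustrative):

    # Turtle
    @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    @prefix ex:   <http://example.org/> .

    ex:alice a foaf:Person ;
        foaf:name  "Alice" ;
        foaf:knows ex:bob .

    <!-- RDF/XML, same three triples -->
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:foaf="http://xmlns.com/foaf/0.1/">
      <foaf:Person rdf:about="http://example.org/alice">
        <foaf:name>Alice</foaf:name>
        <foaf:knows rdf:resource="http://example.org/bob"/>
      </foaf:Person>
    </rdf:RDF>

Same data model, very different ergonomics.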
Apparently one of the new features of RDF 1.2 is "quoted triples", where triples can be used as the subject or object of other triples. But unfortunately it seems that, at least for now, the Turtle representation doesn't support this yet?
Also, the "what's new in SPARQL 1.2" section is rather empty.
Feels like the linked documents are at a very early stage of drafting.
That's one of the reasons for having an official 1.2 standard... so that all the formats (including Turtle) will incorporate it in a compatible, correct way.
I am glad to see this as well. I decided to use RDF for my personal project because it is well specified, has many implementations, and has a human-readable syntax. In the end, it is just data, but I wanted to make it as accessible as possible. Does this mean that RDF is always the right choice? No, but it worked for my use case. I wish there were more choices in the open-source triplestore space with good OWL 2 support, but my project works with what is out there, and if someone wants to transform it into something else, that is entirely possible to do.
My impression is that the trade-off when choosing RDF vs. a property graph for modeling graph data is this: RDF gives you maximal schema flexibility and the ability to break the data model down to the smallest atomic structures, because literally everything is a node that is either an IRI (as a unique identifier) or a primitive. A property graph gives you the convenience of more complex nodes and edges with some structure built in, where you can collapse some fields down into properties that describe individual nodes and edges. In RDF you have to build all of that yourself out of triples, which can lead to fairly large structures for relatively common tasks like referencing edges or reifying statements.
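To make the reification point concrete, this is roughly what attaching a single attribute to one edge looks like with classic RDF reification (all names invented); this verbosity is exactly what the quoted triples discussed elsewhere in the thread are meant to reduce:

    @prefix ex:  <http://example.org/> .
    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

    # The edge itself:
    ex:alice ex:knows ex:bob .

    # A separate node describing that edge, so "since 2020" can hang off it:
    ex:stmt1 a rdf:Statement ;
        rdf:subject   ex:alice ;
        rdf:predicate ex:knows ;
        rdf:object    ex:bob ;
        ex:since      "2020" .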
In the nicest possible way, and from a position of ignorance of the "Semantic Web": is anyone actually doing anything with these technologies outside of academia?
RDF and related technologies are heavily used in healthcare and scientific fields, as well as industry.
The "semantic web" or "open linked data" concepts never really took off the way people had hoped, but there's till a ton of utility in the underlying standards so you'll tend to find it wherever you need complex, flexible schemas that with good interoperability between different entities.
Defense industry, as part of ARTT (Acquisition Requirements for Training Transformation), which is an incredibly overdue effort to merge specs. It's also being used to draft MBSE schemas for SysML; SysML has undefined overlap with the many, many other architecture tools, and it's going to be the main player for MBSE (maybe... there's some fighting about that).
These so-called "semantic web" technologies seem to come into their own when there's large scale organizations interfacing without a common reference frame. Like one org that does a spec from a programmer standpoint, and another org does one from a formal linguistics standpoint, then they have to integrate. For example, the USDoD Logistics steering group makes a spec for parts data from their requirements based on MTTF, cost, sparing, shelved space. USN makes a spec for parts data based on burn rate, transport, fuel type. It goes on and on like this, repeat a few dozen times, and you have a dump truck full of specs doing the same thing. See where I'm going here? They're speccing out the same thing from their own ivory towers, and - here's the kicker for those trying to LLM their way out of the situation - none of them are going to show their data to anyone else. The only thing that's exposed is the semantics. ARTT/CredEng is - or was, I am not sure if the program OR CredReg is still healthy - trying to solve this by unifying the semantics.
Ultimately someone's got to come along and give all these people a kick in the pants, one way or the other. You can't just float a boat around the ocean with no missiles, not these days.
It's pretty common in healthcare data, or at least the kind that deals with breadth of patient data. When trying to build knowledge about a disease by looking at a lot of patients, it's rare to get much useful info from a single source. Re-associating that multi-source data lends itself to a graph. If the company has been around for a little while, even if the customer-facing products don't use a graph database, at some point somebody has certainly tried it. (And once somebody has tried it, it lives forever in some part of the organization.)
Short answer is no. I spent many years (>10) listening to people explain how semantic technologies would transform academic publishing and make research more useful. It failed, in my opinion, because nobody valued it enough to pay to have it done correctly, academic papers have a shelf life, and academic papers often contain inaccurate information. Academics publish because it is required, not because they have useful info to communicate. In most cases it is the least pleasant part of research.
I have seen many more useful tools come out of LLMs in the short time they have been available than in the entire 10 years I spent working with academics using RDF, ontologies, etc. RDF is too difficult to use and has inadequate tooling. LLMs are only going to get better.
Yes, pretty much all knowledge management in biology is built upon technologies that came from the semantic web community, and biology is certainly not just academia.
There's not much application for knowledge graphs in e.g. a CRUD app of customer names and addresses, but it turns out there are an unlimited number of things you can describe about e.g. a protein, and you can't just design one schema because you don't know how it's going to be queried.
See: https://bioregistry.io for countless examples of public datasets used everywhere from academia to "big pharma".
Adobe developed the XMP metadata format, which embeds RDF packets inside almost any kind of file. It is heavily used in Adobe products and by others as well.
I'm using it in a personal project. I wanted something extensible, and ad hoc, and RDF is certainly that. But it's also typed, which is nice. I can add my own types (I may do that, not sure yet).
I am not well versed in the other RDF technologies. I haven't paid any attention to ontologies, or OWL or any of that stuff. I just use raw RDF, and defined my own vocabularies for everything, including structure. For example, I have my own type property. RDF also has one, but I just made my own. I have my own structure system to mostly bring order to how things are displayed, or created, etc. I am pretty much as far from the semantic web as you can get.
Since everything is "just a triple", sharing data is easy. So it'll be straightforward to import and export artifacts out of my system and share them with others.
And I get SPARQL "for free", so even after new data structures are added, they're still queried like the first class ones the tool already knows about. SPARQL is pretty neat.
At the moment, I have a mostly complete RDF CRUD tool, with some first class interface prototypes (by first class I mean I have forms and UI specifically for those data types, rather than a generic resource form), and really like working with it. My DB has about 3.5M triples in it.
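Not my actual vocabulary, but to illustrate the "SPARQL for free" point: once a new data type exists as triples under some home-grown vocabulary, it's queryable exactly like everything else, along these lines:

    PREFIX my: <http://example.org/myvocab#>

    SELECT ?item ?label
    WHERE {
      ?item  my:type  my:Recipe ;
             my:label ?label .
    }
    ORDER BY ?label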
Which database do you use to store your RDF data (one that supports SPARQL), and how does it perform?
> I just use raw RDF, and defined my own vocabularies for everything, including structure.
I think this is the best approach. The ontology part was more of a hindrance for me back in the 2010s when I was experimenting with semantic web technologies (using DBpedia as a source of my data), and I tried really hard to avoid going off the beaten path (no matter how flaky it seemed) as a junior-level developer.
FWIW, RDF does enable a lot of research into useful things, so even though those cases are "academic", they aren't without practical outcomes, as the subtext of the question implies.
RDF is great for annotating protein interactions, for example.
The schema.org markup that goes into websites for SEO and smart snippets in search engines is all RDF, usually as JSON-LD. Millions of people use RDF every day without even knowing it.
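For reference, this is the kind of JSON-LD block search engines look for; the values here are made up, but the @context/@type pattern is what makes it RDF under the hood:

    <script type="application/ld+json">
    {
      "@context": "https://schema.org",
      "@type": "Article",
      "headline": "RDF 1.2 is coming",
      "author": { "@type": "Person", "name": "Jane Doe" },
      "datePublished": "2024-01-15"
    }
    </script>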
It's not nothing; I even did a consulting gig with it years ago. Also, some non-profits such as Wikidata are putting it to good use, I guess. But not everything benefits from representation as a graph; for example, statistical data. Then there's always the unanswered question of who's going to publish data without economic benefit when the money is, at best, in attention/eyeballs, or in selling individual queries where the backend tech isn't material or even visible. Do we even want to expose more machine-readable information in the age of ChatGPT?
TBL's vision for knowledge graphs is even older than the web. But should it be W3C's job to invent new tech? Does W3C's track record, legal and financial standing invite further standardization work? Their HTML and SVG charters have basically ceased working and W3C's last (final?) HTML recommendation is based on WHATWG HTML Review Draft January, 2020 [1].
To expand on this:
A lot of this is based on DCAT [1] (which is an RDF vocabulary) and, for Europe, the extension DCAT-AP [2], which is then further extended by country-specific standards [3].
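A minimal, made-up DCAT description of a dataset looks something like this:

    @prefix dcat: <http://www.w3.org/ns/dcat#> .
    @prefix dct:  <http://purl.org/dc/terms/> .

    <https://data.example.org/dataset/air-quality> a dcat:Dataset ;
        dct:title "Air quality measurements" ;
        dct:publisher <https://data.example.org/org/environment-agency> ;
        dcat:distribution [
            a dcat:Distribution ;
            dct:format "CSV" ;
            dcat:downloadURL <https://data.example.org/files/air-quality.csv>
        ] .

DCAT-AP and the national profiles mostly constrain which of these properties are mandatory and which controlled vocabularies the values have to come from.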
If you're paying a boutique consultancy fat stacks to fix your megaCo's absolute hairball data integration, the probability of RDF approaches 1 as the price goes to ∞.
Trivial (as in ‘cat graphx graphy’) merging of complex data graphs is just too powerful.
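That merge works because every statement is self-contained and identity is global (IRIs), so two N-Triples files produced by different systems can literally be concatenated into one valid graph; a made-up example:

    # graphx.nt (from system A)
    <http://example.org/alice> <http://xmlns.com/foaf/0.1/name> "Alice" .

    # graphy.nt (from system B)
    <http://example.org/alice> <http://example.org/worksFor> <http://example.org/acme> .

    # cat graphx.nt graphy.nt > merged.nt  -- still one well-formed graph,
    # and both facts now hang off the same node.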
Not only via boutique consultancies. Almost every megaCo has some ontology team working for them. Sadly, almost all of them also seem to be stuck in a perpetual conceptual phase with very little actual impact on the business.
Yes, there are a lot of projects using RDF-adjacent technologies and graph databases. But most of them are not very hyped up, so you won't know about them unless you know where to look. Amazon has a graph database: https://en.wikipedia.org/wiki/Amazon_Neptune which suggests somebody is using it (besides which, I know for a fact people use it).
Google's authorisation system Zanzibar (https://research.google/pubs/pub48190/) is not explicitly using SPARQL, but it's a good example of the sort of thing that a graph-based data model can do. I _think_ Zanzibar can be implemented with SPARQL: we implemented something very very close (and more powerful in some ways) with it
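Not claiming this is how Zanzibar works internally, but a hedged sketch of the flavor: store the relation tuples as triples and let a SPARQL 1.1 property path do the transitive group expansion (names invented):

    PREFIX acl: <http://example.org/acl#>

    # "Can alice view doc1?"  Follows any chain of group memberships,
    # including the zero-length case where alice is listed directly.
    ASK {
      <http://example.org/doc1>  acl:viewer  ?principal .
      ?principal  acl:member*  <http://example.org/user/alice> .
    }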
Accenture just invested in Stardog, a leading knowledge graph platform, which is based on these W3C standards. Can't get much less "academic" than Accenture.
UNESCO does some stuff with it in places (Ocean InfoHub, which I did some work on). The EU likes DCAT for data syndication, which is generally serialized as RDF. They're also promoting linked data, but I don't really see how it's truly useful. It mainly seems like a way to turn a bulk download into 100k API requests.
I wonder if RDF could actually be used to implement "trigger warnings". List triggering tags in your browser, and then the triggering content can be tagged and blocked by your browser. No censorship involved, just responsible disclosure.
You’re seeking a technical solution to a social problem. There have been a number of attempts at very similar things; they always fail because practically no one is interested in putting in the effort required, if they even know about it.
> no one is interested in putting in the effort required, if they even know about it
There is a sizeable web subculture, in various flavors, focused on an accessible, sustainable, semantic, open-standards web.
When you browse sites, read blogs, etc. that typically load fast, have a nerdy feel to them, and prominently display RSS and some of the more niche open web stuff, then you might be visiting the site of someone who would implement such a feature in the blink of an eye.
https://www.w3.org/2007/02/turtle/primer/
Audio (etc) plugin format that uses .ttl:
https://lv2plug.in/
https://gitlab.com/lv2/lv2/-/tree/master/lv2/core.lv2
https://gitlab.com/lv2/lv2/-/tree/master/schemas.lv2
https://drobilla.net/software/sord https://drobilla.net/software/serd
drobilla has mentioned that, if there was a C JSON-LD lib, that might be enough to warrant an LV3
For more: https://github.com/lv2/lv2/wiki
P.S. not many people know that LV2 does modular synth style CV https://linuxmusicians.com/viewtopic.php?t=20701&p=112242
https://www.w3.org/TR/n-triples/
https://www.w3.org/TR/n-quads/
which can end up as the easiest formats to work with sometimes.
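Mostly because N-Triples is just one fully spelled-out triple per line, so ordinary line-oriented tools (grep, sort, split, cat) work on it directly; for example:

    <http://example.org/alice> <http://xmlns.com/foaf/0.1/name> "Alice" .
    <http://example.org/alice> <http://xmlns.com/foaf/0.1/knows> <http://example.org/bob> .
    <http://example.org/bob> <http://xmlns.com/foaf/0.1/name> "Bob" .

N-Quads is the same idea with a fourth term naming the graph each triple belongs to.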
There are tools like 'rapper' and 'serd' to convert to and from the various formats.
https://librdf.org/raptor/rapper.html
https://drobilla.net/software/serd.html
The "quoted triples" technique is otherwise known as "RDF-star" Here's an article that explains the motivations and alternatives pretty comprehensively: https://www.ontotext.com/knowledgehub/fundamentals/what-is-r....
This primer is a really good introduction to understanding RDF and SPARQL. Both seem extremely powerful.
If you are interested, my project is here: https://github.com/cyocum/irish-gen and a few posts about it are here https://cyocum.github.io/.
The "semantic web" or "open linked data" concepts never really took off the way people had hoped, but there's till a ton of utility in the underlying standards so you'll tend to find it wherever you need complex, flexible schemas that with good interoperability between different entities.
These so-called "semantic web" technologies seem to come into their own when there's large scale organizations interfacing without a common reference frame. Like one org that does a spec from a programmer standpoint, and another org does one from a formal linguistics standpoint, then they have to integrate. For example, the USDoD Logistics steering group makes a spec for parts data from their requirements based on MTTF, cost, sparing, shelved space. USN makes a spec for parts data based on burn rate, transport, fuel type. It goes on and on like this, repeat a few dozen times, and you have a dump truck full of specs doing the same thing. See where I'm going here? They're speccing out the same thing from their own ivory towers, and - here's the kicker for those trying to LLM their way out of the situation - none of them are going to show their data to anyone else. The only thing that's exposed is the semantics. ARTT/CredEng is - or was, I am not sure if the program OR CredReg is still healthy - trying to solve this by unifying the semantics.
Ultimately someone's got to come along and give all these people a kick in the pants, one way or the other. You can't just float a boat around the ocean with no missiles, not these days.
https://data.europa.eu/data/sparql?locale=en#
I did struggle to find what I wanted. It's a labyrinth of metadata, and I was looking for the structured regulation text itself. In the end I stuck with good old-fashioned XHTML scraping.
https://ontology2.com/essays/LookingForMetadataInAllTheWrong...
https://developers.google.com/search/docs/appearance/structu...
[1]: https://sgmljs.net/blog/blog2303.html
[1] https://www.w3.org/TR/vocab-dcat-3/
[2] https://joinup.ec.europa.eu/collection/semantic-interoperabi...
[3] e.g. https://www.dcat-ap.de/ or https://docs.dataportal.se/dcat/en/
I was also told by some of my colleagues that they were earning a lot consulting in the medical world with semantic data.
https://newsroom.accenture.com/news/accenture-invests-in-sta...