DuCharme: Chapter 02: The Semantic Web, RDF, and Linked Data
Metadata
Title: The Semantic Web, RDF, and Linked Data
Number: 2
Book: DuCharme Learning SPARQL
URLs, URIs, Namespaces
Uniform Resource Locators (URLs) were one of the initial trio of specifications from Tim Berners-Lee (TBL), the others being HTML and HTTP. A URL is a compact way for a client to specify the resource it wants.
Owning a domain gives you the authority to control the file structure and resource names beneath it. This led people to use such names for resources that weren’t web addresses. For example, the FOAF vocabulary uses URLs like http://xmlns.com/foaf/0.1/Person to refer to a concept, without needing an actual web page at that address.
This confused people, and an attempt was made to create a new syntax, Uniform Resource Names (URNs), that would look like urn:xmlns.com/foaf/0.1/Person instead. An umbrella concept covering both URNs and URLs was introduced: the Uniform Resource Identifier (URI). Early documents expanded URI as Universal Resource Identifier, but the current specifications use Uniform for URI, URN, and URL alike.
URNs didn’t catch on at all, so now people often use URI and URL interchangeably.
URIs are restricted to a limited set of ASCII characters, so they cannot directly contain characters from scripts such as Chinese or Cyrillic. A further spec, Internationalized Resource Identifiers (IRIs), extends URIs to support more characters. IRI is the most inclusive term, though again the terms are often used interchangeably.
These identifiers happened to solve another problem: name clashes between XML vocabularies. URIs were selected as the way to identify XML namespaces, so you’ll often see XML docs like this:
<book xmlns:dc="http://purl.org/dc/elements/1.1/">
<dc:title>My Great Book</dc:title>
</book>
Prefixes don’t identify the namespace; the URIs do. The prefixes are just shorthand.
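To illustrate that the URI, not the prefix, identifies the namespace, here is a minimal sketch using Python's standard-library ElementTree parser. The second prefix, dublin, and the helper name child_tags are made up for this example.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"

# Two documents that differ only in the prefix bound to the same namespace URI.
# The prefix "dublin" is an arbitrary choice made for this illustration.
doc_a = f'<book xmlns:dc="{DC}"><dc:title>My Great Book</dc:title></book>'
doc_b = f'<book xmlns:dublin="{DC}"><dublin:title>My Great Book</dublin:title></book>'

def child_tags(xml_text):
    """Return the expanded {namespace-uri}local-name tag of each child element."""
    return [child.tag for child in ET.fromstring(xml_text)]

# The parser discards the prefix and keeps only the namespace URI.
print(child_tags(doc_a))
print(child_tags(doc_b))
```

Both calls report the same expanded element name, which is why swapping one prefix for another does not change what the document means.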
In RDF, subject and predicate names belong to specific namespaces. RDF follows the same prefix convention as XML, but you can also use the full URI instead (in many syntaxes, just enclose it in angle brackets). Just about anywhere in RDF and SPARQL where you can use a URI, you can use a prefixed name instead, so long as the prefix has been declared. The part of the name after the colon is the local name; the whole thing, prefix included, is the prefixed name.
Namespace | TTL Prefix declaration | Prefixed Name | Local Name |
---|---|---|---|
<http://purl.org/dc/elements/1.1/> | @prefix dc: <http://purl...> | dc:title | title |
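The relationship in the table can be sketched as a small lookup: a prefixed name expands to the namespace URI followed by the local name. The function name expand and the PREFIXES dictionary are assumptions for illustration, not part of any RDF library.

```python
# Declared prefixes, mirroring a Turtle @prefix declaration.
PREFIXES = {"dc": "http://purl.org/dc/elements/1.1/"}

def expand(name, prefixes=PREFIXES):
    """Expand a prefixed name such as dc:title into a full URI."""
    if name.startswith("<") and name.endswith(">"):
        return name[1:-1]  # already a full URI written in angle brackets
    prefix, sep, local = name.partition(":")
    if not sep or prefix not in prefixes:
        raise KeyError(f"undeclared prefix in {name!r}")
    return prefixes[prefix] + local

print(expand("dc:title"))
```

An undeclared prefix raises an error here, matching the rule that a prefixed name is only usable once its prefix has been declared.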
RDF
To recap, the Resource Description Framework (RDF):
Is a data model in which the basic unit of information is a triple.
A triple consists of a subject, predicate, and object - a resource identifier, an attribute or property name, and an attribute or property value.
To remove ambiguity, the subject and predicate of a triple must be URIs (though we can use prefixed names in place of full URIs).
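As a rough sketch of this data model, a graph can be held as a set of three-part tuples and queried by fixing some positions and leaving others open. The Triple class and match helper are hypothetical names, and marking literals by keeping their quote marks is a simplification made for this example.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str

DC = "http://purl.org/dc/elements/1.1/"
BOOK = "urn:isbn:006251587X"

# A graph is just a set of triples; literals keep their quote marks here.
graph = {
    Triple(BOOK, DC + "title", '"Weaving the Web"'),
    Triple(BOOK, DC + "creator", "http://www.w3.org/People/Berners-Lee/card#i"),
}

def match(graph, subject=None, predicate=None, obj=None):
    """Return the triples that agree with every argument that is not None."""
    pattern = (subject, predicate, obj)
    return {
        triple
        for triple in graph
        if all(want is None or want == got for want, got in zip(pattern, triple))
    }

titles = match(graph, subject=BOOK, predicate=DC + "title")
print(titles)
```

Leaving a position as None plays roughly the role a variable plays in a SPARQL triple pattern.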
Serialization
The most common serialization format for RDF is Turtle, but RDF has a number of different ones.
N-Triples
The simplest is N-Triples, a subset of N3. In N-Triples, URIs are written in angle brackets and strings in quotation marks. Each triple has its own line, with a period at the end, e.g.:
<urn:isbn:011421421> <http://purl.org/dc/elements/1.1/creator> <http://www.w3.org/People/Berners-Lee/card#i> .
The order of a set of triples does not matter. N-Triples is popular for teaching and can be quick to parse, but is verbose.
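The one-triple-per-line layout is what makes N-Triples quick to parse. The sketch below handles only the simplest lines (URI subject and predicate, URI or plain string object) and is an assumption-laden illustration, not a conforming parser: it ignores blank nodes, language tags, datatypes, and escaped quotes.

```python
import re

# One simple N-Triples line: <subject> <predicate> (<object> | "literal") .
LINE = re.compile(r'^\s*<([^>]*)>\s+<([^>]*)>\s+(<[^>]*>|"[^"]*")\s*\.\s*$')

def parse_ntriples(text):
    """Parse simple N-Triples text into a set of (subject, predicate, object)."""
    triples = set()
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and comments
        found = LINE.match(line)
        if found is None:
            raise ValueError(f"unsupported line: {line!r}")
        triples.add(found.groups())
    return triples

a = '''<urn:isbn:006251587X> <http://purl.org/dc/elements/1.1/title> "Weaving the Web" .
<urn:isbn:006251587X> <http://purl.org/dc/elements/1.1/creator> <http://www.w3.org/People/Berners-Lee/card#i> .
'''
b = "\n".join(reversed(a.splitlines()))  # same triples, opposite order

print(parse_ntriples(a) == parse_ntriples(b))  # True
```

Because the result is a set, reordering the lines yields the same graph.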
RDF/XML and RDFa
The oldest serialization is RDF/XML, which was part of the original RDF spec. At the time of the book’s publication, it was still the only standardized serialization format. Here is an example:
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:v="http://www.w3.org/2006/vcard/">
<rdf:Description rdf:about="urn:isbn:006251587X">
<dc:title>Weaving the Web</dc:title>
<dc:creator rdf:resource="http://www.w3.org/People/Berners-Lee/card#i"/>
</rdf:Description>
<rdf:Description rdf:about="http://www.w3.org/People/Berners-Lee/card#i">
<v:title>Director</v:title>
</rdf:Description>
</rdf:RDF>
Some notes on the spec:
The element containing all the triples must be an RDF element from the namespace http://www.w3.org/1999/02/22-rdf-syntax-ns#.
The subject of each triple is named in the rdf:about attribute of the rdf:Description element.
You can have a separate rdf:Description element for each triple, or you can group them, as here, so that one element has multiple children and so attaches multiple triples to one subject.
The objects of the triples may be plain strings (e.g. v:title) or resources themselves (see dc:creator).
RDF/XML never became popular because it is a pain to process.
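To give a feel for that processing, here is a minimal sketch that pulls triples out of the flat rdf:Description form using Python's standard-library ElementTree. It covers only rdf:about subjects, string objects, and rdf:resource objects; real RDF/XML allows many other shapes, which is a large part of why it is awkward. The function name triples_from_rdfxml is made up for this example.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

rdf_xml = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="urn:isbn:006251587X">
    <dc:title>Weaving the Web</dc:title>
    <dc:creator rdf:resource="http://www.w3.org/People/Berners-Lee/card#i"/>
  </rdf:Description>
</rdf:RDF>"""

def triples_from_rdfxml(text):
    """Extract (subject, predicate, object) from simple rdf:Description blocks."""
    root = ET.fromstring(text)
    triples = []
    for desc in root.findall(f"{{{RDF}}}Description"):
        subject = desc.get(f"{{{RDF}}}about")
        for prop in desc:
            # ElementTree tags look like {namespace}local; join them into a URI.
            namespace, local = prop.tag[1:].split("}")
            resource = prop.get(f"{{{RDF}}}resource")
            obj = resource if resource is not None else (prop.text or "")
            triples.append((subject, namespace + local, obj))
    return triples

for triple in triples_from_rdfxml(rdf_xml):
    print(triple)
```

Each child element of an rdf:Description becomes one triple whose subject is the rdf:about value.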
An alternative is RDFa, where RDF triples are embedded in HTML or XML documents as attributes. Tools can then pull the triples out of the HTML on a web page, for example, and let you run SPARQL queries over them.
N3 and Turtle
N3 was a personal project by Tim Berners-Lee, intended as an easier way to write triples than RDF/XML. Here’s an example:
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix v: <http://www.w3.org/2006/vcard/> .
<http://www.w3.org/People/Berners-Lee/card#i>
v:title "Director" .
<urn:isbn:006251587X>
dc:creator <http://www.w3.org/People/Berners-Lee/card#i> ;
dc:title "Weaving the Web" .
Compared with N-Triples, extra whitespace and prefixes are allowed. Note the semicolon, which indicates that another predicate and object are coming for the same subject. You can also use a comma to indicate that the next object belongs to the preceding subject and predicate.
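The semicolon and comma are only abbreviations: each one still stands for a full triple. A rough Python sketch of that expansion follows, where a dictionary key plays the role of the semicolon (a new predicate for the same subject) and a list plays the role of the comma (a new object for the same subject and predicate). The function name expand_abbreviations is hypothetical.

```python
def expand_abbreviations(subject, predicate_objects):
    """Flatten {predicate: [objects]} for one subject into full triples."""
    return [
        (subject, predicate, obj)
        for predicate, objects in predicate_objects.items()
        for obj in objects
    ]

# Semicolon case: one subject, two predicates.
book = expand_abbreviations(
    "<urn:isbn:006251587X>",
    {
        "dc:creator": ["<http://www.w3.org/People/Berners-Lee/card#i>"],
        "dc:title": ['"Weaving the Web"'],
    },
)

# Comma case: one subject and predicate, two objects.
labels = expand_abbreviations(
    "<http://dbpedia.org/resource/Switzerland>",
    {"rdfs:label": ['"Switzerland"@en', '"Suiza"@es']},
)

for triple in book + labels:
    print(triple)
```

Four triples come out in total, which is what the same data would occupy as four separate lines in N-Triples.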
N3 also has extra features, like specifying inference rules or referencing a whole graph of triples as a resource.
Turtle is N3 without the inference rules and other bells and whistles. Turtle is the most popular format; it was being standardized when the book was written and became a W3C Recommendation in 2014.
RDF Databases and Data Typing
If you’re storing a lot of triples, a flat file isn’t going to scale well, and several database management systems are available. You can store RDF in relational databases, but optimized triple stores are a better bet.
Objects can be more than just strings or other resources: you can assign datatypes, and people often use the XML Schema types. Turtle processors will infer the type if you leave the quote marks off a literal, treating true as a boolean, 4 as an integer, and 3.15 as a decimal, for example. You can also declare the type explicitly (for example, for dates). Here’s an example with all types explicit:
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix d: <http://learningsparql.com/ns/data#> .
@prefix dm: <http://learningsparql.com/ns/demo#> .
d:item342 dm:shipped "2011-02-14"^^<http://www.w3.org/2001/XMLSchema#date> .
d:item342 dm:quantity "4"^^xsd:integer .
d:item342 dm:invoiced "false"^^xsd:boolean .
d:item342 dm:costPerItem "3.50"^^xsd:decimal .
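One reason datatypes matter is that a consumer can turn the string form of a literal into a native value. Here is a minimal sketch of that mapping for the four XML Schema types used above; the CONVERTERS table and the to_python helper are assumptions for illustration, and unknown datatypes are simply returned as strings.

```python
from datetime import date
from decimal import Decimal

XSD = "http://www.w3.org/2001/XMLSchema#"

# Map each datatype URI to a function that parses the literal's string form.
CONVERTERS = {
    XSD + "date": date.fromisoformat,
    XSD + "integer": int,
    XSD + "decimal": Decimal,
    XSD + "boolean": lambda text: text in ("true", "1"),
}

def to_python(lexical, datatype):
    """Convert a typed literal to a Python value, or keep the string as is."""
    convert = CONVERTERS.get(datatype)
    return convert(lexical) if convert else lexical

print(to_python("2011-02-14", XSD + "date"))
print(to_python("4", XSD + "integer") * 2)
print(to_python("3.50", XSD + "decimal"))
```

Decimal is used rather than float so that a value such as 3.50 keeps its exact decimal meaning.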
Labelling
Most RDF subjects are just unique identifiers, which often have no interpretable human meaning. So the rdfs:label predicate is very important: it’s best practice to assign label values to resources so that human readers can more easily see what they represent. It’s also common to use multiple labels with language tags, like this:
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
<http://dbpedia.org/resource/Switzerland> rdfs:label "Switzerland"@en,
"Suiza"@es, "Sveitsi"@fi, "Suisse"@fr .
The @en, @es, etc. suffix is a language tag. The SKOS standard offers skos:prefLabel for a preferred label, skos:altLabel for alternate labels, and more. There is also an rdfs:comment property that can be used for longer descriptions.
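An application showing these labels typically picks one by language preference. The sketch below assumes the labels have already been read into (text, language tag) pairs; the pick_label helper and its fall-back-to-first-label behaviour are choices made for this illustration.

```python
# The Switzerland labels from the example, as (text, language tag) pairs.
labels = [("Switzerland", "en"), ("Suiza", "es"), ("Sveitsi", "fi"), ("Suisse", "fr")]

def pick_label(labels, preferred=("en",)):
    """Return the first label whose tag matches the preference order."""
    for lang in preferred:
        for text, tag in labels:
            if tag == lang:
                return text
    # No preferred language available: fall back to the first label, if any.
    return labels[0][0] if labels else None

print(pick_label(labels, preferred=("de", "fr")))
```

Here German is preferred but missing, so the French label is chosen.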
Blank nodes
Blank nodes can be used to group together values that would otherwise be flat (think of the elements of an address). In Turtle they can be denoted with an underscore prefix, for example _:b1, which can then be used as the subject of a set of triples. Square brackets are sometimes used instead of the underscore prefix.
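The grouping role of a blank node can be sketched with plain tuples. All of the data below (the ab: names and the address values) is hypothetical, as are the helper names is_blank and describe.

```python
# Hypothetical address-book triples; _:b1 has no URI of its own and exists
# only to tie the parts of one address together.
triples = [
    ("ab:i0432", "ab:firstName", "Richard"),
    ("ab:i0432", "ab:address", "_:b1"),
    ("_:b1", "ab:streetAddress", "32 Main St."),
    ("_:b1", "ab:city", "Springfield"),
    ("_:b1", "ab:postalCode", "49345"),
]

def is_blank(term):
    """Blank node identifiers are written with an underscore prefix."""
    return term.startswith("_:")

def describe(triples, node):
    """Collect the predicate/object pairs whose subject is the given node."""
    return {p: o for s, p, o in triples if s == node}

address_node = next(o for s, p, o in triples if p == "ab:address")
print(is_blank(address_node), describe(triples, address_node))
```

Without the blank node, the street, city, and postal code would hang directly off the person with nothing marking them as one address.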
Vocabularies: RDF Schema
Vocabularies are usually described using the RDF Schema and OWL standards.
RDF Schema is a vocabulary description language: a way to describe a vocabulary. For example, here is a description of the dc:creator predicate:
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
dc:creator
rdf:type rdf:Property ;
# equivalent shorthand for the line above: a rdf:Property ;
rdfs:comment "An entity primarily responsible for making the resource."@en-US ;
rdfs:label "Creator"@en-US .
Note the rdf:type property, which means that the subject is an instance of the object class. The shorthand a, offered by Turtle, N3, and SPARQL, stands in for rdf:type and makes this easier to read. We can define new classes too:
@prefix ab: <http://learningsparql.com/ns/addressbook#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
ab:Musician
rdf:type rdfs:Class ;
rdfs:label "Musician" ;
rdfs:comment "Someone who plays a musical instrument" .
ab:MusicalInstrument
a rdfs:Class ;
rdfs:label "musical instrument" .
ab:playsInstrument
rdf:type rdf:Property ;
rdfs:comment "Identifies the instrument that someone plays" ;
rdfs:label "plays instrument" ;
rdfs:domain ab:Musician ;
rdfs:range ab:MusicalInstrument .
Note that with the rdfs:domain property I can say that the subject of a ?s ab:playsInstrument ?o triple belongs to the class of Musicians, while the rdfs:range property here means that the object belongs to the class of musical instruments. Those class memberships will be inferred by RDFS-aware query processors. In practice, RDFS-aware tooling usually means OWL-aware tooling.
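The domain and range inference can be sketched in a few lines of Python. This is a toy illustration, not how any particular reasoner is implemented: it assumes one domain and one range per property, and the data triple (ab:alice and ab:cello) is hypothetical.

```python
# Schema triples taken from the ab:playsInstrument definition.
schema = [
    ("ab:playsInstrument", "rdfs:domain", "ab:Musician"),
    ("ab:playsInstrument", "rdfs:range", "ab:MusicalInstrument"),
]

# Hypothetical instance data: nothing here states any class membership.
data = [("ab:alice", "ab:playsInstrument", "ab:cello")]

def infer_types(schema, data):
    """Derive rdf:type triples from rdfs:domain and rdfs:range declarations."""
    domains = {s: o for s, p, o in schema if p == "rdfs:domain"}
    ranges = {s: o for s, p, o in schema if p == "rdfs:range"}
    inferred = set()
    for s, p, o in data:
        if p in domains:
            inferred.add((s, "rdf:type", domains[p]))  # subject is in the domain class
        if p in ranges:
            inferred.add((o, "rdf:type", ranges[p]))   # object is in the range class
    return inferred

for triple in sorted(infer_types(schema, data)):
    print(triple)
```

Note that domain and range do not reject data; they add conclusions, so whatever appears as the object of ab:playsInstrument is inferred to be a musical instrument.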
Ontologies: OWL
OWL is the W3C’s Web Ontology Language. It builds on RDFS to let you define ontologies, formal definitions of vocabularies that allow you to define complex structures as well as new relationships between your vocabulary terms and members of the classes you define.
An ontology defined with OWL is just another collection of triples. We can specify that properties are symmetric, or that one is the inverse of another, for example, to allow reasoning engines to do more with our data.
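As a rough sketch of what a reasoning engine does with such declarations, the function below applies just two rules, for owl:SymmetricProperty and owl:inverseOf, in a single pass over a list of triples. The property names and people (ab:spouse, ab:patient, ab:doctor, ab:alice and so on) are hypothetical, and a real OWL reasoner covers far more than this.

```python
def apply_owl_rules(triples):
    """Return new triples implied by symmetric and inverse property declarations."""
    symmetric = {
        s for s, p, o in triples
        if p == "rdf:type" and o == "owl:SymmetricProperty"
    }
    inverses = {}
    for s, p, o in triples:
        if p == "owl:inverseOf":
            inverses[s] = o  # the declaration works in both directions
            inverses[o] = s
    inferred = set()
    for s, p, o in triples:
        if p in symmetric:
            inferred.add((o, p, s))
        if p in inverses:
            inferred.add((o, inverses[p], s))
    return inferred - set(triples)

# The ontology and the data live together as ordinary triples.
triples = [
    ("ab:spouse", "rdf:type", "owl:SymmetricProperty"),
    ("ab:patient", "owl:inverseOf", "ab:doctor"),
    ("ab:alice", "ab:spouse", "ab:bob"),
    ("ab:carol", "ab:patient", "ab:dave"),
]

for triple in sorted(apply_owl_rules(triples)):
    print(triple)
```

The declarations sit in the same list as the data, which reflects the point that an ontology is just more triples.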