Linked Data and RDF
Linked Data is a set of practices for publishing structured, machine-readable data on the web, articulated by Tim Berners-Lee in a 2006 design issues note as four short principles: name things with URIs, make those URIs dereferenceable over HTTP, return useful RDF when they are looked up, and link out to other URIs. The underlying data model is RDF, a W3C standard since 1999 that represents information as subject-predicate-object triples. Realized at scale in DBpedia, Wikidata, GeoNames, and the wider Linked Open Data cloud, the stack has thrived in libraries, cultural heritage, and government open data but has seen weaker uptake elsewhere, where critics cite its conceptual overhead and the availability of simpler alternatives.
Linked Data is a set of conventions for publishing structured, machine-readable data on the web so that datasets can be discovered and reused across organizational boundaries. Tim Berners-Lee introduced the term in a short design issues note dated July 27, 2006, summarizing four expectations: use URIs as names for things; use HTTP URIs so those names can be looked up; when someone dereferences a URI, return useful information using open standards such as RDF and SPARQL; and include links to other URIs so further data can be discovered. He later added a five-star rating scheme that escalates from open data in any format up to RDF that links to other linked data sources.
The underlying data model is the Resource Description Framework (RDF), a W3C recommendation since 1999 (with a revised RDF 1.1 in 2014). RDF represents every statement as a triple of subject, predicate, and object. Predicates are always URIs; subjects are URIs or blank nodes (anonymous nodes with no global name); objects are URIs, blank nodes, or literal values. A set of triples forms a directed labelled graph. Two RDF graphs can be merged without first reconciling a fixed schema, because a URI that appears in both simply identifies the same node in the combined graph. This property is what makes cross-dataset linking tractable.
Vocabularies layer meaning on top. RDF Schema (RDFS) defines basic classes and property hierarchies. OWL, the Web Ontology Language, adds richer constructs such as cardinality restrictions, disjointness, and equivalence, with a formal logical semantics that supports automated reasoning. SKOS (Simple Knowledge Organization System) expresses thesauri, taxonomies, and controlled vocabularies, and is widely used in libraries.
RDF graphs can be written in several interchangeable serializations. RDF/XML was the original 1999 format. Turtle is now generally preferred for human authoring, while the simpler line-based N-Triples, with one complete triple per line, suits bulk exchange and line-oriented processing. JSON-LD packages RDF inside JSON so it can be consumed with ordinary JSON tooling and embedded in HTML pages for search engines.
The Linked Open Data cloud, the set of interlinked public datasets published according to these conventions, is the visible result.
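
To make the triple model concrete, here is a minimal sketch in plain Python, with no RDF library involved. The `IRI` and `Literal` classes, the `to_ntriples` helper, and the `http://example.org/` namespace are hypothetical names chosen for illustration; the FOAF namespace is a real vocabulary. The sketch assumes graphs without blank nodes, in which case merging two graphs reduces to a set union; a full implementation would also have to keep blank nodes from different graphs apart, and would handle datatypes and language tags on literals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IRI:
    """A node or predicate identified by a URI/IRI."""
    value: str


@dataclass(frozen=True)
class Literal:
    """A plain string literal (datatypes and language tags omitted)."""
    value: str


EX = "http://example.org/"            # hypothetical namespace for this sketch
FOAF = "http://xmlns.com/foaf/0.1/"   # FOAF vocabulary namespace


def to_ntriples(graph):
    """Serialize a set of (subject, predicate, object) triples as N-Triples."""
    def fmt(term):
        if isinstance(term, IRI):
            return f"<{term.value}>"
        # Minimal escaping for string literals: backslash, quote, newlines.
        escaped = (term.value.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n").replace("\r", "\\r"))
        return f'"{escaped}"'

    lines = [f"{fmt(s)} {fmt(p)} {fmt(o)} ." for s, p, o in graph]
    return "\n".join(sorted(lines))  # sorted only to make output stable


alice = IRI(EX + "alice")
bob = IRI(EX + "bob")

# Two independently published graphs that happen to share URIs.
graph_a = {
    (alice, IRI(FOAF + "name"), Literal("Alice")),
    (alice, IRI(FOAF + "knows"), bob),
}
graph_b = {
    (bob, IRI(FOAF + "name"), Literal("Bob")),
    (alice, IRI(FOAF + "name"), Literal("Alice")),  # duplicate statement
}

# Without blank nodes, merging is set union: shared URIs line up and
# duplicate triples collapse, with no schema to reconcile first.
merged = graph_a | graph_b

print(to_ntriples(merged))
```

The two input graphs contain four statements between them, but the merged graph holds three, because the statement about Alice's name appears in both. The same three triples could equally be written in Turtle, RDF/XML, or JSON-LD without changing the graph they describe.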
DBpedia, started in 2007, extracts structured data from Wikipedia infoboxes and exposes hundreds of millions of triples. Wikidata, launched in 2012 by the Wikimedia Foundation, is a collaboratively edited knowledge base whose entities are reused across Wikipedia language editions and many third-party tools. GeoNames publishes RDF descriptions of more than seven million geographic features. Public SPARQL endpoints, including the Wikidata Query Service and the DBpedia endpoint, let anyone run graph queries over these datasets.
Adoption has been uneven. Libraries, museums, archives, and government open-data portals have adopted the stack most readily, in part because their cataloguing practices map well onto controlled vocabularies. Outside those sectors uptake is weaker: critics point to the conceptual overhead of URIs and ontologies, the cost of running SPARQL endpoints, and the rise of simpler JSON APIs and property-graph databases as alternatives with a much lower entry barrier.
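
As an illustration of what querying a public SPARQL endpoint involves, the following sketch builds an HTTP GET request for the Wikidata Query Service using only the Python standard library. The helper names `build_request` and `run_query`, the sample query (a few items that are instances of house cat, `wd:Q146`), and the User-Agent string are assumptions made for this example. The request is only constructed here; `run_query` needs network access, and public endpoints may apply rate limits and usage policies of their own.

```python
import json
import urllib.parse
import urllib.request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Sample query (an assumption for this sketch): five items that are
# instances of (P31) house cat (Q146), with their English labels.
QUERY = """\
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?item ?label WHERE {
  ?item wdt:P31 wd:Q146 ;
        rdfs:label ?label .
  FILTER(LANG(?label) = "en")
}
LIMIT 5
"""


def build_request(endpoint, query,
                  user_agent="linked-data-example/0.1 (placeholder contact)"):
    """Build a SPARQL-protocol GET request asking for JSON results."""
    params = urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        f"{endpoint}?{params}",
        headers={
            "Accept": "application/sparql-results+json",
            "User-Agent": user_agent,  # placeholder; identify your own client
        },
    )


def run_query(endpoint, query):
    """Send the query and return the list of result bindings.

    Requires network access; not called in this sketch.
    """
    request = build_request(endpoint, query)
    with urllib.request.urlopen(request, timeout=30) as response:
        data = json.load(response)
    return data["results"]["bindings"]


request = build_request(WDQS_ENDPOINT, QUERY)
print(request.full_url[:60] + "...")
```

Calling `run_query(WDQS_ENDPOINT, QUERY)` would return one dictionary per result row, in which each selected variable maps to an object whose `value` field holds the URI or label. The same request shape should work against other endpoints that implement the SPARQL protocol, with the endpoint URL and query adjusted to the dataset.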