Bitesize: what Linked Data looks like technically

Created: 21 January 2021 Updated: 02 March 2021

What it looks like technically

Rather than being stored in a relational database as a collection of connected tables, rows and columns, linked data is stored in a graph database. A graph database is a much simpler way of storing data: it can be thought of as one big table with three columns and as many rows as are needed (often hundreds of millions).

Each piece of information is stored as a triple: a statement that consists of an identifier (the subject), a property (the predicate) and a value (the object). As an analogy, it works a bit like grammar: the cat (subject) sat on (predicate) the mat (object). It is easier to show with a real-world example, though.
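As a minimal sketch (Python is an illustrative choice; the page does not describe any particular implementation), a triple can be modelled as a record with exactly three fields, and the whole store as one list of such records, which is the "one big table" with three columns described above. The `Triple` type and `store` name are hypothetical.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One row of the three-column table: subject, predicate, object."""
    subject: str
    predicate: str
    object: str


# The grammar analogy from the text, expressed as a single triple.
cat = Triple("The cat", "sat on", "the mat")

# In this sketch the whole "graph database" is just a list of rows.
store: list[Triple] = [cat]

for row in store:
    print(f"{row.subject} | {row.predicate} | {row.object}")  # The cat | sat on | the mat
```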

Take Catchment Data Explorer as an example. It holds a lot of data about water bodies, and one of these data items is the overall 2019 classification. In the graph database, this looks like the following for one water body:

subject           predicate            object
Captain's Pond    2019classification   Moderate

For two water bodies, it looks like this:

subject           predicate            object
Captain's Pond    2019classification   Moderate
Decoy Broad       2019classification   Poor

This principle can be extended: more water bodies just means more rows (or triples). If a new water body is discovered or created in the real world, it is simply a case of adding more triples.
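A minimal sketch of this idea, assuming the store is a plain Python list of (subject, predicate, object) tuples: adding a water body is just appending rows, and reading data back is a pattern match over the three columns where `None` acts as a wildcard. The `match` helper and the "New Lake" water body are hypothetical, for illustration only.

```python
from typing import Optional

# Each row is a (subject, predicate, object) tuple, as in the tables above.
triples = [
    ("Captain's Pond", "2019classification", "Moderate"),
    ("Decoy Broad", "2019classification", "Poor"),
]


def match(rows, s: Optional[str] = None, p: Optional[str] = None, o: Optional[str] = None):
    """Return every row matching the pattern; None in any position is a wildcard."""
    return [
        (rs, rp, ro)
        for rs, rp, ro in rows
        if (s is None or rs == s) and (p is None or rp == p) and (o is None or ro == o)
    ]


# A newly created water body is recorded by appending a row; nothing else changes.
triples.append(("New Lake", "2019classification", "Good"))  # hypothetical water body

print(len(triples))  # 3
print(match(triples, p="2019classification", o="Poor"))  # [('Decoy Broad', '2019classification', 'Poor')]
```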

As well as allowing new water bodies to be added, this way of storing data makes the data model flexible. To add 2020 classification data to the database, we would simply add more triples; there is no need to change the structure of any tables.

subject           predicate            object
Captain's Pond    2019classification   Moderate
Captain's Pond    2020classification   Good
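To make the contrast with a table-per-entity design concrete, the sketch below uses Python's built-in `sqlite3` module purely as a convenient in-memory database. This is an assumption for illustration: the service itself uses a graph database, not SQLite, and the table and column names here are hypothetical. The column-based table needs an `ALTER TABLE` before it can hold 2020 data, while the fixed three-column triples table only needs another row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Column-based layout: one column per attribute, so new data means a schema change.
conn.execute("CREATE TABLE water_body (name TEXT, classification_2019 TEXT)")
conn.execute("INSERT INTO water_body VALUES (?, ?)", ("Captain's Pond", "Moderate"))
conn.execute("ALTER TABLE water_body ADD COLUMN classification_2020 TEXT")  # structural change
conn.execute(
    "UPDATE water_body SET classification_2020 = ? WHERE name = ?",
    ("Good", "Captain's Pond"),
)

# Triple layout: a fixed three-column table; 2020 data is just another row.
conn.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("Captain's Pond", "2019classification", "Moderate"),
        ("Captain's Pond", "2020classification", "Good"),  # no ALTER TABLE needed
    ],
)

rows = conn.execute(
    "SELECT predicate, object FROM triples WHERE subject = ? ORDER BY predicate",
    ("Captain's Pond",),
).fetchall()
print(rows)  # [('2019classification', 'Moderate'), ('2020classification', 'Good')]
```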

We have a lot more information about these water bodies, and all of it can be represented by more triples:

subject           predicate            object
Captain's Pond    2019classification   Moderate
Captain's Pond    meanDepth            1m
Captain's Pond    altitude             26m
Captain's Pond    waterBodyType        Lake
Captain's Pond    parentOpCatchment    Bure
...
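Because every fact about a water body is a separate row, reassembling "everything we know" about one water body means collecting all rows that share its subject. A minimal sketch, assuming the same list-of-tuples store as a stand-in for the real database; the `describe` helper is hypothetical and the rows are only those shown in the table above.

```python
from collections import defaultdict

# Rows from the table above (the real dataset holds many more).
triples = [
    ("Captain's Pond", "2019classification", "Moderate"),
    ("Captain's Pond", "meanDepth", "1m"),
    ("Captain's Pond", "altitude", "26m"),
    ("Captain's Pond", "waterBodyType", "Lake"),
    ("Captain's Pond", "parentOpCatchment", "Bure"),
]


def describe(rows, subject):
    """Collect every predicate/object pair recorded for one subject.

    Values are kept in lists because the same predicate may appear more than once.
    """
    props = defaultdict(list)
    for s, p, o in rows:
        if s == subject:
            props[p].append(o)
    return dict(props)


info = describe(triples, "Captain's Pond")
print(info["waterBodyType"])  # ['Lake']
print(len(info))  # 5
```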

With thousands of water bodies in England, it's easy to see how the number of triples can get into the millions.

So far, these examples have not shown the 'linked' part of linked data. In the examples above, the subject is the water body Captain's Pond. With thousands of water bodies, there is a good chance that two of them share the same name. As well as being confusing, this would cause problems for this data model, because the triples for the two water bodies could no longer be told apart. To avoid this, instead of using the name of the water body as the subject, we use an identifier (a URI). These identifiers are unique within the database and consistent, i.e. wherever we want to attach some data to Captain's Pond, we use its identifier, GB30535397. The name itself becomes just another property, stored with the label predicate:

subject           predicate            object
GB30535397        label                Captain's Pond
GB30535397        2019classification   Moderate
GB30535397        meanDepth            1m
GB30535397        altitude             26m
GB30535397        waterBodyType        Lake
GB30535397        parentOpCatchment    Bure
...
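The name-clash problem can be shown with a small sketch. It assumes a second, entirely hypothetical water body (`GB_HYPOTHETICAL`) that happens to share the name Captain's Pond; only GB30535397 comes from the text. Grouped by identifier, the two water bodies stay separate; if the label is used as the subject instead, their data collapses onto one ambiguous subject with two conflicting classifications.

```python
from collections import defaultdict

# GB30535397 is Captain's Pond's identifier from the text; GB_HYPOTHETICAL is invented.
by_id = [
    ("GB30535397", "label", "Captain's Pond"),
    ("GB30535397", "2019classification", "Moderate"),
    ("GB_HYPOTHETICAL", "label", "Captain's Pond"),
    ("GB_HYPOTHETICAL", "2019classification", "Good"),
]


def group(rows):
    """Map each subject to the list of (predicate, object) pairs attached to it."""
    out = defaultdict(list)
    for s, p, o in rows:
        out[s].append((p, o))
    return out


# Keyed by identifier, the two water bodies stay separate.
print(len(group(by_id)))  # 2

# Keyed by name instead, both sets of data land on a single subject.
label_of = {s: o for s, p, o in by_id if p == "label"}
by_name = [(label_of[s], p, o) for s, p, o in by_id if p != "label"]
print(len(group(by_name)))  # 1
```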

But this still isn't linked, because we don't actually use the identifier on its own: we use a URL as the identifier. For Captain's Pond in Catchment Data Explorer, this is https://environment.data.gov.uk/catchment-planning/WaterBody/GB30535397. This has several benefits:

  • the identifier is globally unique: we know exactly what we are talking about when we use that identifier.

  • we can create a page on the internet that holds the information we know about this water body (you can follow the link above to see this in practice).

  • other people and applications can link to this water body. The link could come from a report produced within the Environment Agency, from an article about the water body on a local news website, or from a separate dataset in another service, such as Flood Plan Explorer.
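A minimal sketch of turning a local identifier into a URL identifier and back. The base path is taken from the Captain's Pond URL above; treating it as a general pattern for every water body is an assumption, and both helper functions are hypothetical.

```python
from urllib.parse import urlparse

# Base path taken from the Captain's Pond URL in the text; assuming it applies to
# every water body is an illustration, not a documented rule.
WATER_BODY_BASE = "https://environment.data.gov.uk/catchment-planning/WaterBody/"


def water_body_url(identifier: str) -> str:
    """Build a URL identifier from a local water body ID (hypothetical helper)."""
    return WATER_BODY_BASE + identifier


def local_id(url: str) -> str:
    """Recover the local ID from the last path segment of a water body URL."""
    return urlparse(url).path.rstrip("/").rsplit("/", 1)[-1]


url = water_body_url("GB30535397")
print(url)  # https://environment.data.gov.uk/catchment-planning/WaterBody/GB30535397
print(local_id(url))  # GB30535397
```

Because the full URL is used as the subject, the same string serves as a globally unique key in the data and as an address that people and other services can link to.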

Wherever possible, data points within the dataset use URLs as their identifiers. In the example above, the classification, the water body type and the operational catchment that the water body is within would all be stored as URLs, each with its own page where you can find out more, such as what 'Moderate' actually means, the definition of a 'Lake', or which other water bodies are in the same operational catchment. This web of data has the potential to be extremely powerful: by following links from one item to the next, people can explore, discover and use all the information in the dataset.
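When objects are URLs rather than plain text, a query can follow them from one item to another. The sketch below finds the other water bodies in Captain's Pond's operational catchment by following the link out to the catchment and then back to every water body that points at it. Only the Captain's Pond URL comes from the text; every example.org URL, the second water body and the predicate names are hypothetical placeholders, not the dataset's real vocabulary.

```python
# Only POND comes from the text; the example.org URLs are hypothetical placeholders.
POND = "https://environment.data.gov.uk/catchment-planning/WaterBody/GB30535397"
OTHER = "https://example.org/WaterBody/HYPOTHETICAL-1"
BURE = "https://example.org/OperationalCatchment/Bure"
LAKE = "https://example.org/WaterBodyType/Lake"

triples = [
    (POND, "label", "Captain's Pond"),
    (POND, "waterBodyType", LAKE),
    (POND, "parentOpCatchment", BURE),
    (OTHER, "label", "Hypothetical Broad"),
    (OTHER, "parentOpCatchment", BURE),
    (BURE, "label", "Bure"),
    (LAKE, "label", "Lake"),
]


def objects(rows, s, p):
    """All objects for a given subject and predicate (following a link outwards)."""
    return [o for rs, rp, o in rows if rs == s and rp == p]


def subjects(rows, p, o):
    """All subjects that point at a given object through a predicate (following it back)."""
    return [s for s, rp, ro in rows if rp == p and ro == o]


# Follow the link from the pond to its catchment, then back out to its neighbours.
catchment = objects(triples, POND, "parentOpCatchment")[0]
neighbours = [s for s in subjects(triples, "parentOpCatchment", catchment) if s != POND]
print(objects(triples, catchment, "label"))  # ['Bure']
print([objects(triples, n, "label")[0] for n in neighbours])  # ['Hypothetical Broad']
```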

To explore Defra's linked data, go to the Defra Data Services Platform. For enquiries about the data, please use Ask a Question, Report a Problem or Give Feedback.