Better Data With Metadata

Steve Taitoko
Sep 19, 2022

People regularly ask what Eratos means. In short, we named the company after Eratosthenes, the Greek philosopher, mathematician, and scholar who lived in the third century BC. Among his many important achievements, he was the chief librarian at the Great Library of Alexandria, and it was at that institution that the earliest recorded examples of metadata can be found.

Metadata, simply put, is data about data. At the library it was applied to scrolls by attaching small tags containing the title, subject and author. The tags let library users identify the content without having to unroll each scroll, and let the scrolls be returned to their rightful location after use. Metadata today still applies that simple principle, but with a little more detail.

The use of metadata is not just a convenient ‘nice to have’; it is a ‘must-have’, particularly as we design more advanced analytics that rely upon machine-to-machine communications. Mixing structured and unstructured data in automated analytic processes demands rapid access to data about the data, so that precise, relevant and high-quality data can be retrieved.

In an earlier journal article, we discussed the power of the FAIR principles and how they are foundational in the design of Eratos. The key to achieving a data system based on FAIR principles is robust metadata. In this context:

Metadata makes data findable by allowing search on the elements of metadata that tell the story of the data.
Metadata makes data accessible by creating elements within metadata that provide access control and permissions management.
Metadata makes data interoperable through the standardisation of the metadata’s vocabularies and formats.
Metadata makes data reusable by providing rich contextual provenance about the data’s origins, use, licence, and governance.
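
As a rough illustration of the four points above, the sketch below models a metadata record in Python and searches over its descriptive fields without opening the data itself. The `MetadataRecord` class, its field names, the `find` helper and the sample catalogue are all hypothetical; this is not the Eratos data model.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataRecord:
    """Hypothetical metadata record; field names are illustrative only."""
    identifier: str                                        # findable: stable ID
    title: str                                             # findable: searchable text
    keywords: list[str] = field(default_factory=list)      # findable: search terms
    allowed_roles: set[str] = field(default_factory=set)   # accessible: who may read
    media_type: str = "text/csv"                           # interoperable: standard format label
    licence: str = "unspecified"                           # reusable: terms of reuse
    source: str = "unknown"                                # reusable: provenance


def find(records, term, role):
    """Return records that mention `term` in title or keywords and that `role` may access."""
    term = term.lower()
    return [
        r for r in records
        if role in r.allowed_roles
        and (term in r.title.lower() or any(term in k.lower() for k in r.keywords))
    ]


# Made-up sample entries, for illustration only.
catalogue = [
    MetadataRecord("ds-001", "Daily rainfall observations", ["weather", "rainfall"],
                   {"analyst", "public"}, licence="CC-BY-4.0", source="example station network"),
    MetadataRecord("ds-002", "Soil moisture sensor readings", ["soil", "sensor"],
                   {"analyst"}, licence="CC-BY-4.0", source="example field sensors"),
]

matches = find(catalogue, "rainfall", role="public")
print([r.identifier for r in matches])
```

The search and the permission check both run against the metadata alone, which is the same idea as reading the tag on a scroll rather than unrolling it.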

Making data findable and accessible is relatively straightforward, relying on the addition of descriptive elements to metadata. However, making data interoperable and reusable requires a high level of coordination to ensure metadata is created consistently, regardless of how it was generated or by whom. To ensure standardisation, frameworks called schemas are required.

A schema provides the structure for a metadata collection in a specific domain, such as published research. It establishes a set of rules that define the required information and describe the semantics and relationships of the elements. One example is the Digital Object Identifier (DOI), a persistent, interoperable identifier for use on digital networks, which scholars have adopted as a means of identifying published research.
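
To make the idea of "a set of rules that define required information" concrete, here is a minimal sketch of a schema check in Python. The schema, its field names and the sample records (including the placeholder DOI) are invented for illustration and do not correspond to any published standard.

```python
# Hypothetical schema for a published-research record:
# field name -> (expected type, required?)
RESEARCH_SCHEMA = {
    "doi": (str, True),
    "title": (str, True),
    "authors": (list, True),
    "year": (int, True),
    "licence": (str, False),
}


def validate(record, schema):
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    for name, (expected_type, required) in schema.items():
        if name not in record:
            if required:
                problems.append(f"missing required field: {name}")
            continue
        if not isinstance(record[name], expected_type):
            problems.append(f"{name} should be of type {expected_type.__name__}")
    # Fields the schema does not define are flagged as well.
    for name in record:
        if name not in schema:
            problems.append(f"unknown field: {name}")
    return problems


# Placeholder values, not real publications.
good = {"doi": "10.1234/example", "title": "An example paper",
        "authors": ["A. Author"], "year": 2022}
bad = {"title": "An example paper", "authors": "A. Author",
       "year": "2022", "pages": 12}

print(validate(good, RESEARCH_SCHEMA))
print(validate(bad, RESEARCH_SCHEMA))
```

Because the rules live in data rather than in people's heads, any system that produces a record can run the same check, however and by whomever the record was created.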

Much effort has gone into creating standard schemas through international validation by organisations such as the International Organization for Standardization (ISO). One of the most commonly used standards is Dublin Core, maintained by the Dublin Core Metadata Initiative (DCMI). Often used on webpages, it is simple and generic, with just 15 elements covering information such as title, date, type, format, creator, and rights.
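
As an example of how small such a standard can be, the sketch below lists the 15 Dublin Core elements and serialises a record for this article as XML using Python's standard library. The `to_dublin_core_xml` helper and the `metadata` wrapper element are assumptions made for illustration, and the `format` value is a guess at how the page is served.

```python
import xml.etree.ElementTree as ET

# Namespace of the 15-element Dublin Core set.
DC_NS = "http://purl.org/dc/elements/1.1/"
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def to_dublin_core_xml(fields):
    """Serialise a dict of Dublin Core fields to XML; reject names outside the 15 elements."""
    unknown = set(fields) - DC_ELEMENTS
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")  # wrapper element name is an assumption
    for name, value in fields.items():
        ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


record = {
    "title": "Better Data With Metadata",
    "creator": "Steve Taitoko",
    "date": "2022-09-19",
    "type": "Text",
    "format": "text/html",  # assumed
}
print(to_dublin_core_xml(record))
```

A name such as `author` is rejected because the corresponding Dublin Core element is `creator`; that kind of shared vocabulary is what allows two unrelated systems to read each other's records.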

Eratos uses standard schemas to ensure metadata across all fields of cyber-physical data is interoperable and reusable. Because they are machine-readable, schemas provide scalability, particularly as more data is generated and stored at the edge, and they are a vital component as we build the world’s semantic web of cyber-physical data.

Without standardised schemas and metadata, it will be impossible to extract needles from a rapidly expanding data haystack at speed and scale. Thankfully, the groundwork laid by our academic community allows us to bring robust and uniform metadata into the mainstream, and for that we have better data.