Making Data Fair



There is a well-worn phrase that ‘data is the new oil’, the irony of which seems lost amid the technology-driven shift to renewable energy. However, as Tim Berners-Lee, the inventor of the World Wide Web, said, “data is a precious thing and will last longer than the systems themselves”, suggesting that data has more than a single useful life.


The idea that data is reusable and outlives any single use isn’t new. However, even within the research community, there is a growing realisation that outdated systems and a lack of data governance have left significant amounts of data locked up inside institutions. As a result, some of our most highly prized data, created by some of our brightest minds, is gathering virtual dust at the bottom of digital filing cabinets.


In recognition of this problem, in 2016 a group of scientists published the FAIR data principles: guidelines that support the reuse of scholarly data by making it findable, accessible, interoperable, and reusable. The catalyst was artificial intelligence’s insatiable appetite for data, and the need for computational systems to find and use that data with minimal human intervention, a capability the group called machine-actionability.


Clearly, past data practices were standing in the way of progress toward a machine-to-machine future. The need for more accessible and reusable data is so important that, at the 2016 G20 Hangzhou Summit, G20 leaders issued a statement endorsing the application of the FAIR principles as a vital component of innovation-driven growth in science and technology.


A recent article in the journal Data Intelligence sets out how the FAIR principles could be implemented to achieve machine-actionability. Under each principle, its authors note that:

  • Findability: Data should be easy to find for both humans and computers; machine-actionable metadata are essential for automating this process.

  • Accessibility: Access control, backed by strong governance, is essential for both human and machine interaction with data.

  • Interoperability: Resources attributed to data should be structured so that machines can aggregate them into a richer, unified view of the data and its resources. To enable this, the semantics of each resource must be clearly defined.

  • Reusability: Descriptions of data, through its metadata and associated resources, should be sufficient for humans and machines to decide whether the data is relevant for reuse and, if so, under what conditions it may be used.
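To make “machine-actionable” a little more concrete, here is a minimal Python sketch of a program, rather than a person, checking a dataset’s metadata against simple rules loosely derived from each principle. It assumes a hypothetical metadata record: the names `DatasetRecord` and `fair_gaps`, the field names, and the example.org URIs are illustrative only and are not taken from the FAIR principles themselves, the Data Intelligence article, or any Eratos API.

```python
from dataclasses import dataclass


@dataclass
class DatasetRecord:
    """Hypothetical machine-readable metadata record; field names are illustrative only."""
    identifier: str            # Findable: intended to be a globally unique, persistent identifier
    title: str
    keywords: list[str]        # Findable: searchable descriptive metadata
    access_url: str            # Accessible: where the data can be retrieved
    access_protocol: str       # Accessible: an open, standard protocol such as "https"
    requires_auth: bool        # Accessible: whether access control applies (recorded, not checked here)
    variables: dict[str, str]  # Interoperable: variable name -> shared vocabulary URI
    license: str               # Reusable: e.g. an SPDX identifier such as "CC-BY-4.0"
    provenance: str            # Reusable: who produced the data, and how


def fair_gaps(record: DatasetRecord) -> list[str]:
    """Return the rule-based gaps a machine can detect without human review."""
    gaps = []
    if not record.identifier.startswith(("https://", "http://")):
        gaps.append("findable: identifier is not a resolvable URI")
    if not record.keywords:
        gaps.append("findable: no keywords")
    if not record.access_url:
        gaps.append("accessible: no access URL")
    if record.access_protocol.lower() not in {"https", "http"}:
        gaps.append("accessible: non-standard access protocol")
    # Variables without a vocabulary URI have no shared, machine-readable meaning.
    undefined = [name for name, uri in record.variables.items() if not uri.startswith("https://")]
    if undefined:
        gaps.append(f"interoperable: no vocabulary URI for {', '.join(sorted(undefined))}")
    if not record.license:
        gaps.append("reusable: no licence stated")
    if not record.provenance:
        gaps.append("reusable: no provenance")
    return gaps


# Example: a record whose "station" variable and provenance are undocumented.
record = DatasetRecord(
    identifier="https://example.org/datasets/rainfall-2020",
    title="Daily rainfall, 2020",
    keywords=["rainfall", "climate"],
    access_url="https://example.org/api/rainfall-2020",
    access_protocol="https",
    requires_auth=True,
    variables={"rain_mm": "https://example.org/vocab/precipitation", "station": ""},
    license="CC-BY-4.0",
    provenance="",
)
print(fair_gaps(record))
# ['interoperable: no vocabulary URI for station', 'reusable: no provenance']
```

Real FAIR assessments go much further than these string checks, but the point stands: once metadata is structured and explicit, a machine can decide for itself whether a dataset is usable, without a person opening it first.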


This matters because we are in an arms race of sorts to integrate artificial intelligence systems into all parts of our lives. A prerequisite for this to happen is that data must be AI-ready.


The US government’s biomedical and health research organisation, the National Institutes of Health (NIH), are moving into AI in a big way and have adopted the FAIR principles to help guide their transition. One of the core themes of their plan is AI-ready data, reflecting the view that the “creation and stewardship of data sets to enable machine learning may be the NIH’s single greatest lever to accelerate the application of AI within biomedicine.”


It is no surprise that organisations like the NIH are looking to artificial intelligence to transform their operations. McKinsey & Company expect AI to add US$5.8 trillion in global value annually by 2035, yet estimate that only 20% of AI-aware companies are using the technology in a core business process or at scale. Four of the five challenges behind this low adoption relate to data that is not AI-ready.


It is for this reason that the FAIR principles guide the development of Eratos. Applying them to cyber-physical systems opens a clear path to unlocking the vast treasure trove of data that already exists about our natural and built worlds. With the right framework, they also create a significant opportunity for new data streams: plugged into robust data infrastructure, with integrated systems and automation, these streams can continually build deep, rich, contextual data.


Eratos is building on the foundations laid by academia and applying them to similar problems in the commercial world. The FAIR data principles provide an elegant and considered solution to a broad and complex problem.




Authors

Steve Taitoko

Cover Image

NASA

