Image source: Open Science Badge from the Center for Open Science
Before getting into the concept of Open Data, it is important to understand what “data” refers to. Data are any type of information that has been collected, observed, generated, or created to validate original research findings.
A detailed list of the different types of research data is available on our Research Data Management guide.
As defined by the Open Knowledge Foundation in the Open Data Handbook, open data is:
“data that can be freely used, re-used and redistributed by anyone - subject only, at most, to the requirement to attribute and sharealike.”
In this context, open data should also be:
Sufficiently described and documented with appropriate metadata so that they can be easily understood and reused
Accessible with appropriate license, copyright, and citation information
Findable in an accredited or trustworthy resource, accompanied by a history of changes and versioning
Open data can be of value to multiple stakeholders in many areas. Individual researchers can benefit from opening their data in the following ways:
Secure your research data in publicly accessible repositories for as long as the data are of continuing value
Get higher visibility for the research findings associated with the shared data
Have more chances to be cited and to receive credit. One study found that journal articles with statements linking to data in a repository are associated with up to 25% higher citation impact on average (Colavizza et al., 2020)
Comply with growing journal and funder requirements to share research data upon article publication or receipt of funding
Open data can also benefit the entire scientific community. It can:
Accelerate scientific discovery by providing greater access to research data that facilitate replications
Improve integrity of scientific research and scholarly records and reduce academic fraud
Enhance global emergency response to intercontinental crises, e.g. the COVID-19 pandemic
Foster a global research culture of transparency and validation, making science inclusive and diverse with equitable sharing of knowledge
The FAIR data principles are a set of guidelines that help researchers make better use of their research data and reach a broader audience with it. First introduced in 2016 (Wilkinson et al., 2016), the principles aim to improve the discoverability, accessibility and reusability of data that are shared openly, making the data more valuable and maximizing their use, re-use and impact. The principles have since been widely accepted and adopted by academic communities and research institutions worldwide.
The FAIR data principles specify that shared data must be Findable, Accessible, Interoperable, and Reusable.
F1. (Meta)data are assigned a globally unique and persistent identifier
F2. Data are described with rich metadata (defined by R1 below)
F3. Metadata clearly and explicitly include the identifier of the data they describe
F4. (Meta)data are registered or indexed in a searchable resource
A1. (Meta)data are retrievable by their identifier using a standardised communications protocol
A1.1 The protocol is open, free, and universally implementable
A1.2 The protocol allows for an authentication and authorisation procedure, where necessary
A2. Metadata are accessible, even when the data are no longer available
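I1. (Meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation
I2. (Meta)data use vocabularies that follow FAIR principles
I3. (Meta)data include qualified references to other (meta)data
R1. (Meta)data are richly described with a plurality of accurate and relevant attributes
R1.1 (Meta)data are released with a clear and accessible data usage license
R1.2 (Meta)data are associated with detailed provenance
R1.3 (Meta)data meet domain-relevant community standards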
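Some of the Findable principles listed above can be checked mechanically on a dataset's metadata record. The following is a minimal Python sketch, not an official FAIR assessment tool. The record structure and its field names (`identifier`, `data_identifier`, `title`, `description`, `keywords`, `indexed_in`) are assumptions made for illustration, the DOIs are made up, and the DOI pattern is a loose check that treats a DOI as just one possible kind of persistent identifier.

```python
import re

# Loose pattern for a DOI such as "10.1234/abcd.5678". This is an
# illustrative check only, not a full validation of DOI syntax.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def check_findability(record: dict) -> dict:
    """Return a pass/fail flag for F1-F4 on a hypothetical metadata record."""
    descriptive_fields = ("title", "description", "keywords")
    return {
        # F1: the record carries a persistent identifier (here: a DOI)
        "F1": bool(DOI_PATTERN.match(record.get("identifier", ""))),
        # F2: basic descriptive metadata fields are all filled in
        "F2": all(record.get(field) for field in descriptive_fields),
        # F3: the metadata name the identifier of the data they describe
        "F3": bool(record.get("data_identifier")),
        # F4: the record is registered in at least one searchable resource
        "F4": bool(record.get("indexed_in")),
    }


# Hypothetical record; the identifiers below are not real DOIs.
example_record = {
    "identifier": "10.1234/example.metadata",
    "data_identifier": "10.1234/example.data",
    "title": "Example survey dataset",
    "description": "Responses collected for an illustrative study.",
    "keywords": ["survey", "example"],
    "indexed_in": [],  # not yet registered in any searchable resource
}

result = check_findability(example_record)
print(result)
```

In this example the record would pass F1 to F3 but fail F4, because it has not yet been deposited in a repository or registry where others can search for it.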
Properly preparing your research data for sharing is crucial to ensuring the data can be reused, especially when you plan to share or publish them openly.
Implementing good research data management practices across the research cycle makes it easier to produce open data. These practices cover data management planning, documentation, organization, formatting, anonymization, and licensing.
Read more about best practices in research data management in the Research Data Management Guide.
Research data can be shared in a variety of locations. Some researchers opt to share data directly via email or personal websites, or to upload their data files to the journal publisher's website alongside their journal article. These options may seem convenient, but they are often not the best choice: accessibility is limited, long-term preservation is not guaranteed, and versioning is not supported.
In compliance with the FAIR data principles, a long-term repository that provides a permanent identifier is the most recommended option for data sharing.
When deciding which data repository best suits your data, consider the following:
Does your funding sponsor (or journal publisher) require or recommend a specific data repository?
Is there a domain-specific repository that is widely used in your research field?
Does your organization/institution offer a data repository?
Do you think the tools offered by the repository for data discovery and distribution are suitable for your data?
Does the repository provide open data access and support the FAIR data principles (e.g. by offering persistent identifiers such as DOIs and data licensing options)?
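On the last question: a DOI works as a persistent identifier because it is resolved through the central resolver at https://doi.org/ rather than pointing at a fixed web address. A minimal Python sketch of turning a DOI into a resolvable link follows. The helper name `doi_to_url` is hypothetical, and the example DOI is the one for the FAIR principles paper cited in this guide.

```python
def doi_to_url(doi: str) -> str:
    """Normalize a DOI written in common forms to an https://doi.org/ link."""
    doi = doi.strip()
    # Remove a leading resolver address or "doi:" label if one is present
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    return "https://doi.org/" + doi


link = doi_to_url("doi:10.1038/sdata.2016.18")
print(link)
```

Citing data with the full resolver link means the citation should keep working even if the repository later moves the dataset's landing page.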
According to the Open Science Training Handbook, and as recommended by OpenAIRE, researchers may consider repositories in the following order of preference:
Use a disciplinary repository established for your research domain with recognized standards in your discipline
Use an institutional research data repository
Use other general repositories that are designed to accommodate multi-disciplinary research data
Search for other data repositories in a global registry such as re3data or FAIRsharing.
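The order of preference above amounts to a simple fallback rule, which can be sketched in Python as follows. The category labels and the `choose_repository` helper are hypothetical names used only for illustration, and the example inputs are placeholders rather than a recommendation for any particular dataset.

```python
from typing import Optional

# Repository categories in the order of preference listed above
PREFERENCE_ORDER = ["disciplinary", "institutional", "generalist"]


def choose_repository(available: dict) -> Optional[str]:
    """Return the first available repository, following the preference order."""
    for kind in PREFERENCE_ORDER:
        name = available.get(kind)
        if name:
            return name
    # Nothing suitable is known yet: the next step is to search a registry
    # such as re3data or FAIRsharing.
    return None


# Example: no disciplinary repository has been identified for this dataset
choice = choose_repository({
    "disciplinary": None,
    "institutional": "HKU DataHub",
    "generalist": "Zenodo",
})
print(choice)
```

Here the institutional repository is selected because no disciplinary repository is available; a generalist repository would only be chosen if both earlier options were missing.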
An external disciplinary or data-type-specific repository often follows discipline-specific metadata standards and data curation practices. These repositories should be considered first, as they better support the discoverability, understanding, and reusability of shared datasets in a specific area of study. Researchers and research students are advised to ask their colleagues or supervisors for help in locating suitable discipline-specific repositories in their subject fields.
For instance, the Trans-NIH BioMedical Informatics Coordinating Committee (BMIC) maintains a list of National Institutes of Health (NIH) supported domain-specific repositories.
Researchers can also identify appropriate discipline-specific repositories by referring to a data repository registry like re3data.org or exploring the catalogue of databases available in the FAIRsharing collection.
The University of Hong Kong has an institutional data repository, HKU DataHub, powered by Figshare. HKU researchers and students can store, publish, share and cite their research data and other digital materials on this repository. It serves as a persistent ‘home’ for research data generated by HKU community members, providing long-term storage, global open access, identifier generation, and secure protection.
Visit HKU DataHub to explore research data and other digital scholarly outputs published by HKU researchers, and see HKU DataHub: The Guide for guidelines on how to use the repository.
Several generalist repositories offer cost-free services and accounts for researchers from multiple disciplines to deposit their research data. They usually accept all types of data and are most commonly used for data that cannot go into a domain- or discipline-specific repository.
The Generalist Repository Ecosystem Initiative (GREI)
The GREI is an initiative of the NIH that brings together seven generalist repositories in a collaborative working group. The repositories participating in the GREI are highlighted in the table below:
See also a comparison chart of the GREI-participating generalist repositories.
Colavizza, G., Hrynaszkiewicz, I., Staden, I., Whitaker, K., & McGillivray, B. (2020). The citation advantage of linking publications to research data. PLoS ONE, 15(4), e0230416. https://doi.org/10.1371/journal.pone.0230416
FASEB. (2024, July 29). Choosing Your Generalist Repository. https://dataworks.faseb.org/helpdesk/kb/choosing-a-generalist-repository
OpenAIRE. (2024, July 29). Guides for Researchers: How to find a trustworthy repository for your data. https://www.openaire.eu/find-trustworthy-data-repository
Open Knowledge Foundation. (2024). Open Data Handbook. What is Open Data? https://opendatahandbook.org/guide/en/what-is-open-data/
NASA. (2024). Open Science 101. https://nasa.github.io/Transform-to-Open-Science/os101-modules/
Wilkinson, M., Dumontier, M., Aalbersberg, I. et al. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3, 160018. https://doi.org/10.1038/sdata.2016.18