Open Science

What is Open Data

[Image: OSF Open Science badge for Open Data]

Image source: Open Science Badge from the Center for Open Science

 

Before getting into the concept of Open Data, it is important to understand what “data” refers to. Data are any type of information that has been collected, observed, generated, or created to validate original research findings.

A detailed list of the different types of research data is available in our Research Data Management guide.

 

The Open Knowledge Foundation's Open Data Handbook defines open data as:

“data that can be freely used, re-used and redistributed by anyone - subject only, at most, to the requirement to attribute and sharealike.”

 

In this context, open data should also be: 

  • Sufficiently described and documented with appropriate metadata so that they can be easily understood and reused

  • Accessible, with appropriate license, copyright, and citation information

  • Findable in an accredited or trustworthy resource, accompanied by a history of changes and versioning
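
As a minimal sketch of the checklist above, the record below describes a dataset, and a small helper reports which descriptive fields are still missing. The field names (`title`, `creators`, `description`, `license`, `identifier`, `version`) and the helper `missing_fields` are illustrative assumptions, not a required schema; each repository defines its own metadata form.

```python
# Hypothetical minimal metadata record; the field names are illustrative only.
REQUIRED_FIELDS = ("title", "creators", "description", "license", "identifier", "version")


def missing_fields(record: dict) -> list[str]:
    """Return the required fields that are absent or empty in the record."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


# Placeholder values, not a real dataset.
example_record = {
    "title": "Example survey dataset",
    "creators": ["A. Researcher"],
    "description": "Anonymized responses to an example survey.",
    "license": "CC-BY-4.0",
    "identifier": "",  # filled in once a repository issues a persistent identifier
    "version": "1.0",
}

print(missing_fields(example_record))  # → ['identifier']
```

A check like this is only a reminder list; it cannot judge whether a description is actually clear enough for someone else to reuse the data.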

Benefits of Open Data

Open data can be of value to multiple stakeholders in many areas. Individual researchers can benefit from opening their data in the following ways:

  • Preserve your research data in publicly accessible repositories for as long as they remain of value

  • Gain higher visibility for the research findings associated with the shared data

  • Increase your chances of being cited and credited. One study found that journal articles with statements linking to data in a repository received up to 25% more citations on average (Colavizza et al., 2020)

  • Comply with the growing number of journal and funder requirements to share research data upon article publication or receipt of funding

 

Open data can also benefit the entire scientific community. It can:

  • Accelerate scientific discovery by providing greater access to research data, which facilitates replication

  • Improve the integrity of scientific research and the scholarly record, and reduce academic fraud

  • Enhance the global response to international emergencies, e.g. the COVID-19 pandemic

  • Foster a global research culture of transparency and validation, making science inclusive and diverse with equitable sharing of knowledge  

The FAIR data principles

[Image: the FAIR data principles]

The FAIR data principles are a set of guidelines that help researchers make better use of their research data and reach a broader audience with them. First introduced by Wilkinson et al. (2016), the principles aim to improve the discoverability, accessibility, and reusability of openly shared data, making them more valuable and maximizing their use, reuse, and impact. The principles have since been widely accepted and adopted by academic communities and research institutions worldwide.

The FAIR data principles specify that shared data must be Findable, Accessible, Interoperable, and Reusable.

Preparing data for sharing

Properly preparing your research data for sharing is crucial to ensuring their reusability, especially when they will be shared or published openly.

Good research data management practices across the research cycle make it easier to produce open data. These practices cover data management planning, documentation, organization, formatting, anonymization, and licensing.
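
To make the formatting and anonymization steps concrete, here is a minimal sketch that drops a directly identifying column, replaces participant IDs with salted hashes, and writes the result as plain CSV. The column names, the sample rows, and the helpers `pseudonymize`, `prepare_rows`, and `to_csv` are all hypothetical. Note that hashing an ID is pseudonymization rather than full anonymization, so it does not by itself satisfy ethics or data protection requirements.

```python
import csv
import hashlib
import io


def pseudonymize(value: str, salt: str) -> str:
    """Replace a direct identifier with a short salted SHA-256 digest."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return digest[:12]


def prepare_rows(rows, id_column, drop_columns, salt):
    """Drop directly identifying columns and pseudonymize the ID column."""
    cleaned = []
    for row in rows:
        kept = {key: val for key, val in row.items() if key not in drop_columns}
        kept[id_column] = pseudonymize(row[id_column], salt)
        cleaned.append(kept)
    return cleaned


def to_csv(rows) -> str:
    """Serialize the rows to CSV text, an open and widely readable format."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample rows for illustration only.
raw = [
    {"participant_id": "P001", "name": "Example Person", "score": "7"},
    {"participant_id": "P002", "name": "Another Person", "score": "9"},
]
shared = prepare_rows(raw, "participant_id", {"name"}, salt="keep-this-secret")
print(to_csv(shared))
```

The salt must stay private and out of the shared files; with a known salt and a small set of possible IDs, the hashes could be reversed by trying every candidate.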

Read more about best practices in research data management in the Research Data Management Guide.

Where to share research data

Locations for data sharing 

Research data can be shared in a variety of locations. Some researchers share data directly by email or on personal websites, or upload data files to the journal publisher's website alongside their article. These options may seem convenient, but they are often not the best choice: accessibility is limited, long-term preservation is not guaranteed, and versioning is not supported.

In compliance with the FAIR data principles, a long-term repository that provides a permanent identifier is the most recommended option for data sharing. 
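
One common kind of permanent identifier is the DOI, which resolves through https://doi.org/. The sketch below turns a DOI written in a few common forms into a resolvable link; the function name `doi_to_url` is hypothetical, and the example DOI is that of the FAIR principles paper in the reference list.

```python
def doi_to_url(doi: str) -> str:
    """Turn a DOI, with or without a common prefix, into a resolver URL."""
    cleaned = doi.strip()
    # Strip the prefixes people commonly paste along with the DOI itself.
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if cleaned.lower().startswith(prefix):
            cleaned = cleaned[len(prefix):]
            break
    # Every DOI begins with the directory indicator "10.".
    if not cleaned.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    return "https://doi.org/" + cleaned


print(doi_to_url("doi:10.1038/sdata.2016.18"))  # → https://doi.org/10.1038/sdata.2016.18
```

Because the link goes through the resolver rather than pointing at a particular web page, it keeps working even if the repository later reorganizes its website.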

Selecting a repository

When deciding which data repository best suits your data, consider the following:

  • Does your funding sponsor (or journal publisher) require or recommend a specific data repository?

  • Is there a domain-specific repository that is widely used in your research field?

  • Does your organization/institution offer a data repository?

  • Are the repository's tools for data discovery and distribution suitable for your data?

  • Does the repository provide open data access and support the FAIR data principles (e.g. by offering persistent identifiers such as DOIs, and data licensing)?

 

The Open Science Training Handbook, following the OpenAIRE recommendation, suggests that researchers consider the following order of preference:

  1. Use a disciplinary repository established for your research domain with recognized standards in your discipline

  2. Use an institutional research data repository

  3. Use other general repositories that are designed to accommodate multi-disciplinary research data

  4. Search for other data repositories in a global registry such as re3data or FAIRsharing
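
The order of preference above can be read as a simple fall-through rule, sketched below. The labels (`disciplinary`, `institutional`, `generalist`, `registry_search`) and the function `choose_repository_type` are hypothetical names introduced for illustration; in practice the choice also depends on funder and journal requirements.

```python
# Repository types in the order of preference listed above (labels are illustrative).
PREFERENCE_ORDER = ("disciplinary", "institutional", "generalist")
FALLBACK = "registry_search"  # search a registry such as re3data or FAIRsharing


def choose_repository_type(available: set[str]) -> str:
    """Return the most preferred type that is available, else the fallback."""
    for kind in PREFERENCE_ORDER:
        if kind in available:
            return kind
    return FALLBACK


# A researcher with no suitable disciplinary repository but with access to both
# an institutional and a generalist repository would start with the former.
print(choose_repository_type({"generalist", "institutional"}))  # → institutional
```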

 

1. Disciplinary Repository 

External disciplinary or data-type-specific repositories often follow discipline-specific metadata standards and data curation practices. They should be your first consideration, because they make shared datasets easier to discover, understand, and reuse within a specific area of study. Researchers and research students are advised to ask their colleagues or supervisors for help in locating suitable discipline-specific repositories in their fields.

For instance, the Trans-NIH BioMedical Informatics Coordinating Committee (BMIC) maintains a list of domain-specific repositories supported by the National Institutes of Health (NIH).

Researchers can also identify appropriate discipline-specific repositories by referring to a data repository registry like re3data.org or exploring the catalogue of databases available in the FAIRsharing collection. 

 

2. HKU Institutional Data Repository: DataHub 

The University of Hong Kong has an institutional data repository, HKU DataHub, powered by Figshare. HKU researchers and students can store, publish, share, and cite their research data and other digital materials in this repository. It serves as a persistent ‘home’ for research data generated by HKU community members, providing long-term storage, global open access, identifier generation, and secure protection.

Visit HKU DataHub to explore research data and other digital scholarly outputs published by HKU researchers, and see HKU DataHub: The Guide for guidance on using the repository.

 

3. Generalist Repository 

Several generalist repositories offer free services and accounts for researchers from any discipline to deposit their research data. They usually accept all types of data and are most commonly used for data that cannot go into a domain- or discipline-specific repository.

 

The Generalist Repository Ecosystem Initiative (GREI) 

GREI is an NIH initiative that brings together seven generalist repositories in a collaborative working group. The participating repositories are listed below:

 

  • Figshare is the general-purpose platform that also powers the HKU institutional data repository. It offers 20GB of free storage with a free private account. HKU researchers are encouraged to use DataHub, which uses the same engine and interface, for a larger storage quota.

  • Zenodo is run by the CERN data centre to support long-term preservation and the open science movement in Europe.

  • OSF (Open Science Framework) is a free and open-source project management tool that supports researchers in open science best practices throughout the entire project lifecycle.

  • Harvard Dataverse Repository is a free data repository open to all researchers from any discipline, both inside and outside the Harvard community.

  • Dryad is an open data publishing platform and community committed to the open availability and routine re-use of all research data.

  • Mendeley Data, owned by Elsevier, is a free and open generalist data repository for creating, sharing, accessing, and citing FAIR data globally.

  • Vivli is an independent, non-profit organization that has developed a global clinical research data sharing platform. The platform focuses on sharing individual participant-level data from completed clinical trials to serve the international research community.

 

See also a comparison chart of the generalist repositories participating in GREI.

References

Colavizza, G., Hrynaszkiewicz, I., Staden, I., Whitaker, K., & McGillivray, B. (2020). The citation advantage of linking publications to research data. PLoS ONE, 15(4), e0230416. https://doi.org/10.1371/journal.pone.0230416

FASEB. (2024, July 29). Choosing Your Generalist Repository. https://dataworks.faseb.org/helpdesk/kb/choosing-a-generalist-repository  

OpenAIRE. (2024, July 29). Guides for Researchers: How to find a trustworthy repository for your data. https://www.openaire.eu/find-trustworthy-data-repository  

Open Knowledge Foundation. (2024). Open Data Handbook. What is Open Data? https://opendatahandbook.org/guide/en/what-is-open-data/  

NASA. (2024). Open Science 101. https://nasa.github.io/Transform-to-Open-Science/os101-modules/ 

Wilkinson, M., Dumontier, M., Aalbersberg, I., et al. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, 3, 160018. https://doi.org/10.1038/sdata.2016.18