Research Data Management

A practical guide to best practices in research data and code management

Sharing data FAIRly


With the growth of open science in recent decades, the global research community is increasingly sharing or publishing research data and making it available to others. Many research funders and academic journals also impose data policies that require data to be accessible.

Widely adopted by academic communities, the FAIR data principles, first introduced in 2016, are a set of guidelines that help researchers make better use of their research data and engage a broader audience with it. Under these principles, shared research data should be findable, accessible, interoperable, and reusable.

When planning to share your research data, consider the following to maximize its reusability:

  • Share your data and code in open, trusted repositories

  • Get a persistent identifier (e.g. DOI) for your data and use it in the associated publication for others to cite 

  • Document your data, code, workflows, software required to open the data in separate file(s), and share alongside your data 

  • Select appropriate data formats and tools for higher interoperability

  • Use an open license for your shared data and code
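
As a rough illustration of the documentation step in the list above, the following Python sketch gathers a few descriptive details into a plain-text README and flags anything left blank before sharing. The `DatasetRecord` class, its field names, and the sample values are all hypothetical and chosen only for this example; repositories and disciplines typically define their own metadata requirements.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """Hypothetical record of details worth documenting before sharing."""
    title: str
    identifier: str  # persistent identifier, e.g. a DOI
    license: str     # an open license, e.g. "CC BY 4.0"
    file_formats: list[str] = field(default_factory=list)
    software_required: list[str] = field(default_factory=list)


def build_readme(record: DatasetRecord) -> str:
    """Render the record as plain-text README content."""
    lines = [
        f"Title: {record.title}",
        f"Identifier: {record.identifier}",
        f"License: {record.license}",
        "File formats: " + ", ".join(record.file_formats),
        "Software required: " + ", ".join(record.software_required),
    ]
    return "\n".join(lines)


def missing_fields(record: DatasetRecord) -> list[str]:
    """List the fields left empty, as a simple pre-sharing check."""
    return [name for name, value in vars(record).items() if not value]


# Illustrative values only; this is not a real dataset.
record = DatasetRecord(
    title="Example survey responses",
    identifier="",  # no persistent identifier assigned yet
    license="CC BY 4.0",
    file_formats=["CSV"],
)
print(build_readme(record))
print(missing_fields(record))
```

Here the check would report the empty identifier and the empty software list, a reminder to obtain a persistent identifier and to note any software needed to open the files.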

Read our guide on Open Data for more details on the benefits of data sharing, the FAIR data principles, and selecting repositories for sharing.

Where to share research data

Locations for data sharing 

Research data can be shared in a variety of locations. Some researchers opt to share data directly via email or personal websites, or to upload data files to the journal publisher's website alongside their article. These options may seem convenient, yet they are often not the best choice because accessibility is limited, long-term preservation is not guaranteed, and versioning is not supported.

In compliance with the FAIR data principles, a long-term repository that provides a permanent identifier is the most recommended option for data sharing. 

Selecting a repository

When considering which data repository would best suit your data, ask the following:

  • Does your funding sponsor (or journal publisher) require or recommend a specific data repository? 

  • Is there a domain-specific repository that is widely used in your research field?

  • Does your organization/institution offer a data repository? 

  • Do you think the tools offered by the repository for data discovery and distribution are suitable for your data? 

  • Does the repository provide open data access and support the FAIR data principles (e.g. offer persistent identifiers like DOI, data licensing)? 

 

According to the Open Science Training Handbook, and as recommended by OpenAIRE, researchers may consider repositories in the following order of preference:

  1. Use a disciplinary repository established for your research domain with recognized standards in your discipline

  2. Use an institutional research data repository

  3. Use other general repositories that are designed to accommodate multi-disciplinary research data

  4. Search for other data repositories in a global registry such as re3data or FAIRsharing

 

1. Disciplinary Repository 

An external disciplinary or data-type-specific repository often follows discipline-specific metadata standards and data curation practices. These repositories should be considered first, as they can better facilitate the discoverability, understanding, and reusability of shared datasets in a specific area of study. Researchers and research students are encouraged to ask colleagues or supervisors for help locating suitable discipline-specific repositories in their fields.

For instance, the Trans-NIH BioMedical Informatics Coordinating Committee (BMIC) maintains a list of National Institutes of Health (NIH) supported domain-specific repositories.

Researchers can also identify appropriate discipline-specific repositories by referring to a data repository registry like re3data.org or exploring the catalogue of databases available in the FAIRsharing collection. 

 

2. HKU Institutional Data Repository: DataHub 

The University of Hong Kong has an institutional data repository, HKU DataHub, powered by Figshare. HKU researchers and students can store, publish, share, and cite their research data and other digital materials in this repository. It serves as a persistent ‘home’ for research data generated by HKU community members, providing long-term storage, global open access, identifier generation, and secure protection.

Visit HKU DataHub to explore research data and other digital scholarly outputs published by HKU researchers, and see HKU DataHub: The Guide for guidelines on how to use the repository.

To enhance the visibility of their research within the HKU community, HKU researchers who deposit their data primarily in a disciplinary repository may create an item record in DataHub with a link directing others to where the data is stored. Read the guide for uploading linked files to DataHub.

 

3. Generalist Repository 

Several generalist repositories offer cost-free services and accounts for researchers from multiple disciplines to deposit their research data. They usually accept all types of data and are most commonly used for data that cannot go into a domain- or discipline-specific repository.  

 

The Generalist Repository Ecosystem Initiative (GREI) 

The GREI is an NIH initiative that brings together seven generalist repositories in a collaborative working group. The repositories participating in the GREI are listed below:

 

  • Figshare is the general version of the HKU institutional data repository. It offers 20GB free storage for a cost-free private account. HKU researchers are recommended to use DataHub, which uses the same engine and interface, for more storage quota.

  • Zenodo is run by the CERN data centre to support long-term preservation and open science movement in Europe.

  • OSF (Open Science Framework) is a free and open-source project management tool that supports researchers throughout their entire project lifecycle in open science best practices.

  • Harvard Dataverse Repository is a free data repository open to all researchers from any discipline, both inside and outside of the Harvard community.

  • Dryad is an open data publishing platform and community committed to the open availability and routine re-use of all research data.

  • Mendeley Data is a free and open generalist data repository to create, share, access and cite FAIR data globally, owned by Elsevier.

  • Vivli is an independent, non-profit organization that has developed a global clinical research data sharing platform. The platform focuses on sharing individual participant-level data from completed clinical trials to serve the international research community.

 

See also a comparison chart of the GREI-participating generalist repositories.

Licensing Data


Research data is intellectual property that could be under the ownership of researchers, the supporting institution, or the funder. HKU researchers may refer to the university’s policy on Intellectual Property Rights and the Research Data and Records Management for relevant regulations. 

When sharing your research data, using an open license is one of the recommended best practices to increase the reusability of your data. An open license specifies what can and cannot be done with an original work regardless of its form. It grants permissions and states restrictions. According to the definition by Opendefinition.org, an open license is one which grants permission to access, re-use and redistribute a work with few or no restrictions. 

Understanding the licensing terms of a dataset before reusing it is crucial to prevent copyright infringement and other intellectual property concerns.  

The most common open licenses used for academic work and datasets are the Creative Commons (CC) licenses. See the box below for more details on each CC license and its combination of terms, and find more information on our guide page.

For open licenses frequently used for open-source software or code, read our guide on Open-Source Software and Codes.

Creative Commons licenses

There are six Creative Commons license options. The Creative Commons license on a copyrighted work answers the question: What can a user do with this work?

| License | Copy and distribute the material | Attribute the creator | Distribute, remix, adapt, and build upon the material | Share the modified material under identical terms | Commercial use |
|---|---|---|---|---|---|
| CC BY | Allowed | Required | Allowed | Not required | Allowed |
| CC BY-SA | Allowed | Required | Allowed | Required | Allowed |
| CC BY-NC | Allowed | Required | Allowed | Not required | Prohibited |
| CC BY-NC-SA | Allowed | Required | Allowed | Required | Prohibited |
| CC BY-ND | Allowed | Required | Prohibited | (Modification prohibited) | Allowed |
| CC BY-NC-ND | Allowed | Required | Prohibited | (Modification prohibited) | Prohibited |
| CC0 | Allowed | Not required | Allowed | Not required | Allowed |
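
The table above can also be read as a small lookup structure. The following Python sketch transcribes its rows and filters licenses by the kind of reuse intended. The names `Terms`, `CC_LICENSES`, and `licenses_permitting` are hypothetical and introduced only for illustration; this is a reading aid, not legal advice or an official Creative Commons tool.

```python
from typing import NamedTuple


class Terms(NamedTuple):
    attribution_required: bool
    adaptation_allowed: bool
    share_alike_required: bool
    commercial_use_allowed: bool


# Transcribed from the table above. Copying and distributing the
# unmodified material is allowed under every license listed.
CC_LICENSES = {
    "CC BY":       Terms(True,  True,  False, True),
    "CC BY-SA":    Terms(True,  True,  True,  True),
    "CC BY-NC":    Terms(True,  True,  False, False),
    "CC BY-NC-SA": Terms(True,  True,  True,  False),
    "CC BY-ND":    Terms(True,  False, False, True),
    "CC BY-NC-ND": Terms(True,  False, False, False),
    "CC0":         Terms(False, True,  False, True),
}


def licenses_permitting(adaptation: bool, commercial_use: bool) -> list[str]:
    """Return the licenses whose terms allow the requested kinds of reuse."""
    return [
        name
        for name, terms in CC_LICENSES.items()
        if (terms.adaptation_allowed or not adaptation)
        and (terms.commercial_use_allowed or not commercial_use)
    ]


# Which licenses let a reuser both adapt the work and use it commercially?
print(licenses_permitting(adaptation=True, commercial_use=True))
# → ['CC BY', 'CC BY-SA', 'CC0']
```

Note that a license passing this filter may still carry obligations, such as attribution or sharing adaptations under identical terms, so the full license text should always be checked.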

 

Note:

Different publishers may have different license to publish (LTP) agreements.

As an author, when you choose a license, you should read through the license terms and consider which license suits you best.

For example, you might prefer a CC BY-NC-ND license if you want to grant only the journal publisher (and not other users) the right to sell or rent your article.

 

Learn more and license chooser: 

Disclaimer: The information and materials provided on the website are for general informational purposes only and do not constitute legal advice.

Citing Data


If you are utilizing third-party data published by others in your research or publications, you must provide a citation for the dataset to give credit where credit is due (original author/producer) and to help other researchers to locate the materials. 

Acknowledgements and citations contribute towards fostering a culture of sharing data without fear of ideas or recognition being stolen. Data citations also aid in the transparency of how data is being used. By citing data, original authors and new researchers can easily track how the data are being used to answer different questions. 

 

In general, a citation for a dataset often includes the following components:

  • Authors and their affiliated institutions/organizations 

  • Title 

  • Version 

  • DOI (or URL if a unique identifier is not available) 

  • Creation date 

  • Additional fields may also be specified or required by individual repository/journal 

 

The Australian Research Data Commons (ARDC) provides standard data and software citation templates and examples:

 

Standard data citation 

Template 
Creator (Publication Year): Title. Publisher. (resourceTypeGeneral). Identifier 

Example 
Hanigan, Ivan (2012): Monthly drought data for Australia 1890-2008 using the Hutchinson Drought Index. The Australian National University Australian Data Archive. (Dataset) http://doi.org/10.4225/13/50BBFD7E6727A

 

Standard software citation 

Template 
Creator (Publication Year): Title. Version No. Publisher. [resourceTypeGeneral]. Identifier. 

Example 
Xu, C., & Christoffersen, B. (2017). The Functionally-Assembled Terrestrial Ecosystem Simulator Version 1. Los Alamos National Laboratory (LANL), Los Alamos, NM (United States). [Software]. https://doi.org/10.11578/dc.20171025.1962
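
The two templates above can be filled in programmatically. The following Python sketch follows the punctuation of the templates as written; the function and parameter names are hypothetical, and individual repositories or journals may require a different style.

```python
def format_data_citation(creator, year, title, publisher, resource_type, identifier):
    """Fill the data citation template:
    Creator (Publication Year): Title. Publisher. (resourceTypeGeneral). Identifier
    """
    return f"{creator} ({year}): {title}. {publisher}. ({resource_type}). {identifier}"


def format_software_citation(creator, year, title, version, publisher,
                             resource_type, identifier):
    """Fill the software citation template:
    Creator (Publication Year): Title. Version No. Publisher. [resourceTypeGeneral]. Identifier.
    """
    return (
        f"{creator} ({year}): {title}. {version}. {publisher}. "
        f"[{resource_type}]. {identifier}."
    )


# Values taken from the data citation example above.
citation = format_data_citation(
    creator="Hanigan, Ivan",
    year=2012,
    title=("Monthly drought data for Australia 1890-2008 "
           "using the Hutchinson Drought Index"),
    publisher="The Australian National University Australian Data Archive",
    resource_type="Dataset",
    identifier="http://doi.org/10.4225/13/50BBFD7E6727A",
)
print(citation)
```

The output matches the data citation example apart from a full stop after "(Dataset)", which the template includes but the example omits.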

 

More resources: 

Digital Curation Centre, How to Cite Datasets and Link to Publications 

European Union, Data citation: A guide to best practice 

International Association for Social Science Information Services & Technology (IASSIST), Quick guide to data citation 

Open Science 101, Module 3: Open Data – Using Open Data