
DataHub: RPg Students

Submission Guide

Please allow sufficient time for data review and curation by the Libraries after your dataset submission. Once your data have been submitted, the Libraries will contact you within 5 working days for confirmation or further amendment(s).


Step 1:

Before you deposit and submit your dataset, please review the following:

  1. Depositor’s Agreement

  2. What to Deposit

  3. When to Redact and Anonymize

  4. Open Access

  5. What Can I Upload


Step 2:

Review the “Restricted Access Procedures for RPg students” page to decide whether any restricted access options should be applied to your data files. Please consult your supervisor on the suitability of your chosen access control option before submission.

If it is necessary to upload a metadata record only (with valid justification and supervisor's approval), please skip the data files uploading process and proceed with filling in the metadata directly. Then forward your supervisor's approval to


Step 3:

  • Prepare a README file for submission by using the README file template.

  • Organize your data files into one main folder. Inside the main folder, you may create sub-folders to classify your files. Alternatively, you may classify your data files into multiple sub-folders. Please refer to this record for an example.

  • Separate your README file from the main folder(s) as shown below:

*Recommended Practice*

If your dataset folder is larger than 5GB, we recommend that you either 1) upload it via the FTP Uploader*, or 2) divide it into multiple folders of at most 5GB each for a smoother upload via the web browser interface.
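To check which side of the 5GB threshold a folder falls on before uploading, a short script can total the file sizes. The following is a minimal sketch, not part of the DataHub tooling: the function names are hypothetical, and it assumes "5GB" means 5 × 1024³ bytes.

```python
from pathlib import Path

# Assumption: "5GB" is read as 5 GiB; change this if decimal gigabytes are intended.
LIMIT_BYTES = 5 * 1024**3


def folder_size(folder):
    """Return the total size in bytes of all files under the folder."""
    return sum(p.stat().st_size for p in Path(folder).rglob("*") if p.is_file())


def suggest_upload_route(size_bytes, limit=LIMIT_BYTES):
    """Map a folder size to the upload route recommended in this guide."""
    if size_bytes > limit:
        return "FTP Uploader, or split into folders within the limit"
    return "web browser interface"
```

For example, `suggest_upload_route(folder_size("my_dataset"))` returns the suggested route for a folder named `my_dataset` (a placeholder name).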

If the main folder is smaller than 5GB:

If all the data files are larger than 5GB in total, separate them into multiple folders:


*Notes on using FTP Uploader:

**If you have a very large volume of data files, we recommend breaking your dataset down into multiple, smaller zipped folders.

**Only zipped folders (.zip) can be uploaded via the FTP Uploader. Unzipped folders will fail to transfer to DataHub.
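The splitting and zipping described in these notes can be scripted. The sketch below is illustrative only and uses Python's standard `zipfile` module. The function names and the `_partN.zip` naming are hypothetical, and reusing the 5GB figure as the per-archive cap is an assumption, since the notes above only ask for smaller zipped folders. Sizes are measured before compression, so the resulting archives are usually no larger than the cap.

```python
import zipfile
from pathlib import Path

# Assumption: reuse the 5GB browser-upload figure as the cap per archive.
MAX_BYTES = 5 * 1024**3


def plan_batches(files, max_bytes=MAX_BYTES):
    """Group (path, size) pairs into batches whose total size stays within max_bytes.

    A single file larger than max_bytes ends up in a batch of its own.
    """
    batches, current, current_size = [], [], 0
    for path, size in sorted(files, key=lambda item: str(item[0])):
        if current and current_size + size > max_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(path)
        current_size += size
    if current:
        batches.append(current)
    return batches


def zip_in_batches(folder, out_dir, max_bytes=MAX_BYTES):
    """Write the folder's files into numbered .zip archives and return their paths."""
    folder, out_dir = Path(folder), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    files = [(p, p.stat().st_size) for p in folder.rglob("*") if p.is_file()]
    archives = []
    for index, batch in enumerate(plan_batches(files, max_bytes), start=1):
        archive = out_dir / f"{folder.name}_part{index}.zip"
        with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
            for path in batch:
                # Keep the folder name as the top-level entry inside each archive.
                zf.write(path, path.relative_to(folder.parent).as_posix())
        archives.append(archive)
    return archives
```

Files are grouped in path order, so files from the same sub-folder tend to land in the same archive, which keeps the structure described in the README easier to follow.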


Special Condition: Dataset Published on External Repository

If you have already deposited your dataset (raw data or processed data) in an external repository, such as a subject-specific repository for your field of study (e.g. SRA, ENA), you may upload a README file and provide the URL(s) or DOI link(s) of the deposited record without uploading the same data files onto DataHub.

Please include the link(s) under the field “Related Materials” in your dataset submission on DataHub. Please refer to the uploading data page (Point 10) for how to add the entries under the "Related Materials" field.


Step 4:

To submit the dataset for examination, please access DataHub. Log in with your HKU Portal ID.


If you encounter redirection issues after logging in, especially if your HKU Portal login credentials have been saved in the browser, try one of the following alternatives for logging in to DataHub:

  1. Clear the cache, browsing history, and cookies of the browser you are currently using, then re-open the browser and try to log in to DataHub again.
  2. Try to log in to DataHub via Chrome (Incognito) / Firefox (Private) mode.
  3. Try to log in to DataHub via a web browser other than the one you are using, such as Chrome, Firefox, Edge, or Safari.


After the HKUL Authentication, on the DataHub interface, follow the steps below:

  • Under “My data”, click on “Create a new item record”.

  • Upload your README file and the data folder(s) by dragging them into the record or selecting files by clicking on “Browse files”, or folders by “Browse for folders”.

  • Separate the README file from the main folder(s) under the same item record as shown below:


Note 1: Click on "Manage files" to view the upload status of your files. A "tick" icon indicates "uploading completed", and an orange file icon indicates "uploading successful but file cannot be scanned".


Note 2: If your data file / folder fails to upload, the record will be highlighted in red with the message "Something went wrong" shown beside the file. Please remove the record and re-upload the file / folder.


  • Assign the title to your dataset record in this recommended format: Supporting data for “title of your thesis”.

  • Briefly describe your data files and what they are about, in full sentences, under “Descriptions”. Be as descriptive as possible, but avoid including any sensitive or confidential content.

  • Fill out all the mandatory metadata fields by following the uploading via DataHub Interface guidelines. Note that “Resource Title and DOI” are optional; leave them blank if you do not have any external resources (e.g. articles or peer-reviewed publications) to link to the record.

  • Apply an embargo period and restricted access if necessary. By default, the dataset will be set as Open Access under the Creative Commons (CC-BY-NC) license.


Step 5:

When you are depositing your data, the “Reserve a DOI” and “Share with private link” steps are mandatory. Please refer to the related guides on separate pages:

Reserve a DOI: Click on the tab "Reserving DOI" on the "Publishing Data" page.

Share with private link: Click on the tab "Share Files" on the "Sharing Data" page.


Before submitting your data, make sure you have successfully reserved a DOI and generated a private link for your item record. They are necessary for you to proceed with the remaining dataset submission process.


Step 6:

Press the “Submit for review” button in order to submit your dataset.


Step 7:

Your submitted data will be sent to the data curator(s) in the Libraries for review. You will receive an email confirmation from the Libraries when the submitted data are properly curated. If any amendments are required, the Libraries will contact you within 5 working days and discuss them with you directly. The review process may take longer if the dataset submission is not up to the required standards. Please allow sufficient time for data curation by the Libraries. Do not proceed until you have received a reply from the Libraries.


Step 8:

Only after you have received the confirmation from the Libraries via email, fill in and submit the Dataset Submission Form. The link to the form will be sent to you by the Libraries via email. Please refer to the guidelines under the tab "Dataset Submission Form" on this page for a step-by-step guide on how to fill in the form. After successful submission, you may print a hard copy or save an electronic copy of the completed form for submission to your department or faculty as proof of submission.


Step 9:

Upon submission of the above online form, your primary supervisor will receive an email notification requesting his/her review of your submitted dataset. If comments are received from your supervisor, the Libraries will contact you directly for amendments. If no comments are received, or once the final amendments are complete, your dataset record will be released on DataHub and the dataset submission process is complete.

Beginning with the September 2017 intake, all HKU Research Postgraduate (RPg) students have responsibility for

  1. using a data management plan (DMP), where applicable, to describe the use of data in preparation for, or in the generation of their theses, and
  2. depositing, where applicable, a dataset in the HKU DataHub.

"RPg" includes the degrees of MPhil, PhD, and SJD.

The Graduate School Handbook describes these regulations. Sections XX and XXI of the Handbook give the Procedures for MPhil and PhD, respectively. In these Procedures, the relevant paragraphs for data are:

  • MPH5 & PHD5 Probation and Confirmation of Candidature – for description of a data management plan (DMP)
  • MPH7 & PHD7 Period of Study – for describing when in the period of study, a dataset, where applicable, is to be submitted
  • MPH14 & PHD14 Submission of Thesis for Examination – for description of dataset submission
  • MPH15 & PHD15 Thesis Examination – for consideration of DMP Entry results and dataset if applicable, and if desired by the examiners

The 2015 Policy on Research Data and Records Management asks that all researchers, including RPg students, properly and ethically describe in a Data Management Plan at the beginning of their project, how they will collect, organize, store, and finally deposit a dataset (where applicable) at the end of their project. The HKU Libraries provide Research Data Services to enable this process.

To support research integrity and a high-quality data curation process for the research outputs of our Research Postgraduate (RPg) students, additional procedures apply when RPg students submit their datasets. The specific requirements are as follows:



The emphasis of the HKU RDM initiative is on "research integrity". Research results claimed in publications must be reproducible. Replication datasets must be preserved to enable this later reproducibility. All data, scripts, questionnaires, codebooks etc. necessary for a third party to arrive at the same research results claimed must be preserved.

As part of the data deposit, please indicate which datafiles are raw data (i.e. data that indicate the original data collection process such as questionnaires) and which are processed data (i.e. data ready for analysis in publications) – both are needed eventually, but raw data files are essential for any completion report.

Raw data may contain personal identifiers, and therefore must be stored under "Restricted Access". If the data contain sensitive, confidential or restricted data per the HKU Policy on Research Ethics, the researcher may also choose to create an anonymized version for open access, with the approval of the relevant IRBs or ethics committees.

If the data include personal data, the data should be kept confidential:

  • Personal data from clinical research (i.e. Institutional Review Board (IRB) approved)
    • Provide the approval code, consent forms, and ethical application form when available. Please state the risk of re-identification from the different datafiles and how the risk has been minimised for any dataset intended for sharing.
  • Personal data from non-clinical research (i.e. Human Research Ethics Committee (HREC) approved)
    • Provide the approval code, consent forms, and ethical application form. Please state the risk of re-identification from the different datafiles and how the risk has been minimised for any dataset intended for sharing.

If data includes interviews,

  • Interview transcripts
  • Blank questionnaire & interviewer guidelines

If field research data,

  • provide a copy of the field research notebook in digital format, preferably machine readable.

If lab research data,

  • provide a copy of working papers and/or lab research notebooks in digital format, preferably machine readable.

If simulated data,

  • how was it generated? Please either explain or provide a link.

If other types of data, such as image or video data, or creative or design data,

  • please explain what type of data it is and how it was collected or generated.

If software is needed to read or analyze any of the datafiles,

  • please provide full details of the software name, the version needed, and any instructions necessary to obtain the software. If you have written your own script for analyzing the data, please include this script in the final deposit as well.

When uploading your files onto DataHub, you are also required to prepare and submit a README file alongside your dataset.

The README file gives your supervisor and the data curators an overview of your data: how the data are organized, how the files are named, the relationships between files, details of your data, and methodological information. A clear and tidy file structure helps viewers understand your findings and locate the resources they need more easily, which in turn helps avoid a prolonged data deposit process for your research outputs.

Your README file should consist of four main components:

  1. General Information

  2. Data and File Overview

  3. Data Description for Each File

  4. Methodological Information (if applicable)
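As a starting point for drafting, the four components above can be laid out as an empty skeleton. The sketch below is only an illustration and does not replace the official README file template. The function name, the "README for:" header line, and the "[to be completed]" placeholders are assumptions, and the wording of the third heading is lightly normalized.

```python
# Section headings taken from the four components listed above.
SECTIONS = [
    "General Information",
    "Data and File Overview",
    "Data Description for Each File",
    "Methodological Information (if applicable)",
]


def readme_skeleton(dataset_title, sections=SECTIONS):
    """Return plain-text README scaffolding with one placeholder per section."""
    lines = [f"README for: {dataset_title}", ""]
    for number, heading in enumerate(sections, start=1):
        lines += [f"{number}. {heading}", "[to be completed]", ""]
    return "\n".join(lines)
```

Calling `print(readme_skeleton("Supporting data for “title of your thesis”"))` prints the scaffold, which can then be filled in according to the template's requirements.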

For specific requirements under each section, please refer to the README file template below:


2. Data Curation

When all necessary files have been submitted successfully on DataHub, the University of Hong Kong Libraries will start the data curation process with a series of checks and digital preservation activities. If the descriptions or information stated in the README file are found to be ambiguous, the Libraries may contact the submitter for more information to ensure that datasets have been duly submitted.


3. Supervisor Review

Your submitted dataset and README file will be sent to your primary supervisor for review. This is to ensure the appropriate dataset is duly submitted. 


Please refer to the step-by-step guide on the submission procedures by moving to the next tab.

Dataset Submission Form

Reminder: Do NOT submit this form until you have received email confirmation from the Libraries.

1. Go to the Online Dataset Submission Form at <>

2. Log in with your HKU Portal ID.

3. Your details will be filled in automatically after you have logged in.

4. Enabling dataset destruction is optional. If necessary, tick the “Enable Dataset Destruction” box and enter the date of destruction. Please strictly follow the University's requirement on the minimum retention period when making the request.

5. Enter the DOI that has been reserved for your dataset. Press the “Resolve” button. The details of your dataset will appear automatically. Double-check that it is the correct record that you are going to submit.


6. Click on “Submit”. An email notification will be automatically sent to your primary supervisor for his/her review.

7. If no further comments are received from your supervisor, print and submit a hard copy of the Dataset Submission Form to your department or faculty for record.

Restricted Access Procedures for RPg Students

There are occasions when you may wish to upload your data with access control conditions, especially for sensitive data that contain personal identifiers. The steps below will guide you through setting up access restrictions on your files.

If your data contain sensitive, confidential or restricted data per the HKU Policy on Research Ethics, you are required to handle those data using either of the two methods below:

  1. Upload the data under restricted access

  2. Make and upload a version that anonymizes the data, for public access (with the approval of relevant IRBs or ethics committees)

We recommend that you consult your supervisor on the suitability of your chosen access control option before submission. If uploading a metadata record only is necessary, please refer to the “Metadata Records Only” page under Publishing Data for the procedures.


To upload your data files under restricted access, which allows only you and your supervisor to access those files, follow the procedure below:

Step 1: Click on “Add embargo and restricted access” on the item record editing page.


Step 2: Leave the option “Nobody” as your currently selected option.


Step 3: Click on “On files only” from the dropdown menu under Embargo type.


Step 4: Decide how long access to your data files will be restricted.

  • For fully confidential items, select “Permanent Embargo” under Embargo period.
  • To apply an embargo for a fixed period of time, select the appropriate length of time or select a specific date.


Note: For RPg student research dataset submissions, a dataset under this setting will be accessible only to the student and his/her supervisor(s). The examiner(s) of the thesis may also request access to the data for the thesis examination process (MPH15 & PHD15 of the Procedures). When the student and supervisor(s) leave the university, the Dean of the faculty will be granted access permission.
