Scientific research in modern days often involves data generation, processing, analysis and visualization using codes and software. Through sharing the software and code that were created or developed during the research process, researchers play an integral role in fostering a more open, transparent, and reproducible approach to research. First, let us look at the definitions of the terms (NASA, 2024):
Code / Source code |
Source codes are human-readable set of statements written in a programming language that together compose software. |
Software |
Software refers to computer programs and applications that provide users some degree of utility or produce a result or service. Software can be distributed in executable form, as source code, or as a service via the internet. It is often a collection of programs, data, and other information that a computer system uses to perform specific tasks. |
Open-Source Software refers to software whose source code is under an open source license, by which the copyright holder grants to anyone the rights to inspect, modify, and distribute the source. Open-source software is transparently shared in a public repository, and sometimes maintained through collaboration.
According to the Open Source Initiative, Open-Source Software does not directly equal to open access to the source code only, but consists of a list of criteria in its distribution terms, including but not limited to free redistribution, allowing modifications and derived works, distribution of license, being inclusive, independent from any specific product, etc.
Source Code Management (SCM) or version control system is a tool to automatically manage changing versions of a computer file, especially one that contains source code. In software development, version control preserves a complete history of changes to the source code and enables a developer to roll back to an earlier version and merge with other revisions.
This video gives a brief introduction on how open-source software projects went mainstream and contributed to modern days technological development model.
The Rise of Open-Source Software by CNBC
The practice of writing computer code can be commonly seen across various disciplines, from the physical and life sciences to the social sciences, arts and humanities. Researchers nowadays may rely on programming for tasks such as statistical analysis, data visualization, and computational simulation, etc. These codes may be integral to research findings, and they are as important as the raw and processed data created in the studies. Hence, codes should also be preserved and made openly accessible where possible, enabling others to replicate and verify results.
For students, developers or researchers who would like to get involved in creation or modification of open-source software, the Open Source Guides created and curated by GitHub provides a collection of resources for beginners.
Open-source software licenses are the basis for how scientists use, make, and share code and software. They grant permissions for anyone to inspect, use, modify, and distribute the licensed software’s source code for any purpose. They can broadly be categorized into permissive licenses and copyleft licenses.
Permissive licenses | Permissive licenses, such as the MIT and Apache 2.0 licenses, allow users to do almost anything with the software, including freedom to use, modify, redistribute, create derivative works, and incorporate the source codes into proprietary closed source software. |
Copyleft licenses | Copyleft licenses, like the General Public License (GPL), allow users to modify and distribute the software, but with the condition that any derivative works must also be open source using the same license. It is intended to guarantee users’ freedom to share and change all versions of a program – to make sure it remains free software for all its users. |
More resources on open-source licenses for software:
List of Open Source Initiative (OSI) Approved Licenses
Open Source Initiative’s FAQ on Which Open Source license should I choose to release my software under?
Choose an open source license by GitHub
A source code repository is a storage location where developers keep and manage their codes. These repositories can host a variety of assets, including code files, documentation, and other resources related to a software project. They serve as a centralized hub where all project-related materials can be accessed, tracked, and managed. Popular platforms for hosting source code repositories include GitHub, GitLab, and Bitbucket.
GitHub is a Git-based platform that allows collaboration and code history tracking. It is one of the most widely used code repositories. GitHub has an integration with the HKU data repository, DataHub powered by Figshare, allowing HKU researchers to easily deposit and archive a copy of your source codes or software to DataHub, which can mint a Digital Object Identifier (DOI) for citation. More details are available on this help guide. |
|
GitLab is a Git-based platform that also offers DevOps and Continuous Integration (CI) / Continuous Deployment (CD) functionalities. | |
BitBucket is a platform that can host Git and Mercurial repositories. Owned by Atlassian. |
NASA. (2024). Open Science 101. https://nasa.github.io/Transform-to-Open-Science/os101-modules/
Open Source Initiative. (2024, Feb 16). The Open Source Definition. https://opensource.org/osd