Digital Humanities: Data for Digital Humanities

An introduction to digital humanities

Digital-born data

Born-digital records are records that were created natively in digital format, rather than converted from analog originals. Examples include:

  • email
  • text-based documents (for example Word documents, Google documents)
  • presentations (for example PowerPoint)
  • spreadsheets (for example Excel)
  • PDFs
  • images and videos
  • CAD drawings
  • 3D models
  • data sets and databases


According to Gartner’s IT Glossary, “Digitization is the process of changing from analog to digital form.”

Creating data in digital form, or converting it into digital form, is the first step toward further processing. Both digitized and born-digital data can be made reusable and shareable through data management and long-term preservation. Common data types include text, images, audio, and 3D models, produced by the following methods:

  • Image scanning
  • OCR (optical character recognition)
  • Sound digitization
  • 3D scanning
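
Sound digitization illustrates the two operations behind most analog-to-digital conversion: sampling a continuous signal at regular time intervals, and quantizing each sample to one of a fixed number of levels. The minimal sketch below assumes a synthetic 440 Hz sine wave (the hypothetical `analog_signal` function) stands in for a real analog recording, and uses 44,100 samples per second at 16 bits, the values used for CD audio. The function names are illustrative only; a real workflow would capture the signal through audio hardware and software rather than compute it.

```python
import math
import struct
import wave

SAMPLE_RATE = 44100  # samples per second (CD-audio rate)
BIT_DEPTH = 16       # bits per sample (CD-audio depth)


def analog_signal(t: float) -> float:
    """Hypothetical stand-in for an analog source: a 440 Hz sine wave in [-1, 1]."""
    return math.sin(2 * math.pi * 440.0 * t)


def digitize(duration_s: float, sample_rate: int = SAMPLE_RATE) -> list[int]:
    """Sample the signal at regular intervals and quantize each sample to a signed integer."""
    max_level = 2 ** (BIT_DEPTH - 1) - 1  # 32767 for 16-bit audio
    samples = []
    for n in range(int(duration_s * sample_rate)):
        t = n / sample_rate                        # sampling: discrete points in time
        amplitude = analog_signal(t)               # continuous amplitude at that point
        samples.append(round(amplitude * max_level))  # quantization: nearest integer level
    return samples


def write_wav(path: str, samples: list[int], sample_rate: int = SAMPLE_RATE) -> None:
    """Store the quantized samples as mono 16-bit PCM in a WAV file."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)                  # mono
        wav.setsampwidth(BIT_DEPTH // 8)     # 2 bytes per sample
        wav.setframerate(sample_rate)
        # "<h" = little-endian signed 16-bit integers, as WAV PCM expects
        wav.writeframes(struct.pack(f"<{len(samples)}h", *samples))
```

For example, `write_wav("tone.wav", digitize(1.0))` would store one second of the tone as 44,100 integer samples. Raising the sample rate or bit depth captures the original more faithfully at the cost of larger files.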

Data mining & knowledge discovery

“Knowledge discovery in databases is a process that is defined by several processing steps that have to be applied to a data set of interest in order to extract useful patterns.” (Hotho, Nürnberger, & Paaß, 2005)

The Cross-Industry Standard Process for Data Mining (CRISP-DM) model defines the main steps as:

  1. business understanding,
  2. data understanding,
  3. data preparation,
  4. modeling,
  5. evaluation,
  6. deployment.
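
To make the data preparation, modeling, and evaluation steps more concrete for textual sources, the minimal sketch below runs them on a tiny corpus. It assumes that the "useful patterns" of interest are pairs of words that appear together in at least two documents, a deliberately simple stand-in for real text-mining techniques rather than a method from Hotho et al. The corpus, stopword list, support threshold, and the function names `prepare`, `frequent_pairs`, and `evaluate` are all hypothetical. Business understanding and deployment are project activities rather than code, so they appear only as comments.

```python
import re
from collections import Counter
from itertools import combinations

# 1. Business understanding (hypothetical question): which terms tend to occur together?

# 2. Data understanding: a hypothetical toy corpus of short documents.
CORPUS = [
    "The library digitized the letters and the maps.",
    "Letters from the archive were digitized with OCR.",
    "OCR turns scanned letters into searchable text.",
    "The maps were scanned in high resolution.",
]

# Hypothetical stopword list; real projects use a fuller, language-specific list.
STOPWORDS = {"the", "and", "from", "were", "with", "into", "in", "a", "of"}


def prepare(doc: str) -> set[str]:
    """3. Data preparation: lowercase, split into words, drop stopwords, deduplicate."""
    tokens = re.findall(r"[a-z]+", doc.lower())
    return {t for t in tokens if t not in STOPWORDS}


def frequent_pairs(docs: list[set[str]], min_support: int = 2) -> dict[tuple[str, str], int]:
    """4. Modeling: count in how many documents each word pair co-occurs; keep frequent pairs."""
    counts: Counter = Counter()
    for terms in docs:
        counts.update(combinations(sorted(terms), 2))  # each unordered pair once per document
    return {pair: n for pair, n in counts.items() if n >= min_support}


def evaluate(pairs: dict[tuple[str, str], int], n_docs: int) -> list[tuple[tuple[str, str], float]]:
    """5. Evaluation: express each pattern's support as a share of documents, highest first."""
    return sorted(((p, n / n_docs) for p, n in pairs.items()), key=lambda item: (-item[1], item[0]))


# 6. Deployment (not shown): e.g., share the patterns with researchers or feed them into other tools.

if __name__ == "__main__":
    prepared = [prepare(doc) for doc in CORPUS]
    patterns = frequent_pairs(prepared, min_support=2)
    for pair, support in evaluate(patterns, len(prepared)):
        print(pair, f"{support:.0%}")
```

On this toy corpus the script prints `('digitized', 'letters') 50%` and `('letters', 'ocr') 50%`. In practice the evaluation step is where a researcher decides whether such patterns are meaningful, which may send the project back to an earlier step.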


Hotho, A., Nürnberger, A., & Paaß, G. (2005, May). A brief survey of text mining. LDV Forum, 20(1), 19-62.

Digital Scholarship Librarian

Terry Chung
5/F, Main Library, University of Hong Kong, Pokfulam Road,
Hong Kong
(852) 2859-7002