Digital Humanities: Data for Digital Humanities

An introduction to digital humanities

Digital-born data

Born-digital records are records that were created natively in digital format. Examples include:

  • email
  • text-based documents (for example Word documents, Google documents)
  • presentations (for example PowerPoint)
  • spreadsheets (for example Excel)
  • PDFs
  • images and videos
  • CAD drawings
  • 3D models
  • data sets and databases
     

Digitization

According to Gartner’s IT Glossary, “Digitization is the process of changing from analog to digital form.”

Creating data in digital form, or converting existing material into digital form, is the first step toward further processing. Both digitized and born-digital data become reusable and shareable through data management and long-term preservation. Common data types include text, images, audio, and 3D models, which are produced by the following digitization methods:

  • Image scanning
  • Optical character recognition (OCR)
  • Sound digitization
  • 3D scanning
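
To make "changing from analog to digital form" concrete, the following minimal Python sketch samples a continuous signal at fixed intervals and rounds each sample to an integer code, which is roughly what sound digitization involves. The function name `digitize`, the 440 Hz test tone, the 8,000 Hz sampling rate, and the 8-bit depth are illustrative assumptions, not values taken from this guide.

```python
import math


def digitize(signal, duration_s, sample_rate_hz, bit_depth):
    """Sample a continuous signal (values in [-1, 1]) and quantize it.

    Hypothetical helper for illustration only.
    """
    levels = 2 ** bit_depth  # number of distinct integer codes available
    n_samples = round(duration_s * sample_rate_hz)
    codes = []
    for n in range(n_samples):
        t = n / sample_rate_hz  # sampling: read the signal at discrete times
        value = max(-1.0, min(1.0, signal(t)))  # clip to the expected range
        # quantization: map [-1, 1] onto the integers 0 .. levels - 1
        codes.append(round((value + 1.0) / 2.0 * (levels - 1)))
    return codes


def tone(t):
    """A 440 Hz sine wave standing in for an analog sound source."""
    return math.sin(2 * math.pi * 440 * t)


codes = digitize(tone, duration_s=0.01, sample_rate_hz=8000, bit_depth=8)
print(len(codes), min(codes), max(codes))
```

A higher sampling rate or bit depth captures the original more faithfully at the cost of larger files, which is one of the trade-offs to weigh when planning a digitization project.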

See also:

https://www.clir.org/pubs/reports/pub107/bellinger/

https://www.oclc.org/research/areas/research-collections/borndigital.html

Data mining & Knowledge discovery

“Knowledge discovery in databases is a process that is defined by several processing steps that have to be applied to a data set of interest in order to extract useful patterns.” (Hotho, Nürnberger, & Paaß, 2005)

The Cross-Industry Standard Process for Data Mining (CRISP-DM) model defines the main steps as:

  1. business understanding,
  2. data understanding,
  3. data preparation,
  4. modeling,
  5. evaluation,
  6. deployment.
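
The six steps above can be sketched on a very small text-mining task. The Python example below is only an illustration under stated assumptions: the three sample sentences, the stop-word list, the helper `prepare`, and the choice of a simple term-frequency count as the "model" are all hypothetical and are not prescribed by CRISP-DM itself.

```python
import re
from collections import Counter

# 1. Business understanding: the (hypothetical) question is
#    "Which terms dominate this small collection of texts?"
documents = [
    "Born-digital records are created in digital format.",
    "Digitization changes analog records to digital form.",
    "Digital records can be shared and preserved.",
]

# 2. Data understanding: inspect the size of the raw data.
total_chars = sum(len(d) for d in documents)
print(f"{len(documents)} documents, {total_chars} characters in total")

# 3. Data preparation: lowercase, tokenize, and drop stop words.
STOP_WORDS = {"are", "in", "to", "can", "be", "and"}  # assumed, minimal list


def prepare(text):
    """Return lowercase word tokens (hyphenated words kept whole)."""
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


prepared = [prepare(d) for d in documents]

# 4. Modeling: a term-frequency count stands in for a real model.
term_counts = Counter(token for doc in prepared for token in doc)

# 5. Evaluation: check that preparation behaved as intended.
assert not STOP_WORDS & set(term_counts), "stop words leaked into the model"

# 6. Deployment: report the result to whoever asked the question.
for term, count in term_counts.most_common(2):
    print(term, count)
```

In practice the steps are revisited iteratively: a disappointing evaluation often sends the analyst back to data preparation or even to the original question.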

Reference:

Hotho, A., Nürnberger, A., & Paaß, G. (2005). A brief survey of text mining. LDV Forum, 20(1), 19-62.

Digital Scholarship Librarian

Terry Chung
Contact:
5/F, Main Library, University of Hong Kong, Pokfulam Road,
Hong Kong
(852) 2859-7002