Digital Humanities
@ Pratt

Inquiries into culture, meaning, and human value meet emerging technologies and cutting-edge skills at Pratt Institute's School of Information

Basic OCR (optical character recognition)

The first step in many digital humanities projects is to digitize the corpus under study. Optical Character Recognition (OCR) converts scanned page images into machine-readable text, so the text does not have to be transcribed by hand or laboriously cut and pasted. Using two scanned pages from Virginia Woolf’s novel The Waves, the following screencast offers basic instruction on how to OCR scanned pages and briefly evaluates a few choices of OCR software.

Three options are tested and the quality of their output is examined: a free, community version of a popular OCR program, a free, web-based version, and a trial version of a moderately expensive program. By the end of the screencast, viewers should have enough information for a head start on a small scanning/OCR project.
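For readers who want a rough number to go with a visual check of OCR quality, one option is to compare a tool's output against a short passage transcribed by hand. The sketch below is only an illustration and is not the method used in the screencast: the function name `ocr_similarity` and the sample strings are hypothetical, and the score comes from Python's standard `difflib` module rather than from any OCR package.

```python
from difflib import SequenceMatcher


def ocr_similarity(reference: str, ocr_output: str) -> float:
    """Return a similarity ratio between 0 and 1 for two versions of a passage.

    `reference` is a hand-checked transcription and `ocr_output` is the text
    produced by an OCR tool. Both names are hypothetical, chosen for this sketch.
    """
    # Collapse runs of whitespace so that differences in line wrapping
    # between the scan and the transcription are not counted as errors.
    ref = " ".join(reference.split())
    out = " ".join(ocr_output.split())
    return SequenceMatcher(None, ref, out).ratio()


# Illustrative strings only; these are not output from any tool in the screencast.
reference = "The sun had not yet risen."
ocr_output = "The sun had not yet r1sen."

score = ocr_similarity(reference, ocr_output)
print(f"similarity: {score:.3f}")
```

A score near 1 suggests the output needs little correction, while a lower score points to more manual cleanup. Running the same hand-checked passage against the output of each tool gives a simple way to compare them on a small project.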

LIS-657 Digital Humanities – DHskillshare post – OCR demo from Lauren Spiro on Vimeo.
