The first step in many digital humanities projects is to digitize the corpus under study. To work with digitized text without transcribing it by hand or laboriously cutting and pasting it, researchers often turn to Optical Character Recognition (OCR). Using two scanned pages from Virginia Woolf’s novel The Waves, the following screencast offers basic instruction on how to OCR scanned pages. It also briefly evaluates a few OCR software options.
The screencast tests a free community version of a popular OCR application and a free web-based version, and examines the quality of their output. Finally, it tests and evaluates a trial version of a moderately expensive OCR package. By the end of the screencast, viewers should have the information they need to get a head start on a small scanning and OCR project.