Peer reviewed
ERIC Number: EJ1059695
Record Type: Journal
Publication Date: 2015-Jan
Pages: 33
Abstractor: As Provided
Reference Count: N/A
ISSN: 1531-2542
Loose, Falling Characters and Sentences: The Persistence of the OCR Problem in Digital Repository E-Books
Kichuk, Diana
portal: Libraries and the Academy, v15 n1 p59-91 Jan 2015
The electronic conversion of scanned image files to readable text using optical character recognition (OCR) software and the subsequent migration of raw OCR text to e-book text file formats are key remediation or media conversion technologies used in digital repository e-book production. Despite real progress, the OCR problem of reliability and accuracy in OCR-derived e-book text and metadata persists. This paper examines a selection of digitized e-books in several prominent digital repositories and discusses the impact of OCR technology on e-book text file formats, metadata, and the online reading experience.
Johns Hopkins University Press. 2715 North Charles Street, Baltimore, MD 21218. Tel: 800-548-1784; Tel: 410-516-6987; Fax: 410-516-6968.
Publication Type: Journal Articles; Reports - Research
Education Level: N/A
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: N/A