NotesFAQContact Us
Search Tips
Back to results
ERIC Number: ED551492
Record Type: Non-Journal
Publication Date: 2012
Pages: 251
Abstractor: As Provided
Reference Count: N/A
ISBN: 978-1-2677-9101-6
Tagline: Information Extraction for Semi-Structured Text Elements in Medical Progress Notes
Finch, Dezon Kile
ProQuest LLC, Ph.D. Dissertation, University of South Florida
Text analysis has become an important research activity in the Department of Veterans Affairs (VA). Statistical text mining and natural language processing have been shown to be very effective for extracting useful information from medical documents. However, neither of these techniques is effective at extracting the information stored in semi-structure text elements. A prototype system (TagLine) was developed as a method for extracting information from the semi-structured portions of text using machine learning. Features for the learning machine were suggested by prior work, as well as by examining the text, and selecting those attributes that help distinguish the various classes of text lines. The classes were derived empirically from the text and guided by an ontology developed by the Consortium for Health Informatics Research (CHIR), a nationwide research initiative focused on medical informatics. Decision trees and Levenshtein approximate string matching techniques were tested and compared on 5,055 unseen lines of text. The performance of the decision tree method was found to be superior to the fuzzy string match method on this task. Decision trees achieved an overall accuracy of 98.5 percent, while the string match method only achieved an accuracy of 87 percent. Overall, the results for line classification were very encouraging. The labels applied to the lines were used to evaluate TagLines' performance for identifying the semi-structures text elements, including tables, slots and fillers. Results for slots and fillers were impressive while the results for tables were also acceptable. [The dissertation citations contained here are published with the permission of ProQuest LLC. Further reproduction is prohibited without permission. Copies of dissertations may be obtained by Telephone (800) 1-800-521-0600. Web page:]
ProQuest LLC. 789 East Eisenhower Parkway, P.O. Box 1346, Ann Arbor, MI 48106. Tel: 800-521-0600; Web site:
Publication Type: Dissertations/Theses - Doctoral Dissertations
Education Level: N/A
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: N/A