ERIC Number: ED525100
Record Type: Non-Journal
Publication Date: 2011-Sep
Pages: 22
Abstractor: As Provided
Reference Count: 20
ISBN: N/A
ISSN: N/A
A New Framework for Textual Information Mining over Parse Trees. CRESST Report 805
Mousavi, Hamid; Kerr, Deirdre; Iseli, Markus R.
National Center for Research on Evaluation, Standards, and Student Testing (CRESST)
Textual information mining is a challenging problem that has resulted in the creation of many different rule-based linguistic query languages. However, these languages generally are not optimized for the purpose of text mining. In other words, they usually treat each query in isolation and return only raw results for each query. Moreover, they cannot effectively express ambiguities, cannot adapt to different domains, require a large number of rules in order to accurately extract information, and are not very user-friendly. This paper introduces a new text mining framework using a tree-based Linguistic Query Language, called LQL. The framework generates more than one parse tree for each sentence using a probabilistic parser, and annotates each node of these parse trees with main-parts information, which is a set of key terms from the node's branch selected according to the linguistic structure of the branch. The main-parts can be specialized for different domains based on a user-generated list of concepts. Using main-parts-annotated parse trees for a given textual dataset, the system can efficiently answer individual queries as well as mine the text for a given set of queries. The framework also has the ability to support grammatical ambiguity through probabilistic rules and linguistic exceptions in order to increase the quality of the extracted information. (Contains 1 table, 8 figures and 2 footnotes.)
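The idea of annotating parse-tree nodes with key terms from their branch can be illustrated with a minimal Python sketch. This is not LQL and not the report's algorithm: the `Node` class, the `CONTENT_TAGS` set, and the `annotate` and `find` helpers are hypothetical names, and the rule used here (collect content words under each node, optionally filtered by a concept list) is a simplifying assumption standing in for the linguistic rules the abstract does not spell out.

```python
from __future__ import annotations
from dataclasses import dataclass, field

# Assumption: Penn Treebank style tags treated as "content" words.
# The report's actual main-parts rules are not given in the abstract.
CONTENT_TAGS = {"NN", "NNS", "NNP", "VB", "VBD", "VBZ"}

@dataclass
class Node:
    label: str                       # phrase label or part-of-speech tag
    word: str | None = None          # set only on leaf nodes
    children: list[Node] = field(default_factory=list)
    main_parts: set[str] = field(default_factory=set)

def annotate(node, concepts=None):
    """Annotate every node with the key terms found in its branch.

    If `concepts` (a user-supplied set of domain terms) is given,
    only words in that set are kept.
    """
    if node.word is not None:
        w = node.word.lower()
        keep = node.label in CONTENT_TAGS and (concepts is None or w in concepts)
        node.main_parts = {w} if keep else set()
    else:
        node.main_parts = set()
        for child in node.children:
            node.main_parts |= annotate(child, concepts)
    return node.main_parts

def find(node, label, term):
    """Yield nodes with the given label whose main parts contain `term`."""
    if node.label == label and term in node.main_parts:
        yield node
    for child in node.children:
        yield from find(child, label, term)

# Hand-built parse tree for "The parser annotates trees".
tree = Node("S", children=[
    Node("NP", children=[Node("DT", "The"), Node("NN", "parser")]),
    Node("VP", children=[
        Node("VBZ", "annotates"),
        Node("NP", children=[Node("NNS", "trees")]),
    ]),
])

annotate(tree)
noun_phrases = list(find(tree, "NP", "trees"))
```

Because each node carries its own term set after one annotation pass, a query such as `find(tree, "NP", "trees")` only compares labels and set membership. Passing `concepts={"parser", "trees"}` to `annotate` narrows the terms to a domain list, which loosely mirrors the specialization the abstract describes.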
National Center for Research on Evaluation, Standards, and Student Testing (CRESST). 300 Charles E Young Drive N, GSE&IS Building 3rd Floor, Mailbox 951522, Los Angeles, CA 90095-1522. Tel: 310-206-1532; Fax: 310-825-3883; Web site: http://www.cresst.org
Publication Type: Reports - Descriptive
Education Level: N/A
Audience: N/A
Language: English
Sponsor: Bill and Melinda Gates Foundation
Authoring Institution: National Center for Research on Evaluation, Standards, and Student Testing