ERIC Number: ED530554
Record Type: Non-Journal
Publication Date: 2011
Pages: 120
Abstractor: As Provided
Reference Count: 0
ISBN: 978-1-1247-6562-4
Cluster-Based Query Expansion Using Language Modeling for Biomedical Literature Retrieval
Xu, Xuheng
ProQuest LLC, Ph.D. Dissertation, Drexel University
The enormous volume of biomedical literature, scientists' specific information needs, long multi-word terms, and the fundamental problems of synonymy and polysemy are challenging issues facing researchers in the biomedical information retrieval community. Search engines have significantly improved the efficiency and effectiveness of biomedical literature searching. They are, however, known to return many results that are irrelevant to the intent of a user's query; in other words, they do not perform well in terms of precision and recall. To further improve the precision and recall of biomedical information retrieval, various query expansion strategies are widely used. In this thesis, we concentrate on empirical comparison, experimentation, and evaluation of query expansion methods, and we use the findings as an empirical justification for cluster-based query expansion. We have broadly investigated many query expansion methods, such as local analysis, global analysis, and ontology-based term reweighting, across various search engines and obtained important insights. Among the findings, we present a two-stage concept-based latent semantic analysis strategy and cluster-based query expansion; the proposed method uses the Singular Value Decomposition (SVD) technique from Latent Semantic Indexing (LSI). In contrast to other query expansion methods, our strategy selects terms that are most similar to the concepts in the query as well as in the related documents, rather than terms that are similar to the query terms only. Furthermore, we propose a novel framework for cluster-based query expansion: we have designed and implemented a novel and efficient computational approach to cluster-based query expansion using language modeling.
Through our experiments on the TREC Genomics Track ad hoc retrieval task, we demonstrate that clusters created from either the whole collection or the documents initially returned for the original query can be used to perform query expansion and ultimately improve the overall effectiveness and performance of an information retrieval system for biomedical literature. Lastly, we believe the principles of this strategy may be extended and applied in other domains. [The dissertation citations contained here are published with the permission of ProQuest LLC. Further reproduction is prohibited without permission. Copies of dissertations may be obtained by telephone: 1-800-521-0600. Web page:]
ProQuest LLC. 789 East Eisenhower Parkway, P.O. Box 1346, Ann Arbor, MI 48106. Tel: 800-521-0600; Web site:
Publication Type: Dissertations/Theses - Doctoral Dissertations
Education Level: Higher Education
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: N/A