ERIC Number: ED578787
Record Type: Non-Journal
Publication Date: 2017
Pages: 99
Abstractor: As Provided
ISBN: 978-0-3552-1516-8
Development and Evaluation of Thesauri-Based Bibliographic Biomedical Search Engine
Alghoson, Abdullah
ProQuest LLC, Ph.D. Dissertation, The Claremont Graduate University
Due to the large volume and exponential growth of biomedical documents (e.g., books, journal articles), it has become increasingly challenging for biomedical search engines to retrieve relevant documents based on users' search queries. Part of the challenge is the matching mechanism of free-text indexing, which performs matching based on orthographic forms (i.e., matching homographs instead of related terms). This dissertation intended to improve the search relevancy of existing text-based biomedical document retrieval approaches by designing a thesauri-based bibliographic biomedical document retrieval system. The proposed system integrates four novel artifacts. First, it enhances the best mapping technique using an embodied VSM to map noun phrases from free text in documents and queries to canonical concepts from the Unified Medical Language System (UMLS) Metathesaurus. Second, it utilizes an improved method, the Sense-based Vector Space Model (S-VSM), to create representations for documents and queries. In addition to term-based indexing units, the S-VSM uses concept unique identifiers (CUIs) for recognized terms in the UMLS Metathesaurus as indexing units. Third, it is the first attempt to index the document collection by integrating two biomedical domain-specific ontologies, the UMLS Metathesaurus and the Medical Subject Headings (MeSH) thesaurus, when using a controlled-vocabulary single-word indexing technique. Fourth, it uses an improved method of processing indexed units by performing word simplifying instead of word stemming to avoid changing the meaning of biomedical concepts. Furthermore, a proof-of-concept system was built using the OHSUMED test collection. The system's performance was measured using the standard 11-point interpolated average precision graph and the Mean Average Precision (MAP) across the OHSUMED dataset. Each of the four artifacts was evaluated individually for its impact on retrieval performance against traditional approaches.
Additionally, the overall performance of the retrieval system was compared with published results of existing thesaurus-based retrieval systems. The results confirmed that the S-VSM using the BCM outperformed the traditional VSM and the best matching technique. Also, this research project proved that the word simplifying indexing technique outperformed the word stemming indexing technique. However, integrating the UMLS Metathesaurus with the MeSH thesaurus in single-word controlled-vocabulary indexing had no significant advantage over using just the UMLS Metathesaurus, even though the MeSH thesaurus contains some vocabulary that does not exist in the UMLS Metathesaurus. [The dissertation citations contained here are published with the permission of ProQuest LLC. Further reproduction is prohibited without permission. Copies of dissertations may be obtained by telephone at 1-800-521-0600. Web page:]
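The two evaluation measures named in the abstract, the 11-point interpolated average precision and the Mean Average Precision (MAP), can be illustrated with a minimal sketch. This is not the dissertation's code: the function names, the document identifiers, and the toy relevance judgments are assumptions made for illustration, and the formulas are the standard textbook definitions rather than anything confirmed by the record.

```python
from typing import Dict, List, Sequence, Set


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Average of the precision values at each rank where a relevant doc appears."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    # Relevant documents that were never retrieved contribute zero.
    return total / len(relevant)


def mean_average_precision(
    runs: Dict[str, Sequence[str]], judgments: Dict[str, Set[str]]
) -> float:
    """Mean of the per-query average precision over all queries in `runs`."""
    if not runs:
        return 0.0
    scores = [average_precision(runs[q], judgments.get(q, set())) for q in runs]
    return sum(scores) / len(scores)


def interpolated_precision_11pt(
    ranked: Sequence[str], relevant: Set[str]
) -> List[float]:
    """Interpolated precision at recall levels 0.0, 0.1, ..., 1.0.

    The interpolated precision at a recall level is the highest precision
    observed at any rank whose recall is at least that level.
    """
    points = []  # (recall, precision) at each rank holding a relevant doc
    hits = 0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / rank))
    levels = [i / 10 for i in range(11)]
    return [
        max((p for r, p in points if r >= level), default=0.0)
        for level in levels
    ]


if __name__ == "__main__":
    # Hypothetical ranked result lists and relevance judgments for two queries.
    runs = {"q1": ["d1", "d2", "d3", "d4"], "q2": ["d5", "d6"]}
    judgments = {"q1": {"d1", "d3"}, "q2": {"d6"}}
    print(mean_average_precision(runs, judgments))
    print(interpolated_precision_11pt(runs["q1"], judgments["q1"]))
```

Averaging the 11 interpolated values per query and then across queries gives the points plotted in an 11-point precision-recall graph, while MAP condenses each system's performance into a single number for comparison.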
ProQuest LLC. 789 East Eisenhower Parkway, P.O. Box 1346, Ann Arbor, MI 48106. Tel: 800-521-0600; Web site:
Publication Type: Dissertations/Theses - Doctoral Dissertations
Education Level: N/A
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: N/A