ERIC Number: ED565914
Record Type: Non-Journal
Publication Date: 2013
Pages: 151
Abstractor: As Provided
Reference Count: N/A
ISBN: 978-1-3036-9300-7
Data Mining and Domain Knowledge: An Exploration of Methods to Advance Medical Research
Engle, Kelley M.
ProQuest LLC, Ph.D. Dissertation, University of Maryland, Baltimore County
Researchers in the medical domain consider the double-blind, placebo-controlled clinical trial the gold standard. The data for these trials are collected to test a specifically defined hypothesis, and very few secondary analyses of such data are conducted. The underlying purpose of this work is to demonstrate the value and relevance of data mining and artificial intelligence methods for both pre-processing and secondary data analysis in medical research. The medical domain selected for this demonstration is autism, in particular the data from IAN (Interactive Autism Network) obtained from Kennedy Krieger. During predictive model building, numerous research issues were addressed at different phases. The work (1) provides solutions for statistical issues with metric-based data mining methods and (2) provides guidelines for incorporating domain knowledge in data mining. Various statistical methods used in data mining, such as Naive Bayes, require metric data to ensure reliable and robust results. Many public health data sources, including the IAN dataset, consist primarily of non-metric data in the form of Likert scales and categorical data. "MDS" ("Multi-Dimensional Scaling") is presented as a method that can effectively transform non-metric data into metric data. For incorporating domain knowledge in data mining, initial work on integrating autism domain knowledge into "multi-level association rule mining" is presented. Through the use of an external treatment ontology, more interesting association rules were extracted for autism treatments. To further explore the role of "knowledge guidance", the work hypothesized that "knowledge-guided mutation" applied to classification rules would affect the search trajectory incrementally. This hypothesis builds on the underlying premise of "gradualness".
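The abstract does not state which MDS variant or dissimilarity measure was used. As a minimal sketch of the general idea, the example below assumes classical (Torgerson) MDS and a simple mismatch dissimilarity, both stand-ins that are not necessarily the dissertation's choices, to turn categorical or Likert responses into numeric coordinates that a metric method could consume. The survey answers and function names are hypothetical, and the mismatch measure deliberately ignores the ordering of Likert levels.

```python
import numpy as np

def mismatch_dissimilarity(records):
    """Assumed dissimilarity: square root of the number of items on which two
    respondents give different answers (Likert ordering is ignored)."""
    n = len(records)
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            mismatches = sum(a != b for a, b in zip(records[i], records[j]))
            d[i, j] = d[j, i] = np.sqrt(mismatches)
    return d

def classical_mds(d, k):
    """Classical (Torgerson) MDS: place n objects in k metric dimensions so that
    their Euclidean distances approximate the dissimilarities in d."""
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    b = -0.5 * j @ (d ** 2) @ j                  # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(b)         # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]        # indices of the k largest
    lam = np.clip(eigvals[order], 0.0, None)     # discard negative (non-Euclidean) parts
    return eigvecs[:, order] * np.sqrt(lam)

# Hypothetical answers from four respondents to three survey items.
records = [
    ("agree", "often", "yes"),
    ("agree", "rarely", "yes"),
    ("disagree", "rarely", "no"),
    ("neutral", "often", "no"),
]
d = mismatch_dissimilarity(records)
coords = classical_mds(d, k=3)   # one row of metric features per respondent
```

Because this particular dissimilarity is Euclidean-embeddable, keeping `k = n - 1` dimensions reproduces the input distances; with real survey data a smaller `k`, or a non-metric MDS variant, would typically be chosen.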
A pilot experiment and a full-scale experiment were conducted in which knowledge from the autism domain, in the form of a drug taxonomy and an autism comorbidity semantic net, guided the mutation of classification rules. The drug-taxonomy experiments confirmed the hypothesis that domain knowledge can be used to constrain the search space. This research is both novel and significant, as it provides a practical resource for health informatics researchers who want to incorporate domain knowledge into data mining and artificial intelligence models. [The dissertation citations contained here are published with the permission of ProQuest LLC. Further reproduction is prohibited without permission. Copies of dissertations may be obtained by telephone at 1-800-521-0600. Web page:]
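The abstract says a drug taxonomy guided the mutation of classification rules but does not give the mutation operator. A minimal sketch, assuming a rule is a set of attribute-value conditions plus a predicted class, and assuming "knowledge-guided" means a drug condition may only move to a term one step away in the taxonomy (parent, sibling, or child), shows how such an operator keeps each change gradual and constrains the search compared with swapping in an arbitrary value. The taxonomy, rule format, and all function names are hypothetical illustrations, not the dissertation's implementation.

```python
import random

# Hypothetical drug taxonomy (child -> parent); illustrative only,
# not the taxonomy used in the dissertation.
TAXONOMY = {
    "risperidone": "atypical_antipsychotic",
    "aripiprazole": "atypical_antipsychotic",
    "haloperidol": "typical_antipsychotic",
    "atypical_antipsychotic": "antipsychotic",
    "typical_antipsychotic": "antipsychotic",
    "fluoxetine": "ssri",
    "sertraline": "ssri",
    "ssri": "antidepressant",
}

def ancestors(term):
    """Yield the term itself, then each ancestor up to the root."""
    while term is not None:
        yield term
        term = TAXONOMY.get(term)

def covers(condition_value, record_value):
    """A condition matches a record if it names the record's drug or any
    ancestor of it, so a class-level condition covers every member drug."""
    return condition_value in ancestors(record_value)

def neighbours(term):
    """Taxonomy terms exactly one step away: parent, siblings, and children."""
    parent = TAXONOMY.get(term)
    near = {c for c, p in TAXONOMY.items() if p == term}             # children
    if parent is not None:
        near.add(parent)
        near.update(c for c, p in TAXONOMY.items() if p == parent)   # siblings
    near.discard(term)
    return sorted(near)

def knowledge_guided_mutate(rule, attribute, rng):
    """Return a mutated copy of rule: the condition on `attribute` moves to a
    neighbouring taxonomy term instead of an arbitrary value."""
    candidates = neighbours(rule["conditions"][attribute])
    mutated = {"conditions": dict(rule["conditions"]), "predicts": rule["predicts"]}
    if candidates:
        mutated["conditions"][attribute] = rng.choice(candidates)
    return mutated

# Usage with a hypothetical rule: IF drug = risperidone THEN improved.
rule = {"conditions": {"drug": "risperidone"}, "predicts": "improved"}
child = knowledge_guided_mutate(rule, "drug", random.Random(42))
```

Moving to the parent generalizes the rule (it then covers aripiprazole as well), while moving to a sibling keeps the same specificity; an unguided operator could instead jump straight to an unrelated drug class such as an SSRI.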
ProQuest LLC. 789 East Eisenhower Parkway, P.O. Box 1346, Ann Arbor, MI 48106. Tel: 800-521-0600; Web site:
Publication Type: Dissertations/Theses - Doctoral Dissertations
Education Level: N/A
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: N/A