ERIC Number: ED525994
Record Type: Non-Journal
Publication Date: 2009
Pages: 316
Abstractor: As Provided
Reference Count: 0
ISBN: 978-1-1244-9820-1
An Analysis of Document Category Prediction Responses to Classifier Model Parameter Treatment Permutations within the Software Design Patterns Subject Domain
Pankau, Brian L.
ProQuest LLC, D.C.S. Dissertation, Colorado Technical University
This empirical study evaluates the document category prediction effectiveness of Naive Bayes (NB) and K-Nearest Neighbor (KNN) classifier treatments built from different feature selection and machine learning settings, then trained and tested against textual corpora of 2300 Gang-Of-Four (GOF) design pattern documents. Analysis of the experiment's trials, powered by a framework based on "WordStat" 5.1 with "QDA Miner" 1.1 by Provalis Research, shows that there is a statistically significant correlation between category prediction success and classifier construction settings when assessed at the 5% significance level using the Friedman test. The best classifier was found to have a prediction success rate of just under 65 percent. Results demonstrate that classifiers should be built using the Chi-square statistic for feature selection and that dictionary keyword selection should be based on occurrence. To minimize Type I errors, classifiers should use the KNN machine learning algorithm and be trained using percentage of keywords weighted using inverse document frequency. To minimize Type II errors, the NB algorithm should be employed using keyword frequency with no weighting. [The dissertation citations contained here are published with the permission of ProQuest LLC. Further reproduction is prohibited without permission. Copies of dissertations may be obtained by telephone at 1-800-521-0600. Web page:]
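For readers unfamiliar with the Friedman test named in the abstract, the following is a minimal Python sketch of how the statistic can be computed for several classifier treatments scored across repeated trials. It is not the dissertation's procedure or data: the function names, the three-treatment layout, and the success rates are all hypothetical placeholders, ties are given average ranks with no tie correction, and the chi-square approximation it relies on is rough for so few trials.

```python
# Illustrative sketch only: the success rates below are made-up placeholder
# numbers, not results from the dissertation.

def average_ranks(values):
    """Rank values from 1 (smallest) upward, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1  # positions are 0-based, ranks are 1-based
        for i in order[pos:end + 1]:
            ranks[i] = mean_rank
        pos = end + 1
    return ranks


def friedman_statistic(blocks):
    """Friedman chi-square statistic, without tie correction.

    blocks: list of rows, one per trial; each row holds one score per treatment.
    """
    n = len(blocks)        # number of trials (blocks)
    k = len(blocks[0])     # number of treatments
    rank_sums = [0.0] * k
    for row in blocks:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)


# Hypothetical prediction success rates: rows are trials, columns are three treatments.
trials = [
    [0.52, 0.58, 0.64],
    [0.50, 0.57, 0.61],
    [0.49, 0.60, 0.63],
    [0.53, 0.55, 0.62],
    [0.51, 0.59, 0.60],
]

# Chi-square critical value for 2 degrees of freedom (k - 1) at alpha = 0.05.
CHI2_CRITICAL_DF2_5PCT = 5.991

q = friedman_statistic(trials)
significant = q > CHI2_CRITICAL_DF2_5PCT
print(f"Friedman Q = {q:.3f}, significant at 5%: {significant}")
```

The test ranks treatments within each trial rather than comparing raw scores, which is why it suits repeated measurements of the same classifiers without assuming normally distributed success rates.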
ProQuest LLC. 789 East Eisenhower Parkway, P.O. Box 1346, Ann Arbor, MI 48106. Tel: 800-521-0600; Web site:
Publication Type: Dissertations/Theses - Doctoral Dissertations
Education Level: N/A
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: N/A