An analysis of document category prediction responses to classifier model parameter treatment permutations within the software design patterns subject domain

Posted on: 2010-10-11
Degree: D.C.S
Type: Dissertation
University: Colorado Technical University
Candidate: Pankau, Brian L
Full Text: PDF
GTID: 1448390002973580
Subject: Library science
Abstract/Summary:
This empirical study evaluates the document category prediction effectiveness of Naive Bayes (NB) and K-Nearest Neighbor (KNN) classifier treatments built from different feature selection and machine learning settings, and trained and tested against textual corpora of 2300 Gang-Of-Four (GOF) design pattern documents.

Analysis of the experiment's trials, powered by a framework based on WordStat 5.1 with QDA Miner 1.1 by Provalis Research, shows a statistically significant correlation between category prediction success and classifier construction settings when assessed at the 5% significance level using the Friedman test. The best classifier was found to have a prediction success rate of just under 65 percent.

Results demonstrate that classifiers should be built using the Chi-square statistic for feature selection, and that dictionary keywords should be selected on the basis of occurrence. To minimize Type I errors, classifiers should use the KNN machine learning algorithm and be trained using the percentage of keywords, weighted by inverse document frequency. To minimize Type II errors, the NB algorithm should be employed using keyword frequency with no weighting.
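For readers unfamiliar with the Friedman test named above, the following is a minimal sketch of how its statistic can be computed, not the study's actual procedure (the study used a WordStat and QDA Miner framework). The function names `average_ranks` and `friedman_statistic` are hypothetical, and the success rates in the usage lines are invented purely for illustration; they are not results from the dissertation. The sketch assigns mean ranks to ties but applies no tie correction to the statistic.

```python
def average_ranks(values):
    """Rank values ascending (1 = smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for i in order[pos:end + 1]:
            ranks[i] = mean_rank
        pos = end + 1
    return ranks


def friedman_statistic(blocks):
    """Friedman chi-square statistic (no tie correction).

    blocks: one row per trial, one column per classifier treatment.
    """
    n = len(blocks)      # number of trials (blocks)
    k = len(blocks[0])   # number of treatments
    rank_sums = [0.0] * k
    for row in blocks:
        for j, rank in enumerate(average_ranks(row)):
            rank_sums[j] += rank
    sum_sq = sum(r * r for r in rank_sums)
    return 12.0 / (n * k * (k + 1)) * sum_sq - 3.0 * n * (k + 1)


# Illustrative data only (NOT from the dissertation): success rates of
# three hypothetical treatments across four trials.
trials = [
    [0.60, 0.55, 0.50],
    [0.62, 0.58, 0.51],
    [0.64, 0.57, 0.49],
    [0.61, 0.56, 0.52],
]
q = friedman_statistic(trials)

# Chi-square critical value for k - 1 = 2 degrees of freedom at the 5% level.
CRITICAL_5PCT_DF2 = 5.991
significant = q > CRITICAL_5PCT_DF2
```

With many trials, the statistic is compared against a chi-square distribution with k - 1 degrees of freedom; for small samples, exact tables are usually preferred, so the threshold above should be read as an approximation.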
Keywords/Search Tags:Category prediction, Document, Classifier, Using