An analysis of document category prediction responses to classifier model parameter treatment permutations within the software design patterns subject domain

Posted on:2010-10-11

Degree:D.C.S

Type:Dissertation

University:Colorado Technical University

Candidate:Pankau, Brian L

GTID:1448390002973580

Subject:Library science

Abstract/Summary:

This empirical study evaluates the document category prediction effectiveness of Naive Bayes (NB) and K-Nearest Neighbor (KNN) classifier treatments built from different feature selection and machine learning settings, then trained and tested against textual corpora of 2300 Gang-of-Four (GOF) design pattern documents.

Analysis of the experiment's trials, powered by a framework based on WordStat 5.1 with QDA Miner 1.1 by Provalis Research, shows a statistically significant correlation between category prediction success and classifier construction settings when assessed at the 5% significance level using the Friedman test. The best classifier had a prediction success rate of just under 65 percent.

The results indicate that classifiers should be built using the Chi-square statistic for feature selection, with occurrence as the basis for selecting dictionary keywords. To minimize Type I errors, classifiers should use the KNN machine learning algorithm and be trained using percentage of keywords weighted by inverse document frequency. To minimize Type II errors, the NB algorithm should be employed using keyword frequency with no weighting.
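The Friedman test mentioned in the abstract can be illustrated with a minimal sketch. It is not the study's WordStat-based framework: the scores are made up, the names `average_ranks`, `friedman_statistic`, and `CHI2_CRIT_05` are hypothetical, and the statistic omits the tie correction. Each row is one trial and each column one classifier treatment. The statistic is compared against the upper 5% chi-square critical value for k - 1 degrees of freedom, an approximation that is rough when there are few trials.

```python
# Illustrative only: data and names are assumptions, not taken from the dissertation.

# Upper 5% critical values of the chi-square distribution, keyed by degrees of freedom.
CHI2_CRIT_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}


def average_ranks(values):
    """Rank values from 1 (smallest) upward; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1  # positions are 0-based, ranks are 1-based
        for i in order[pos:end + 1]:
            ranks[i] = mean_rank
        pos = end + 1
    return ranks


def friedman_statistic(blocks):
    """Friedman chi-square statistic without tie correction.

    blocks: list of rows, one per trial; each row holds one score per treatment.
    """
    n = len(blocks)       # number of trials (blocks)
    k = len(blocks[0])    # number of treatments
    rank_sums = [0.0] * k
    for row in blocks:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)


# Made-up prediction success rates: 5 trials x 3 classifier treatments.
scores = [
    [0.61, 0.55, 0.48],
    [0.64, 0.57, 0.50],
    [0.60, 0.58, 0.47],
    [0.63, 0.54, 0.49],
    [0.62, 0.56, 0.51],
]
q = friedman_statistic(scores)
significant = q > CHI2_CRIT_05[len(scores[0]) - 1]
```

With these invented scores the first treatment ranks highest in every trial, so the statistic exceeds the critical value and the null hypothesis of equal treatment effects would be rejected at the 5% level.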

Keywords/Search Tags:

Category prediction, Document, Classifier

Related items

1	Vehicle Category Prediction Based On Historical Trajectory
2	The Design And Implementation Of Tianjin Soda Plant Document Management System
3	Location Category Prediction Based On Embedding Learning
4	Hybrid-Attention Enhanced Two-Stream Fusion Network For Video Venue Category Prediction
5	Category Prediction For Commodity Image On E-commerce Platform
6	Bayesian Classifier And Web Document Classification
7	Research On Mapping Mechanism Of Learning Expression
8	The Research And Implementation Of Multi-Lingual And Multi-Category Text Classification System
9	A Chinese Word Level Segmentation Algorithm Based On Document Category
10	Research On Degraded Document Image Binarization