Research on Quantitative Analysis of Data Quality for Bayesian Classifiers

Posted on: 2009-12-05
Degree: Master
Type: Thesis
Country: China
Candidate: C B Ji
GTID: 2178360242489951
Subject: Computer application technology
Abstract/Summary:
Classification is one of the most important techniques in data mining. The Bayesian classifier, a successful classification method, uses statistical knowledge (Bayes' theorem) to classify instances. In general, the same classifier performs differently on different data sets. These differences stem from intrinsic properties of the data set, which we call data quality. We chose the Bayesian classifier as the basis of our data-quality research because its principle is clear and it runs fast.

After a broad introduction to and analysis of the relevant theory, including classification, data quality and genetic algorithms, this thesis introduces Weka, a data mining (DM) experiment platform. It analyzes the structure and usage of Weka's data filters, explains the function and working principle of each subclass of its instance filter, and describes how to implement a data filter.

We then propose an instance selection algorithm that uses a genetic algorithm (GA) as the search algorithm and RSCT as the heuristic function. RSCT, which stands for Random Sample Classify Test, is the quantitative criterion proposed in this thesis. We implemented the algorithm in Weka and ran experiments on UCI data sets. The experimental procedure and the results under different parameter settings are described in detail, then compared and analyzed. The results show that this resampling method can greatly reduce computational cost while keeping accuracy at least as high as on the original data set, and that it increases accuracy on some data sets. This demonstrates that data-set quality can be measured quantitatively with this criterion, which can serve as a heuristic function for data sampling and preprocessing in DM. This is of notable significance for data reduction and classifier optimization.
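The abstract names RSCT but does not define it, so the following is only a minimal Python sketch of GA-based instance selection under stated assumptions, not the thesis's Weka implementation. A categorical naive Bayes classifier with Laplace smoothing stands in for the Bayesian classifier; each chromosome is a bit mask over the instances; and the fitness of a mask is assumed to be the mean accuracy of a classifier trained on the selected instances when tested on several random samples of the full data set, minus a small size penalty. The names `rsct_fitness`, `ga_select`, `make_toy_data` and `SIZE_PENALTY` are hypothetical and come neither from the thesis nor from Weka.

```python
import math
import random
from collections import Counter, defaultdict

SIZE_PENALTY = 0.05  # hypothetical weight rewarding smaller subsets (assumption)


def train_nb(rows, labels):
    """Fit a categorical naive Bayes model with Laplace (add-one) smoothing."""
    class_counts = Counter(labels)
    # feature_counts[c][j][v]: number of class-c rows whose feature j equals v
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    values = [set() for _ in rows[0]]  # observed values of each feature
    for x, y in zip(rows, labels):
        for j, v in enumerate(x):
            feature_counts[y][j][v] += 1
            values[j].add(v)
    return class_counts, feature_counts, values, len(labels)


def predict_nb(model, x):
    """Return the class with the highest log posterior for instance x."""
    class_counts, feature_counts, values, n = model
    best, best_score = None, -math.inf
    for c, nc in class_counts.items():
        score = math.log(nc / n)  # log prior
        for j, v in enumerate(x):
            # smoothed log likelihood; unseen values still get nonzero probability
            score += math.log((feature_counts[c][j][v] + 1) / (nc + len(values[j])))
        if score > best_score:
            best, best_score = c, score
    return best


def rsct_fitness(mask, rows, labels, rng, n_tests=5, test_size=20):
    """Assumed reading of RSCT: train on the selected instances, then average
    the accuracy over several random samples drawn from the full data set."""
    chosen = [i for i, keep in enumerate(mask) if keep]
    if not chosen:
        return 0.0
    model = train_nb([rows[i] for i in chosen], [labels[i] for i in chosen])
    total = 0.0
    for _ in range(n_tests):
        sample = rng.sample(range(len(rows)), min(test_size, len(rows)))
        hits = sum(predict_nb(model, rows[i]) == labels[i] for i in sample)
        total += hits / len(sample)
    return total / n_tests


def ga_select(rows, labels, pop_size=20, generations=30, p_mut=0.02, seed=0):
    """Search bit masks over the instances with a simple generational GA."""
    rng = random.Random(seed)
    n = len(rows)

    def fitness(mask):
        return rsct_fitness(mask, rows, labels, rng) - SIZE_PENALTY * sum(mask) / n

    pop = [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size)]
    best, best_fit = None, -math.inf
    for _ in range(generations):
        scored = sorted(((fitness(m), m) for m in pop), key=lambda t: t[0], reverse=True)
        if scored[0][0] > best_fit:
            best_fit, best = scored[0][0], scored[0][1][:]
        parents = [m for _, m in scored[: pop_size // 2]]  # truncation selection
        children = [parents[0][:]]  # elitism: carry the best mask over unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)  # single-point crossover
            child = [(not g) if rng.random() < p_mut else g for g in a[:cut] + b[cut:]]
            children.append(child)
        pop = children
    return best, best_fit


def make_toy_data(n=200, noise=0.1, seed=42):
    """Hypothetical toy data: feature 0 copies the label except for a fraction
    `noise` of flipped rows; feature 1 is irrelevant random noise."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        y = rng.choice("ab")
        f0 = y if rng.random() >= noise else ("b" if y == "a" else "a")
        rows.append((f0, rng.choice("xyz")))
        labels.append(y)
    return rows, labels


if __name__ == "__main__":
    rows, labels = make_toy_data()
    mask, fit = ga_select(rows, labels)
    print(f"kept {sum(mask)}/{len(rows)} instances, fitness {fit:.3f}")
```

Truncation selection with one elite, single-point crossover and bit-flip mutation are generic GA choices made for brevity; the thesis may use different operators and parameters. Because the fitness is estimated from random samples it is noisy, so the reported best fitness is an optimistic estimate.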
Keywords/Search Tags: Bayesian Classifier, Data Quality, Instance Selection, Data Mining