
Bayes Classifiers Research Based On The Incomplete Data

Posted on: 2008-04-16
Degree: Master
Type: Thesis
Country: China
Candidate: Z F Qiao
Full Text: PDF
GTID: 2178360212968144
Subject: Computer software and theory
Abstract/Summary:
With the development of database, data warehouse, and Internet technologies, data mining and knowledge discovery have attracted considerable attention from researchers and practitioners and have developed rapidly. Classification is one of the important research topics in data mining; its goal is to find a classification function or classification model. The Bayesian network, an effective means of knowledge representation and probabilistic reasoning, is a powerful graphical tool for decision analysis under uncertainty.
In this paper, we first introduce data mining and the main classification methods used in it, and analyze the definitions and operation of the current methods, especially Bayesian techniques. A Bayesian network G = (Bs, Bp) is a directed acyclic graph (DAG) annotated with probability tables. It consists of two parts: the network topology Bs and the set of local probability distributions Bp. The approach rests on Bayes' theorem, the maximum a posteriori (MAP) hypothesis, and Bayesian networks. A Bayesian network used for classification is called a Bayesian classifier. It is a special form of Bayesian network in which the choice of variables and their numbers of states are fixed, the attribute nodes are observed, and the class node is unknown. Learning a Bayesian classifier comprises structure learning, parameter learning, and MAP inference of the class node.
Current classifiers work effectively only on the precondition that the training and test datasets are complete, or that few feature values are missing. In practice, most real-life databases contain missing data for many reasons, and much of the information we can obtain is incomplete. The missing data may be correlated with the values of some attributes in the network, in which case the missingness itself carries useful information. Most classifiers treat a missing entry as a single separate value, which reduces classification accuracy. Estimating the value of incomplete data, or the trend of that value, is therefore very important: solving real problems requires correct and effective processing of incomplete data.
Bayesian networks can perform numerical inference by combining prior knowledge with observed data, so the Bayesian method is a powerful tool for dealing with missing data. The main work of the paper is as follows: (1) Generalizing and summarizing the theorem of Bayesian networks, analyzing the...
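As an illustration only, the idea of MAP classification with incomplete data can be sketched with a naive Bayes classifier over categorical attributes. This is not the method developed in the thesis. It is a minimal sketch that assumes attribute values are missing at random, marks them with a hypothetical `MISSING` marker, skips them when counting, and drops their factor at prediction time (which, for naive Bayes, is equivalent to marginalising the attribute out). The function names, the Laplace smoothing parameter `alpha`, and the toy data are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict

MISSING = None  # hypothetical marker for a missing attribute value


def train(rows, labels, alpha=1.0):
    """Count class and attribute-value frequencies, skipping missing values."""
    class_counts = Counter(labels)
    # value_counts[(attribute index, class)][value] -> number of observations
    value_counts = defaultdict(Counter)
    domains = defaultdict(set)  # values observed for each attribute
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            if value is MISSING:
                continue  # available-case estimation: ignore the gap
            value_counts[(i, label)][value] += 1
            domains[i].add(value)
    return class_counts, value_counts, domains, alpha


def log_scores(model, row):
    """Unnormalised log-posterior of each class given the observed attributes."""
    class_counts, value_counts, domains, alpha = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior P(c)
        for i, value in enumerate(row):
            if value is MISSING:
                continue  # a marginalised attribute contributes a factor of 1
            counts = value_counts[(i, label)]
            observed = sum(counts.values())
            # Domain size includes the queried value in case it was never seen.
            domain_size = len(domains[i] | {value})
            score += math.log((counts[value] + alpha) / (observed + alpha * domain_size))
        scores[label] = score
    return scores


def predict(model, row):
    """Return the MAP class: the label with the highest log-posterior."""
    scores = log_scores(model, row)
    return max(scores, key=scores.get)


# Toy data (invented for illustration): attributes are (outlook, temperature).
rows = [("sunny", "hot"), ("sunny", MISSING), ("rainy", "cool"),
        ("rainy", "hot"), (MISSING, "cool")]
labels = ["no", "no", "yes", "yes", "yes"]
model = train(rows, labels)
```

With this toy data, `predict(model, ("sunny", MISSING))` returns `"no"`, and a fully unobserved instance `(MISSING, MISSING)` falls back to the majority class `"yes"`. Skipping a missing value is only justified when missingness does not depend on the value itself. When it does, as the abstract points out, the gap carries information and should be modelled rather than ignored.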
Keywords/Search Tags: Classification, Naïve Bayesian Classifier, TAN Classifier, Missing Data, Maximum Weighted Spanning