Research On Naive Bayesian Classification Algorithm Based On Rough Set Theory

Posted on: 2013-02-05  Degree: Master  Type: Thesis
Country: China  Candidate: Y E Hu  Full Text: PDF
GTID: 2248330371474347  Subject: Computer technology
Abstract/Summary:
Data mining is a result of the natural evolution of information technology. It is a complex process of extracting, or "mining", hidden and potentially valuable knowledge from large amounts of data. Among data mining technologies, data classification is an important research field. Bayesian classification is a reasoning method with a solid mathematical foundation and the ability to integrate prior information with information from data samples. Its simple form in particular, the Naïve Bayesian method, is simple and effective and has been widely studied and applied. This thesis analyzes the classification principle, advantages, and disadvantages of the Naïve Bayesian classification algorithm and studies the Naïve Bayesian classification model from two aspects. It first focuses on relaxing the model's conditional independence restriction through attribute selection, and then integrates ensemble learning on this basis to improve the model. The main research work is as follows:

1. This thesis analyzes two defects in the CEBARKNC attribute reduction algorithm proposed by Wang Guoyin and others, and proposes ASBCE, an improved attribute reduction algorithm based on conditional entropy. The algorithm introduces the cosine measure of association rules to identify inconsistent samples, and deletes redundant attributes based on the idea that an attribute identified as strongly correlated is, to some degree, strongly correlated with the other attributes. Experiments show that the algorithm obtains an approximately independent attribute subset and thereby relaxes the conditional independence assumption of Naïve Bayes.

2. Built on Bayesian theory and the conditional independence assumption, the Naïve Bayesian classification model has advantages such as a simple structure and computational efficiency. However, real-world data generally has difficulty meeting the conditional independence assumption, which is the limitation of the Naïve Bayesian model. To overcome this limitation and improve classification performance, selecting an approximately independent attribute subset through attribute selection is an effective improvement. The key research goal of this thesis is to find an attribute subset with minimum redundancy and maximum relevance through attribute selection. Based on the ASBCE attribute selection algorithm, this thesis proposes RSSNBC, a selective Naïve Bayesian classification model based on rough sets. The experimental results show that, compared with the classic Naïve Bayesian classification model, the RSSNBC model achieves better classification accuracy.

3. To further improve the performance of the single classifier described above, classifier ensemble technology is introduced: more than one classifier is combined through a specific combination method to form a combined classifier. The Naïve Bayesian classification model is a simple and efficient probabilistic statistical classification method, and a simple, accurate classification method is well suited to serve as the base classifier in ensemble learning. Because the Naïve Bayesian classification model is a stable model, attribute selection is embedded in the Bagging ensemble algorithm to increase the diversity among the classifiers and to improve the generalization of the individual classifiers. Based on the ASBCE attribute selection algorithm, this thesis proposes SNBCE, a selective Naïve Bayesian combined classification model. Experiments show that, by combining ensemble learning with attribute selection, this algorithm improves classification performance more effectively.
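The abstract does not give the details of ASBCE, so the following is only a minimal sketch of the general idea it builds on: using the conditional entropy of the decision attribute to pick a small attribute subset before training a Naïve Bayesian classifier. It is not the ASBCE or CEBARKNC algorithm, and it omits the cosine measure and the handling of inconsistent samples. The function names `conditional_entropy` and `greedy_select`, the toy data `ROWS`, and the greedy forward search are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from math import log2


def conditional_entropy(rows, attrs, decision):
    """Return H(decision | attrs) in bits for a list of dict rows.

    With an empty `attrs` list this is the plain entropy H(decision).
    """
    n = len(rows)
    # Group rows by their values on `attrs` and count decision classes per group.
    groups = defaultdict(Counter)
    for row in rows:
        key = tuple(row[a] for a in attrs)
        groups[key][row[decision]] += 1
    h = 0.0
    for counts in groups.values():
        size = sum(counts.values())
        for c in counts.values():
            p = c / size
            h -= (size / n) * p * log2(p)
    return h


def greedy_select(rows, candidates, decision, tol=1e-12):
    """Hypothetical forward selection (not ASBCE).

    Repeatedly add the attribute that lowers H(decision | selected) the most,
    and stop as soon as no remaining attribute lowers it further.
    """
    selected = []
    current = conditional_entropy(rows, selected, decision)
    remaining = list(candidates)
    while remaining:
        best = min(
            remaining,
            key=lambda a: conditional_entropy(rows, selected + [a], decision),
        )
        best_h = conditional_entropy(rows, selected + [best], decision)
        if current - best_h <= tol:
            break  # the best candidate adds no information: treat the rest as redundant
        selected.append(best)
        remaining.remove(best)
        current = best_h
    return selected, current


# Toy data (made up): "a" determines "d", "b" duplicates "a", "c" is noise.
ROWS = [
    {"a": 0, "b": 0, "c": 0, "d": "no"},
    {"a": 0, "b": 0, "c": 1, "d": "no"},
    {"a": 1, "b": 1, "c": 0, "d": "yes"},
    {"a": 1, "b": 1, "c": 1, "d": "yes"},
]

subset, remaining_entropy = greedy_select(ROWS, ["a", "b", "c"], "d")
print(subset, remaining_entropy)
```

On this toy table the duplicate attribute `b` and the noise attribute `c` are left out, since neither lowers the conditional entropy once `a` has been chosen. A Naïve Bayesian classifier would then be trained on the selected subset only, which is the sense in which attribute selection relaxes the independence assumption.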
Keywords/Search Tags: data mining, Naïve Bayesian classifier, rough set, attribute selection, ensemble learning, Bagging