Research On Bayesian Classification Based On Continuous Attributes And Its Application

Posted on: 2017-04-30
Degree: Master
Type: Thesis
Country: China
Candidate: Q C Chang
GTID: 2348330512468194
Subject: Engineering
Abstract/Summary:
In data mining, the Naive Bayes classification algorithm has attracted the attention of many scholars because it is simple and efficient. However, its conditional independence assumption is often hard to satisfy in practice, which degrades classification performance to a greater or lesser degree. This thesis therefore uses frequent itemsets as the training set for Naive Bayes, which reduces the influence of the conditional independence assumption and improves the classification accuracy of the classifier. The main research work is as follows:

(1) The thesis analyzes existing continuous attribute discretization methods in detail, discusses how to reduce the information loss incurred during discretization, and proposes a low frequency discretization algorithm (LFD). The method places the interval cut points in the low-frequency regions of an attribute's values, so as to reduce the loss of data.

(2) Building on an analysis of existing association rule mining algorithms, the thesis combines low frequency discretization, weighted multiple minimum supports, and full confidence, and proposes a weighted multi-minimum-support association rule mining algorithm based on low frequency discretization (WM_SamplingHT). The algorithm first discretizes the continuous attributes with low frequency discretization, then assigns different weights and minimum supports to the itemsets when mining frequent itemsets. It also uses full confidence to remove spurious patterns, which yields a cleaner set of frequent itemsets.

(3) Because the conditional independence assumption behind the hidden Naive Bayes classifier is often not satisfied, and because the classifier does not handle attributes with zero probability well, the thesis proposes a hidden Naive Bayes classification algorithm based on frequent itemsets (WL-HNB). It uses the frequent itemsets obtained from association rule mining as the training set, combined with an improved Laplace estimate and a weighting operation, to further reduce the influence of the conditional independence assumption. Comparative experiments against traditional classification algorithms show that, on the majority of the data sets, the proposed algorithm achieves better classification performance.

(4) The WM_SamplingHT and WL-HNB algorithms are applied to a coronary heart disease traditional Chinese medicine (TCM) diagnosis assistance system in order to verify their classification performance in that system. Compared with the system's other classification algorithms, the experiments show that WM_SamplingHT successfully mines the frequent itemsets and association rules in the system database, while WL-HNB classifies the frequent itemsets. The approach can therefore play an auxiliary role in the diagnosis and treatment of coronary heart disease.
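The zero-probability problem mentioned in (3) can be illustrated with a small sketch. It is not the WL-HNB algorithm and not the thesis's improved Laplace estimate, whose details the abstract does not give. It is a plain Naive Bayes classifier over already discretized attributes with the standard Laplace (add-alpha) estimate. The function names `train_nb` and `predict_nb` and the parameter `alpha` are assumptions made for illustration only.

```python
from collections import Counter, defaultdict
import math


def train_nb(rows, labels):
    """Count class frequencies and per-class attribute value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute index, class) -> value counts
    domains = defaultdict(set)           # attribute index -> observed values
    for row, y in zip(rows, labels):
        for j, v in enumerate(row):
            value_counts[(j, y)][v] += 1
            domains[j].add(v)
    return class_counts, value_counts, domains


def predict_nb(model, row, alpha=1.0):
    """Return the class with the highest Laplace-smoothed log posterior."""
    class_counts, value_counts, domains = model
    n = sum(class_counts.values())
    best, best_score = None, -math.inf
    for y, cy in class_counts.items():
        # Smoothed class prior.
        score = math.log((cy + alpha) / (n + alpha * len(class_counts)))
        for j, v in enumerate(row):
            # Adding alpha keeps the estimate above zero even when the
            # value v never occurs together with class y in the training data.
            num = value_counts[(j, y)][v] + alpha
            den = cy + alpha * len(domains[j])
            score += math.log(num / den)
        if score > best_score:
            best, best_score = y, score
    return best
```

Without the `alpha` term, a single attribute value that never co-occurs with a class would drive that class's whole product to zero, regardless of how well the other attributes match.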
Keywords/Search Tags: Continuous Attribute, Associated Information, Naive Bayes, Data Mining