
Research On Bayesian Classification Based On Association Information

Posted on: 2016-11-14
Degree: Master
Type: Thesis
Country: China
Candidate: T H Sun
GTID: 2308330461477085
Subject: Computer Science and Technology
Abstract/Summary:
With the spread of networks and the rapid development of database technology, the amount of information has grown explosively. Large data sets contain a great deal of valuable information, and how to extract and use it is a central topic in data mining. Bayesian classification has become one of the popular approaches because it is simple and efficient. A Bayesian classifier predicts the class of target data using the prior probabilities of the classes. Naive Bayes is the most widely used and most efficient Bayesian classifier, but its biggest weakness is the assumption that attributes are independent, which does not hold in the real world. This thesis applies frequent itemsets to the naive Bayes classifier so that the independence assumption is weakened and classification becomes more accurate. The specific research work is as follows:

(1) Associated information: Building on the association rule model, the combination of frequent itemsets with naive Bayes is improved through the generation of candidate itemsets and the relationships between attributes. A hash table is used to improve the frequent itemset mining algorithm, which yields the SamplingHT algorithm. Extensive comparative experiments show that the new algorithm performs better when generating frequent itemsets and effectively reduces the number of database scans.

(2) Classified information: This thesis proposes a new frequent-itemset-based Bayesian classification method, WM-FISC. FISC is the classic method that combines frequent itemsets with naive Bayes. In WM-FISC, the training set is composed of the frequent itemsets generated by SamplingHT, and FISC is improved with the M-estimate and a weighted integration strategy, which weakens the independence assumption and further mitigates the weakness of naive Bayes. Extensive comparative experiments show that the new algorithm outperforms FISC and some other Bayesian classification algorithms.

(3) Practical application: The proposed SamplingHT and WM-FISC algorithms are used in modules of a clinical heart disease auxiliary system, where they successfully uncover hidden association rules in a traditional Chinese medicine (TCM) diagnosis database and provide effective support in the diagnosis of coronary heart disease.
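To make the M-estimate mentioned in the abstract concrete, the sketch below shows how it is commonly used to smooth conditional probabilities in a plain naive Bayes classifier. It is not the WM-FISC algorithm: it uses no frequent itemsets and no weighting strategy. The function names (`train`, `m_estimate`, `predict`), the uniform prior `1 / (number of observed values)` and the default `m = 1.0` are assumptions chosen for illustration only.

```python
import math
from collections import Counter, defaultdict


def train(rows, labels):
    """Collect the counts that naive Bayes needs from categorical data."""
    class_counts = Counter(labels)
    # value_counts[(attribute index, value, class)] = number of matching rows
    value_counts = Counter()
    # domain[attribute index] = set of values observed for that attribute
    domain = defaultdict(set)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, value, label)] += 1
            domain[i].add(value)
    return class_counts, value_counts, domain


def m_estimate(n_match, n_class, prior, m):
    """M-estimate of P(value | class): (n_match + m * prior) / (n_class + m)."""
    return (n_match + m * prior) / (n_class + m)


def predict(row, model, m=1.0):
    """Return the class with the highest log-posterior score (requires m > 0)."""
    class_counts, value_counts, domain = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_class in class_counts.items():
        score = math.log(n_class / total)  # log of the class prior
        for i, value in enumerate(row):
            prior = 1.0 / len(domain[i])  # assumed uniform prior per attribute
            n_match = value_counts[(i, value, label)]
            score += math.log(m_estimate(n_match, n_class, prior, m))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy data, invented for this example only.
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "cool")]
labels = ["no", "no", "yes", "yes"]
model = train(rows, labels)
prediction = predict(("sunny", "hot"), model)
```

With `m = 0` the estimate reduces to the raw relative frequency, so an attribute value never seen with a class would receive probability zero. A positive `m` pulls the estimate toward the prior and avoids that case.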
Keywords/Search Tags: Data Mining, Naive Bayes, Associated Information, Hash Function, M-estimation