Research On Bayesian Classification Based On Association Information

Posted on:2016-11-14

Degree:Master

Type:Thesis

Country:China

Candidate:T H Sun

Full Text:PDF

GTID:2308330461477085

Subject:Computer Science and Technology

Abstract/Summary:

With the spread of networks and the rapid development of database technology, the amount of information has grown explosively. Large data sets contain a great deal of valuable information, and how to extract and use it has become a central topic in data mining. Bayesian classification has become one of the popular approaches because it is simple and efficient. A Bayesian classifier predicts the class of a target instance from the prior probabilities of the classes. Naive Bayes is the most widely used and most efficient Bayesian classifier, but its main weakness is the assumption that attributes are independent of one another; in real-world data, attributes are rarely independent. In this thesis, frequent itemsets are applied to naive Bayes classification so that the independence assumption is weakened and classification becomes more accurate. The specific research work is as follows:

(1) Association information: Association rule mining and frequent itemsets are combined with the naive Bayes algorithm, and the generation of candidate itemsets and attribute relationships is improved. A hash table is used to improve the frequent itemset mining algorithm, yielding the SamplingHT algorithm. Extensive comparative experiments show that the new algorithm improves performance when generating frequent itemsets and effectively reduces the number of database scans.

(2) Classification information: This thesis proposes a new Bayesian classification method based on frequent itemsets (WM-FISC). FISC is the classic method that combines frequent itemsets with the naive Bayes algorithm. Here the training set is composed of the frequent itemsets generated by the SamplingHT algorithm. To weaken the independence assumption, FISC is improved with the M-estimate and a weighted integration strategy, which further addresses the weakness of naive Bayes classification. Extensive comparative experiments show that the new algorithm outperforms FISC as well as several other Bayesian classification algorithms.

(3) Practical application: The proposed SamplingHT and WM-FISC algorithms are used in modules of a clinical heart disease auxiliary system, where they successfully uncover hidden association rules in a traditional Chinese medicine (TCM) diagnosis database and provide effective assistance in the diagnosis of coronary heart disease.
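The M-estimate mentioned in the abstract can be illustrated with a small sketch. This is not the WM-FISC algorithm, whose details the abstract does not give; it is plain naive Bayes over categorical attributes with the standard M-estimate (n_c + m * p) / (n + m) in place of raw frequency ratios. The function names, the uniform prior p, the default m = 1.0, and the toy data are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def train(rows, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (class, attribute index) -> value counts
    domains = defaultdict(set)           # attribute index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(label, i)][value] += 1
            domains[i].add(value)
    return class_counts, value_counts, domains


def m_estimate(n_c, n, p, m):
    """Smoothed conditional probability (n_c + m * p) / (n + m)."""
    return (n_c + m * p) / (n + m)


def predict(model, row, m=1.0):
    """Return the highest-scoring class and the unnormalized class scores."""
    class_counts, value_counts, domains = model
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        score = n / total  # class prior
        for i, value in enumerate(row):
            # Assumption: uniform prior over the values observed for attribute i.
            p = 1.0 / len(domains[i])
            score *= m_estimate(value_counts[(label, i)][value], n, p, m)
        scores[label] = score
    return max(scores, key=scores.get), scores


# Hypothetical toy data: (outlook, temperature) -> class label.
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "cool")]
labels = ["no", "no", "yes", "yes"]
model = train(rows, labels)
label, scores = predict(model, ("sunny", "cool"))
```

With raw frequency ratios, an attribute value never seen together with a class would force that class's score to zero. The M-estimate keeps every score positive, so a single unseen value cannot veto a class on its own.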

Keywords/Search Tags:

Data Mining, Naive Bayes, Associated Information, Hash Function, M-estimation

Related items

1	Data Mining Systems And Their Applications - Improve The Performance Of The Naive Bayes Text Classifier, Associated Characteristics
2	Research And Application On Naive Bayes Classification Algorithm
3	The Mobile Customers Occupational Recognition Naive Bayes Algorithm-based Integration And Debugging
4	Research On Naive Bayes Classifiers And Its Improved Algorithms
5	The Research Of Multi-layer Hidden Naive Bayes Algorithm Based On Mutual Information
6	Research On Naive Bayes Classifiers And Its Improved Algorithms
7	Research And Improvement Of Attribute Weighted Naive Bayes Classification Algorithm
8	Kernel Density Estimation On Correlated Naive Bayes Network Traffic Classification
9	Design And Implementation Of Commercial Bank Customer Relationship Management System Based On Decision Tree Algorithm
10	Research On Bayesian Classification Based On Continuous Attributes And Its Application