
Research on Naïve Bayesian Classification and Its Application

Posted on: 2012-01-08    Degree: Master    Type: Thesis
Country: China    Candidate: J Duan    Full Text: PDF
GTID: 2178330335455436    Subject: Computer Science and Technology
Abstract:
Naive Bayes (NB) is one of the most important classification algorithms in data mining. Compared to other methods it has many attractive features: it is simple, fast, and stable. However, NB makes a strong assumption that the predictive attributes are independent given the class value, which harms its performance because the conditional independence assumption rarely holds in real applications. Many existing works attempt to relax the independence assumption and thereby improve the performance of NB; they can be divided into structure extension, local learning, feature selection, and feature weighting.

In this thesis, both structure extension and feature weighting are studied to improve the performance of NB. The main research work includes the following aspects:

1) Structure extension: frequent itemsets from association rule mining are adopted to capture the dependences among multiple attributes, so that the process of selecting strong attributes is avoided and a double-layer Bayesian structure is constructed. The existing Bayesian classification algorithm based on Frequent Itemsets (FISC) has several shortcomings: its probability estimation is rough and its way of combining the component classifiers is too simple. To address these problems, an improved Bayesian algorithm based on a new probability estimation method (FISC-M) and another improved Bayesian algorithm based on a weighted combination strategy (WFISC) are proposed. Furthermore, a constraint on the length of itemsets is introduced to reduce the high running time of FISC; the constraint shortens the running time while preserving classification accuracy. Experimental results show that both FISC-M and WFISC greatly outperform the original FISC and also perform better than several existing Bayesian classification algorithms. (A sketch of the general frequent-itemset idea is given after the abstract.)

2) Feature weighting: variable precision rough set (VPRS) theory is applied to Bayesian classification, and an attribute-weighted Naive Bayesian classification algorithm based on VPRS (AWNB-VPRS) is proposed to improve the performance of NB. AWNB-VPRS determines the importance of each attribute with VPRS, considering the weighted approximation of the attribute as well as its information gain. Experimental results show that the VPRS-based Bayesian classification algorithm is more effective than the Bayesian classification algorithm based on classical rough sets. (A sketch of a generic attribute-weighted NB is also given after the abstract.)

3) Practical application: the proposed algorithms are applied to analyzing and mining the clinical laws of Traditional Chinese Medicine (TCM) diagnosis and treatment of Coronary Heart Disease (CHD), and a diagnosis model for the TCM syndromes of CHD is implemented. This work further verifies the effectiveness of the proposed algorithms.
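The abstract does not detail FISC, FISC-M, or WFISC, so the following is only a minimal sketch of the general idea behind combining frequent itemsets with Naive Bayes: frequent attribute-value itemsets are mined from the training data and treated as compound features whose class-conditional probabilities are estimated with Laplace smoothing. The function names, the min_support threshold, the itemset-length limit, and the smoothing scheme are illustrative assumptions, not the algorithms proposed in the thesis.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations

def frequent_itemsets(rows, min_support=0.3, max_len=2):
    """Itemsets of (attribute_index, value) pairs whose support >= min_support."""
    n = len(rows)
    frequent = []
    for size in range(1, max_len + 1):
        counts = Counter()
        for row in rows:
            items = list(enumerate(row))          # categorical attribute values
            for combo in combinations(items, size):
                counts[combo] += 1
        frequent.extend(s for s, c in counts.items() if c / n >= min_support)
    return frequent

def train(rows, labels, min_support=0.3, max_len=2):
    """Estimate P(class) and class-conditional counts of each frequent itemset."""
    itemsets = frequent_itemsets(rows, min_support, max_len)
    class_counts = Counter(labels)
    cond = defaultdict(Counter)
    for row, y in zip(rows, labels):
        items = set(enumerate(row))
        for s in itemsets:
            if set(s) <= items:
                cond[y][s] += 1
    return itemsets, class_counts, cond, len(rows)

def predict(row, model):
    """Pick the class maximizing log P(c) + sum of log P(itemset | c) over matched itemsets."""
    itemsets, class_counts, cond, n = model
    items = set(enumerate(row))
    best, best_score = None, float("-inf")
    for y, cy in class_counts.items():
        score = math.log(cy / n)
        for s in itemsets:
            if set(s) <= items:
                score += math.log((cond[y][s] + 1) / (cy + 2))  # Laplace smoothing
        if score > best_score:
            best, best_score = y, score
    return best
```

Restricting max_len plays the same role as the itemset-length constraint mentioned in point 1: fewer and shorter itemsets mean less work at both training and prediction time.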
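Likewise, AWNB-VPRS itself is not specified in the abstract, so the sketch below only shows the common way per-attribute weights enter the Naive Bayes decision rule, score(c) = log P(c) + sum_i w_i * log P(a_i | c), under the assumption that the weights are supplied externally; in the thesis they would be derived from VPRS-based attribute importance and information gain, which is not reproduced here.

```python
import math
from collections import Counter, defaultdict

def train_weighted_nb(rows, labels):
    """Estimate P(class) and per-attribute value counts, for Laplace-smoothed P(value | class)."""
    n_attrs = len(rows[0])
    class_counts = Counter(labels)
    value_counts = [defaultdict(Counter) for _ in range(n_attrs)]  # attr -> class -> value -> count
    values = [set() for _ in range(n_attrs)]                       # observed values per attribute
    for row, y in zip(rows, labels):
        for i, v in enumerate(row):
            value_counts[i][y][v] += 1
            values[i].add(v)
    return class_counts, value_counts, values, len(rows)

def predict_weighted_nb(row, model, weights):
    """Weighted NB decision rule: log P(c) + sum_i weights[i] * log P(row[i] | c)."""
    class_counts, value_counts, values, n = model
    best, best_score = None, float("-inf")
    for y, cy in class_counts.items():
        score = math.log(cy / n)
        for i, v in enumerate(row):
            # Laplace smoothing over the observed values of attribute i
            p = (value_counts[i][y][v] + 1) / (cy + len(values[i]))
            score += weights[i] * math.log(p)
        if score > best_score:
            best, best_score = y, score
    return best
```

Setting every weight to 1 recovers standard NB; the contribution described in point 2 lies in how the weights are computed, not in the decision rule shown here.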
Keywords/Search Tags: Classification Algorithm, Naïve Bayes, Frequent Itemsets, Variable Precision Rough Set, TCM Diagnosis and Treatment of Coronary Heart Disease