
Research on Naïve Bayesian Classification and Its Application

Posted on: 2012-01-08    Degree: Master    Type: Thesis
Country: China    Candidate: J Duan    Full Text: PDF
GTID: 2178330335455436    Subject: Computer Science and Technology
Abstract:
Naive Bayes (NB) is one of the most important classification algorithms in data mining. Compared to other methods it has many attractive features: it is simple, fast, and stable. However, NB makes a strong assumption that the predictive attributes are independent given the class value, which harms its performance because the conditional independence assumption rarely holds in real applications. Many existing works attempt to relax the independence assumption and thereby improve the performance of NB; they can be divided into structure extension, local learning, feature selection, and feature weighting.

In this thesis, both structure extension and feature weighting are studied to improve the performance of NB. The main research work includes the following aspects:

1) Structure extension: frequent itemsets from association rule mining are adopted to capture the dependences among multiple attributes, so that the process of selecting strong attributes is avoided and a double-layer Bayesian structure is constructed. The existing Bayesian classification algorithm based on Frequent Itemsets (FISC) has several shortcomings: its probability estimation is rough and its way of combining the component classifiers is too simple. To address these problems, an improved Bayesian algorithm based on a new probability estimation method (FISC-M) and another improved Bayesian algorithm based on a weighted combination strategy (WFISC) are proposed. Furthermore, a constraint on the length of itemsets is introduced to reduce the high running time of FISC; the constraint shortens the running time while preserving classification accuracy. Experimental results show that both FISC-M and WFISC greatly outperform the original FISC and also perform better than several existing Bayesian classification algorithms. (A sketch of the general frequent-itemset idea is given after the abstract.)

2) Feature weighting: variable precision rough set (VPRS) theory is applied to Bayesian classification, and an attribute-weighted Naive Bayesian classification algorithm based on VPRS (AWNB-VPRS) is proposed to improve the performance of NB. AWNB-VPRS determines the importance of each attribute with VPRS, considering the weighted approximation of the attribute as well as its information gain. Experimental results show that the VPRS-based Bayesian classification algorithm is more effective than the Bayesian classification algorithm based on classical rough sets. (A sketch of a generic attribute-weighted NB is also given after the abstract.)

3) Practical application: the proposed algorithms are applied to analyzing and mining the clinical laws of Traditional Chinese Medicine (TCM) diagnosis and treatment of Coronary Heart Disease (CHD), and a diagnosis model for the TCM syndromes of CHD is implemented. This work further verifies the effectiveness of the proposed algorithms.
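The abstract does not detail FISC, FISC-M, or WFISC, so the following is only a minimal sketch of the general idea behind combining frequent itemsets with Naive Bayes: frequent attribute-value itemsets are mined from the training data and treated as compound features whose class-conditional probabilities are estimated with Laplace smoothing. The function names, the min_support threshold, the itemset-length limit, and the smoothing scheme are illustrative assumptions, not the algorithms proposed in the thesis.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations

def frequent_itemsets(rows, min_support=0.3, max_len=2):
    """Itemsets of (attribute_index, value) pairs whose support >= min_support."""
    n = len(rows)
    frequent = []
    for size in range(1, max_len + 1):
        counts = Counter()
        for row in rows:
            items = list(enumerate(row))          # categorical attribute values
            for combo in combinations(items, size):
                counts[combo] += 1
        frequent.extend(s for s, c in counts.items() if c / n >= min_support)
    return frequent

def train(rows, labels, min_support=0.3, max_len=2):
    """Estimate P(class) and class-conditional counts of each frequent itemset."""
    itemsets = frequent_itemsets(rows, min_support, max_len)
    class_counts = Counter(labels)
    cond = defaultdict(Counter)
    for row, y in zip(rows, labels):
        items = set(enumerate(row))
        for s in itemsets:
            if set(s) <= items:
                cond[y][s] += 1
    return itemsets, class_counts, cond, len(rows)

def predict(row, model):
    """Pick the class maximizing log P(c) + sum of log P(itemset | c) over matched itemsets."""
    itemsets, class_counts, cond, n = model
    items = set(enumerate(row))
    best, best_score = None, float("-inf")
    for y, cy in class_counts.items():
        score = math.log(cy / n)
        for s in itemsets:
            if set(s) <= items:
                score += math.log((cond[y][s] + 1) / (cy + 2))  # Laplace smoothing
        if score > best_score:
            best, best_score = y, score
    return best
```

Restricting max_len plays the same role as the itemset-length constraint mentioned in point 1: fewer and shorter itemsets mean less work at both training and prediction time.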
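Likewise, AWNB-VPRS itself is not specified in the abstract, so the sketch below only shows the common way per-attribute weights enter the Naive Bayes decision rule, score(c) = log P(c) + sum_i w_i * log P(a_i | c), under the assumption that the weights are supplied externally; in the thesis they would be derived from VPRS-based attribute importance and information gain, which is not reproduced here.

```python
import math
from collections import Counter, defaultdict

def train_weighted_nb(rows, labels):
    """Estimate P(class) and per-attribute value counts, for Laplace-smoothed P(value | class)."""
    n_attrs = len(rows[0])
    class_counts = Counter(labels)
    value_counts = [defaultdict(Counter) for _ in range(n_attrs)]  # attr -> class -> value -> count
    values = [set() for _ in range(n_attrs)]                       # observed values per attribute
    for row, y in zip(rows, labels):
        for i, v in enumerate(row):
            value_counts[i][y][v] += 1
            values[i].add(v)
    return class_counts, value_counts, values, len(rows)

def predict_weighted_nb(row, model, weights):
    """Weighted NB decision rule: log P(c) + sum_i weights[i] * log P(row[i] | c)."""
    class_counts, value_counts, values, n = model
    best, best_score = None, float("-inf")
    for y, cy in class_counts.items():
        score = math.log(cy / n)
        for i, v in enumerate(row):
            # Laplace smoothing over the observed values of attribute i
            p = (value_counts[i][y][v] + 1) / (cy + len(values[i]))
            score += weights[i] * math.log(p)
        if score > best_score:
            best, best_score = y, score
    return best
```

Setting every weight to 1 recovers standard NB; the contribution described in point 2 lies in how the weights are computed, not in the decision rule shown here.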
Keywords/Search Tags: Classification Algorithm, Naïve Bayes, Frequent Itemsets, Variable Precision Rough Set, TCM Diagnosis and Treatment of Coronary Heart Disease