Research And Application Of Naive Bayesian Classification Based On Attribute Selection

Posted on: 2017-01-07
Degree: Master
Type: Thesis
Country: China
Candidate: T Du
Full Text: PDF
GTID: 2308330485951844
Subject: Computer application technology
Abstract/Summary:
Bayesian classification methods handle a wide variety of data types well, and the diagnosis of traditional Chinese medicine (TCM) is becoming increasingly standardized and quantified. As a result, more and more scholars have recently adopted Bayesian classification in TCM diagnosis. The naive Bayesian classification (NBC) algorithm, one of the classic Bayesian classification algorithms, has a simple structure and is computationally efficient. Nevertheless, its conditional independence assumption restricts its range of application. In addition, as data volumes grow, real data sets often contain redundant attributes, which reduce both the learning efficiency and the classification performance of NBC. To meet these practical demands, this dissertation improves NBC in three respects: attribute selection, attribute weighting, and structure extension. We then apply the improved model to infertility diagnosis in TCM and verify its validity and accuracy experimentally.

First, to improve classification accuracy, this dissertation proposes a method for computing attribute weights based on KL distance and split information. We use these weights as the attribute weighting coefficients in the hidden naive Bayesian (HNB) model and propose an improved weighted hidden naive Bayesian classification algorithm (WHNBC). The simulation results show that WHNBC improves classification accuracy compared with the state-of-the-art algorithms, which supports the correctness and validity of the proposed method.

Second, to address redundant or irrelevant attributes in real application data, this dissertation builds on the correlation-based feature selector by introducing the Pearson correlation coefficient and the concept of the variance of the relevance between attributes, and proposes a new attribute selection algorithm called VCFSPabs. The experimental results show that the algorithm effectively removes redundant attributes and yields a good attribute subset.

Then, on the basis of this attribute subset and the WHNBC algorithm, this dissertation presents an improved weighted hidden naive Bayesian classification model based on attribute selection (AS-WHNB). The model consists of three parts: attribute selection, attribute weight calculation, and classification training. In the classification training part, we further divide the attribute subset into a strong attribute subset and a weak attribute subset, and train the two subsets with the WHNBC and NBC models respectively. The experimental results show that, when the number of attributes is large, the AS-WHNB classification model not only improves classification accuracy but also effectively reduces classification time.

Finally, we preprocess the infertility data set and apply the NB, C4.5, TAN, AODE and WHNBC algorithms and the AS-WHNB classification model to infertility diagnosis in TCM. The experimental results show that the AS-WHNB classification model achieves higher classification performance, which indicates that the proposed model is effective and valid for modeling infertility diagnosis in TCM.
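To make the attribute selection step more concrete, the following is a minimal sketch of the standard correlation-based feature selection (CFS) merit, on which the abstract says VCFSPabs builds, using the absolute Pearson correlation as the correlation measure. It is not the thesis's algorithm: the abstract does not say how the variance term enters VCFSPabs, so that term is omitted here. The function names (`pearson`, `cfs_merit`, `greedy_select`), the greedy forward search, and the toy data are all assumptions made for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # constant column: treated as uncorrelated (an assumption)
    return cov / math.sqrt(vx * vy)

def cfs_merit(columns, labels, subset):
    """Standard CFS merit: k * r_cf / sqrt(k + k * (k - 1) * r_ff).

    r_cf is the mean |Pearson| between each selected attribute and the class;
    r_ff is the mean |Pearson| over all pairs of selected attributes.
    The variance term of VCFSPabs is NOT modeled here.
    """
    k = len(subset)
    if k == 0:
        return 0.0
    r_cf = sum(abs(pearson(columns[i], labels)) for i in subset) / k
    if k == 1:
        r_ff = 0.0
    else:
        pairs = [(i, j) for a, i in enumerate(subset) for j in subset[a + 1:]]
        r_ff = sum(abs(pearson(columns[i], columns[j])) for i, j in pairs) / len(pairs)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)

def greedy_select(columns, labels):
    """Forward selection (assumed search strategy): add the attribute that
    raises the merit most, and stop when no attribute raises it."""
    selected, best = [], 0.0
    remaining = list(range(len(columns)))
    while remaining:
        idx = max(remaining, key=lambda i: cfs_merit(columns, labels, selected + [i]))
        score = cfs_merit(columns, labels, selected + [idx])
        if score <= best:
            break
        selected.append(idx)
        remaining.remove(idx)
        best = score
    return selected

# Toy data (hypothetical): attribute 1 duplicates attribute 0,
# attribute 2 is uncorrelated with the class.
labels = [0, 0, 1, 1]
columns = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 0, 1]]
print(greedy_select(columns, labels))  # [0]
```

On this toy data the duplicate attribute is not added because it leaves the merit unchanged, which is the kind of redundancy removal the abstract describes.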
Keywords/Search Tags: infertility diagnosis of TCM, classification algorithm, naive Bayesian, weighting coefficient, attribute selection, structure extension