
Tree Augmented Bayes Network Of Text Classification Method Research Based On Machine Learning

Posted on: 2015-02-04  Degree: Master  Type: Thesis
Country: China  Candidate: J M Yang  Full Text: PDF
GTID: 2268330428982627  Subject: Control Engineering
Abstract/Summary:
With the rapid development of computers, people can obtain large amounts of data. Classifying that data effectively so that users can extract useful information from it is key to improving work efficiency and managing data. For complex chemical processes, which are large in scale, complex, and multivariable, traditional data-processing methods usually derive their results from experience. Because human knowledge is limited, the resulting classifications can be skewed, which restricts the practical application of data classification. Text classification algorithms based on machine learning, by contrast, require neither a complex model nor accurate prior knowledge; they make full use of information technology and can generate and update the algorithm automatically. The study of machine-learning-based text classification algorithms therefore has both theoretical significance and broad practical value.

This thesis studies text classification based on the Tree Augmented Naive Bayes Classifier (TANC), a tree augmented Bayes network, and examines the structure and application of the model in depth. The main research work covers the following aspects:

1. TANC is an efficient extension of the Naive Bayes Classifier (NBC). It inherits the simplicity and efficiency of NBC while improving generalization ability, but it still fails to fully represent the correlations between attributes. To improve TANC, this thesis presents a method that uses the dependence degree and the correlation between attributes to select appropriate attributes and establish the corresponding dependencies, thereby improving classification accuracy. The improved classifier is compared with NBC and TANC in an experiment, and the results show that it outperforms both.

2. TANC is widely used because it is simple and easy to implement. However, it does not handle continuous data well, and when attribute values are missing it ignores part of the data, which can reduce the accuracy of the results. To resolve this problem, the thesis proposes an improved TAN classifier based on C4.5. First, C4.5 is used to preprocess the data; then the TANC is constructed; finally, the C4.5 algorithm is used to prune the TANC and delete its redundant attributes. This improves the TANC and its classification accuracy. The experimental results show that the improved algorithm is superior to both C4.5 and TANC in classification accuracy.

3. Traditional classification methods achieve low accuracy on minority-class samples in unbalanced data sets, yet the results for the minority classes are often particularly important. The thesis therefore proposes an improved TANC algorithm for unbalanced data sets. The algorithm first uses the Relief algorithm to assign weights for the minority class in the sample; the TANC algorithm is also improved by determining the direction of the augmenting arcs. The data sets are then classified with the improved TANC algorithm. Experiments on standard UCI data sets show that the overall performance of the proposed algorithm is superior to that of the TANC algorithm.
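For readers unfamiliar with the baseline, the following is a minimal Python sketch of how a standard TAN structure is commonly built: estimate the conditional mutual information between each pair of attributes given the class, then keep a maximum-weight spanning tree over the attributes. It illustrates the textbook TAN construction only, not the improved methods proposed in the thesis. The function names `conditional_mutual_information` and `tan_tree`, the choice of root attribute, and the use of raw counts without smoothing are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations
from math import log


def conditional_mutual_information(xi, xj, c):
    """Estimate I(Xi; Xj | C) in nats from three equal-length discrete sequences."""
    n = len(c)
    n_ijc = Counter(zip(xi, xj, c))  # joint counts of (xi, xj, class)
    n_ic = Counter(zip(xi, c))       # counts of (xi, class)
    n_jc = Counter(zip(xj, c))       # counts of (xj, class)
    n_c = Counter(c)                 # class counts
    total = 0.0
    for (a, b, k), count in n_ijc.items():
        # P(a,b,k) * log( P(a,b|k) / (P(a|k) * P(b|k)) ), written with raw counts
        total += (count / n) * log(count * n_c[k] / (n_ic[(a, k)] * n_jc[(b, k)]))
    return total


def tan_tree(columns, labels, root=0):
    """Return {attribute_index: parent_index}; the root attribute has parent None.

    In the full TAN model every attribute additionally has the class as a parent.
    """
    m = len(columns)
    weight = {}
    for i, j in combinations(range(m), 2):
        w = conditional_mutual_information(columns[i], columns[j], labels)
        weight[(i, j)] = weight[(j, i)] = w
    parent = {root: None}
    while len(parent) < m:
        # Prim's algorithm: attach the outside attribute that has the
        # heaviest edge into the tree built so far.
        u, v = max(
            ((u, v) for u in parent for v in range(m) if v not in parent),
            key=lambda edge: weight[edge],
        )
        parent[v] = u  # arc is directed away from the root
    return parent
```

Calling `tan_tree(columns, labels)` with one list of discrete values per attribute returns the parent of each attribute in the tree. Directing the arcs away from an arbitrarily chosen root is the usual convention in the standard construction.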
Keywords/Search Tags: Machine Learning, Classifier, NBC, TANC, C4.5 algorithm, Unbalanced Data Sets