
The Research And Design Of Second-order TAN Model Based On Information Theory

Posted on: 2017-04-28; Degree: Master; Type: Thesis
Country: China; Candidate: Y Wu; Full Text: PDF
GTID: 2308330482989822; Subject: Computer technology
Abstract/Summary:
With the shift from mainframe computers to client-server architectures, cloud computing opened a new chapter in the IT industry, and big data, which is closely tied to cloud computing, has also attracted wide research interest. With today's explosion of data, the storage and analysis of massive data sets has gradually drawn attention, and how to make the most of big data to serve an enterprise or an industry has become a hot research topic. Take the medical industry as an example: data mining techniques are applied to massive data, including large numbers of patients and their symptoms, to support analysis and decision making. As the technology matures, it improves both the efficiency of information services in the medical industry and the accuracy of disease diagnosis.
Data mining technology has made great breakthroughs in many areas, but identifying the causal relationships between the variables in a domain remains a major difficulty. A Bayesian network is well suited to expressing the dependence between variables: dependencies are found by calculating the mutual information between attributes, and its greatest advantage is that it displays the relationships between variables graphically. Constructing a good Bayesian network model that better expresses the relationships between variables is therefore of great significance.
Many Bayesian classifier models have been studied, such as Naive Bayes (NB), Tree Augmented Naive Bayes (TAN), and Averaged One-Dependence Estimators (AODE). NB is the simplest restricted Bayesian classifier; building on NB, researchers have explored and extended Bayesian classifier models that are more reasonable, more accurate, and better in classification performance.
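As a rough illustration of the mutual information between two discrete attributes mentioned above, the following minimal sketch estimates it from co-occurrence counts. The function name `mutual_information` and the use of base-2 logarithms are assumptions made for this example, not details taken from the thesis.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y), in bits, of two discrete sequences."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n == 0:
        return 0.0
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x,y) * log2(p(x,y) / (p(x) * p(y))), written in terms of raw counts
        mi += (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
    return mi
```

A value near zero suggests that the two attributes are close to independent in the sample, while larger values indicate stronger dependence.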
Both NB and AODE have low complexity, and AODE classifies better than NB. AODE has been extended to ANDE; although ANDE achieves higher classification performance, the gain is outweighed by the growth in complexity, so it has not received much attention in practical applications. The TAN classifier uses conditional mutual information to extend NB with dependencies between attributes, yielding a tree-augmented model whose classification performance is also better than that of NB.
NB, TAN, AODE, and similar models perform well on small data sets but show their weaknesses when the data is very large. When facing massive amounts of data, the goal is therefore to avoid high complexity as far as possible while still achieving excellent classification performance on large data sets.
The purpose of this thesis is to improve classification performance starting from simple Bayesian classifiers and to describe a new Bayesian classifier model. AODE extends NB: each attribute in turn serves as the parent of all the other attributes, and the resulting estimates are averaged, so that classification performance is improved by exploiting the dependencies between attributes. Its drawback is that the attributes are not weighted, which limits its classification performance. After analyzing the characteristics of the TAN model, TAN is extended in the same way. Although this appears to be a mere stacking of models, it in effect extends models such as NB and AODE to second order, which can improve accuracy. Following this idea, the classifier is trained and tested on data sets and the results are checked.
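The averaging scheme of AODE described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the thesis's implementation: attributes are discrete, probabilities use Laplace smoothing, the minimum-frequency threshold that AODE usually applies to parent values is omitted, and the names `train_aode` and `aode_scores` are hypothetical.

```python
from collections import Counter


def train_aode(rows, labels):
    """Collect the counts a simplified AODE needs from discrete training data."""
    n_attrs = len(rows[0])
    model = {
        "n": len(rows),
        "n_attrs": n_attrs,
        "classes": list(dict.fromkeys(labels)),  # distinct labels, order kept
        "n_values": [len({r[i] for r in rows}) for i in range(n_attrs)],
        "parent": Counter(),  # (i, x_i, c) -> count
        "pair": Counter(),    # (i, x_i, j, x_j, c) -> count
    }
    for row, c in zip(rows, labels):
        for i, xi in enumerate(row):
            model["parent"][(i, xi, c)] += 1
            for j, xj in enumerate(row):
                if j != i:
                    model["pair"][(i, xi, j, xj, c)] += 1
    return model


def aode_scores(model, row):
    """Average one estimator per parent attribute and normalise over classes."""
    k = len(model["classes"])
    scores = {}
    for c in model["classes"]:
        total = 0.0
        for i, xi in enumerate(row):
            parent_count = model["parent"][(i, xi, c)]
            # Laplace-smoothed estimate of P(c, x_i)
            term = (parent_count + 1) / (model["n"] + k * model["n_values"][i])
            for j, xj in enumerate(row):
                if j == i:
                    continue
                pair_count = model["pair"][(i, xi, j, xj, c)]
                # Laplace-smoothed estimate of P(x_j | c, x_i)
                term *= (pair_count + 1) / (parent_count + model["n_values"][j])
            total += term
        scores[c] = total / model["n_attrs"]  # unweighted average over parents
    z = sum(scores.values())
    return {c: s / z for c, s in scores.items()}
```

Every parent attribute contributes equally to the average here, which is the unweighted behaviour the abstract identifies as a drawback.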
On this basis, the shortcoming of the AODE model is removed by introducing attribute weights, the TAN model is extended once more, and the final test results are checked. This approach raises the naive Bayesian classifier to second order, removes the shortcoming of the AODE model, and effectively improves classification accuracy.
The experimental results show that the second-order TAN model based on attribute weights can be applied to large data sets and shows good classification performance.
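The abstract does not say how the attribute weights are computed or applied. One plausible reading is that the plain average over per-attribute estimators becomes a weighted average, which the sketch below shows. The function name `combine_weighted` is hypothetical, and the weights are taken as given. A common choice in weighted AODE variants is the mutual information between each attribute and the class, but that is an assumption rather than something the abstract confirms.

```python
def combine_weighted(estimates, weights):
    """Weighted average of per-parent class estimates.

    estimates: list of dicts mapping class label -> score, one per parent attribute
    weights:   list of non-negative floats, one per parent attribute (assumed given)
    """
    if len(estimates) != len(weights):
        raise ValueError("need exactly one weight per estimate")
    total_w = sum(weights)
    if total_w <= 0:
        # Fall back to the unweighted average when no weight is informative
        weights = [1.0] * len(estimates)
        total_w = float(len(estimates))
    classes = list(estimates[0].keys())
    combined = {
        c: sum(w * e[c] for w, e in zip(weights, estimates)) / total_w
        for c in classes
    }
    z = sum(combined.values())
    # Normalise so the class scores sum to one whenever that is possible
    return {c: v / z for c, v in combined.items()} if z > 0 else combined
```

With equal weights this reduces to the unweighted average, so the weights only change the result when some attributes are judged more informative than others.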
Keywords/Search Tags: Bayesian networks, Attribute weighting, Conditional mutual information, Bayesian classifier, Second-order tree-augmented Bayesian classifier