
Learning Bayesian Network And Research On Data Classification

Posted on: 2016-10-03
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Y Li
GTID: 1108330488473862
Subject: Applied Mathematics

Abstract
With the rapid development of information technology, exploring and revealing the dependency relationships among variables has become particularly important. A Bayesian network is a widely applied probabilistic graphical model. It combines graph theory with probability theory and provides a crucial theoretical basis and an effective tool for data classification, inference, and prediction. Structure learning is one of the main research areas of Bayesian networks; however, it is an NP-hard problem. To address it, this dissertation proposes a hybrid algorithm for learning the skeletons of large Bayesian networks. It then focuses on Bayesian networks with latent variables and selection variables: the minimal essential graph is defined and learned, and the equivalent transformation problem is discussed. For data classification, a Bayesian network classifier is constructed and the selection of the parameter k of the k-nearest neighbor classifier is studied. The main contents are as follows.

To learn Bayesian network structure from big data, a hybrid algorithm based on conditional independence (CI) tests is developed. Because high-order CI tests are unreliable and computationally expensive, the key to an efficient dependency-analysis algorithm is to use as few CI tests as possible and to keep the conditioning sets as small as possible. In this work, a candidate superset of the parents and children of the target variable is obtained with low-order tests, a candidate superset of its spouses is learned from that result, and the existing routine Inter-IAPC is adopted to shrink the candidate parent-children superset.
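The low-order testing idea can be illustrated with a minimal sketch. This is not the dissertation's exact algorithm: the `ci` oracle below is hand-coded for a toy chain A → B → C → D (in practice it would be a statistical test such as G²), and `candidate_pc` simply keeps every variable that no conditioning set of size at most one can separate from the target.

```python
def make_oracle():
    # Hand-coded independence oracle for the chain A -> B -> C -> D:
    # X _||_ Y | Z holds iff Z contains a node strictly between X and Y.
    order = "ABCD"
    def ci(x, y, z):
        i, j = sorted((order.index(x), order.index(y)))
        between = set(order[i + 1:j])
        return bool(between & set(z))  # True = independent given z
    return ci

def candidate_pc(target, variables, ci, max_order=1):
    """Candidate parent/children superset of `target`: keep every
    variable not separable by any conditioning set of size <= max_order."""
    cand = [v for v in variables if v != target]
    result = []
    for v in cand:
        others = [w for w in cand if w != v]
        seps = [()] + ([(w,) for w in others] if max_order >= 1 else [])
        if not any(ci(target, v, z) for z in seps):
            result.append(v)
    return result

ci = make_oracle()
print(candidate_pc("B", list("ABCD"), ci))  # -> ['A', 'C']
print(candidate_pc("A", list("ABCD"), ci))  # -> ['B']
```

Only order-0 and order-1 tests are used, so the superset may contain false positives in denser graphs; that is exactly why a shrinking phase such as Inter-IAPC follows in the dissertation's pipeline.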
Finally, extra computations compensate for the large number of false-negative nodes output by the previous step. During the hierarchical selection of target variables, the orders of the conditioning sets are reduced continually, which effectively improves the accuracy of learning large network structures.

To represent a Markov equivalence class of ancestral graphs compactly, the concept of a minimal essential graph is defined. The space of ancestral graphs grows exponentially with the number of variables, so characterizing the common features of an equivalence class is very important: it contributes to a deeper understanding of the essence of ancestral graphs, greatly reduces the search space, and provides meaningful guidance for structure learning. An algorithm is proposed to learn the minimal essential graph based on the detection of minimal collider paths; it can also be used to quickly determine whether two graphs are Markov equivalent. Furthermore, a set of orientation rules is presented to orient the edge marks of a minimal essential graph.

The problem of transformational characterization of Markov-equivalent directed maximal ancestral graphs is then studied. For a given maximal ancestral graph, equivalent transformations allow us to traverse its Markov equivalence class by the following operations: reversing a non-compelled edge, and adding or deleting an arrowhead of a non-compelled edge. This dissertation concentrates on the transformation of bi-directed edges into directed edges. Combined with existing results, any two Markov-equivalent directed maximal ancestral graphs can be transformed into each other by changing bi-directed edges to directed edges, changing directed edges to bi-directed edges, or reversing directed edges.
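For ordinary DAGs (no latent or selection variables), Markov equivalence has a classical criterion due to Verma and Pearl: two DAGs are equivalent iff they share the same skeleton and the same v-structures. The sketch below checks that simpler case; the minimal-essential-graph machinery described above generalizes this idea to ancestral graphs via minimal collider paths. Graphs are given as lists of directed edges.

```python
from itertools import combinations

def skeleton(dag):
    """Undirected edge set of a DAG given as (parent, child) pairs."""
    return {frozenset(e) for e in dag}

def v_structures(dag):
    """Triples (a, c, b) with a -> c <- b where a and b are non-adjacent."""
    adj = skeleton(dag)
    parents = {}
    for a, c in dag:
        parents.setdefault(c, set()).add(a)
    vs = set()
    for c, ps in parents.items():
        for a, b in combinations(sorted(ps), 2):
            if frozenset((a, b)) not in adj:
                vs.add((a, c, b))
    return vs

def markov_equivalent(g1, g2):
    return skeleton(g1) == skeleton(g2) and v_structures(g1) == v_structures(g2)

# A -> B -> C is equivalent to A <- B <- C (same skeleton, no collider),
# but not to the collider A -> B <- C.
chain     = [("A", "B"), ("B", "C")]
rev_chain = [("B", "A"), ("C", "B")]
collider  = [("A", "B"), ("C", "B")]
print(markov_equivalent(chain, rev_chain))  # True
print(markov_equivalent(chain, collider))   # False
```

For maximal ancestral graphs this skeleton-plus-collider comparison is no longer sufficient, which is what motivates the dissertation's minimal-collider-path test.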
These results are important for deriving properties shared by Markov-equivalent ancestral graphs and, especially, for identifying the essential edges.

For data classification, a Bayesian network classifier (ARE-BNC) is constructed, and the selection of the parameter k of the k-nearest neighbor classifier is studied. ARE-BNC is based on rough set theory and an evolutionary algorithm. Compared with eight other state-of-the-art algorithms, the results show that ARE-BNC is reasonable and achieves higher accuracy.

The parameter k is the only parameter of the k-nearest neighbor classifier and strongly affects its performance, so selecting the k with the highest classification accuracy is crucial for the k-nearest neighbor rule. This work employs the leave-one-out method, which is almost unbiased, to assess the performance of the k-nearest neighbor classifier, and exploits the concave relationship between leave-one-out classification accuracy and k to establish a fast search strategy for the optimal value of k. This strategy converges quickly to the k with the best classification accuracy.
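The k-selection idea can be sketched as follows. Assuming leave-one-out accuracy is (roughly) unimodal in k, as the concavity argument above suggests, a ternary search locates the best k with O(log n) accuracy evaluations instead of trying every k. The tiny one-dimensional dataset and absolute-value distance are purely illustrative, not the dissertation's experimental setup.

```python
from collections import Counter

def loo_accuracy(xs, ys, k):
    """Leave-one-out accuracy of a k-NN classifier on 1-D data."""
    correct = 0
    for i in range(len(xs)):
        rest = [(abs(xs[j] - xs[i]), ys[j]) for j in range(len(xs)) if j != i]
        rest.sort(key=lambda t: t[0])
        votes = Counter(label for _, label in rest[:k])
        if votes.most_common(1)[0][0] == ys[i]:
            correct += 1
    return correct / len(xs)

def best_k_ternary(xs, ys, lo=1, hi=None):
    """Ternary search over k; valid when accuracy is unimodal in k."""
    hi = hi if hi is not None else len(xs) - 1
    while hi - lo > 2:
        m1 = lo + (hi - lo) // 3
        m2 = hi - (hi - lo) // 3
        if loo_accuracy(xs, ys, m1) < loo_accuracy(xs, ys, m2):
            lo = m1 + 1
        else:
            hi = m2
    # Narrowed to at most three candidates; pick the best directly.
    return max(range(lo, hi + 1), key=lambda k: loo_accuracy(xs, ys, k))

xs = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
ys = [0, 0, 0, 1, 1, 1]
print(best_k_ternary(xs, ys))
```

If the accuracy curve is not truly unimodal, ternary search can settle on a local optimum; the concavity result discussed above is what justifies the shortcut.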
Keywords: Bayesian Network, Ancestral Graph, Structure Learning, Latent Variable, Markov Equivalence, k-Nearest Neighbor Classifier