
A Novel Multi-grouped Graph Bayesian Classification Model

Posted on: 2015-05-13    Degree: Doctor    Type: Dissertation
Country: China    Candidate: Y H Feng    Full Text: PDF
GTID: 1108330479979519    Subject: Management Science and Engineering
Abstract/Summary:
With the rapid development of information technology, we have entered the era of big data. Many convenient and varied ways of collecting data benefit risk management and open the door to an inviting opportunity to understand unknown things. However, it is often difficult to gain a comprehensive understanding and landscape of what we are studying. In addition, many of the problems we encounter are quite complex, and such high-dimensional data therefore tend to carry a great deal of noise. There may be hundreds of thousands of correlations among these dimensions. Worse, the noise in any single dimension can spread and be amplified, while the relationship patterns between dimensions are often unstable. As a result, the data itself becomes a mixture of considerable noise and heterogeneity. On the other hand, databases contain many redundant or weakly correlated dimensions, and some dimensions are themselves noise generated during data collection, which further compounds the difficulty of understanding the laws governing the problem. More suitable classification algorithms are therefore urgently required.

To meet these requirements, this dissertation proposes a Multi-Grouped Graph Bayesian Framework (MGGBF) based on full Bayesian estimation. In this framework, the features are divided into four groups according to the characteristics of high-dimensional data. The first two groups consist of redundant and noise dimensions, which carry no information relevant to classification, while the last two groups form the predictive features: the features of one group are mutually independent and those of the other are tree-related. The aim of this partition is to greatly reduce computation while ensuring that the framework can cover all kinds of data models. Using Bayes' theorem, the dissertation derives the properties of the MGGBF. These properties show that the framework can automatically filter out noise and redundant attributes while fitting regression or classification models, without the data preprocessing that many classic methods require.

Secondly, we propose a new classification algorithm, the Multi-Grouped Graph Bayesian Classifier (MGGB), by assuming Multinomial and Dirichlet distributions within the MGGBF. The basic properties of its likelihood functions and its prediction process are then derived in accordance with the original framework. Based on these properties of the MGGB, we present a series of theoretical models for structure learning and inference. For datasets with missing values, we also provide two strategies and present the corresponding theorems and formula transformations.

Thirdly, using the theorems of the MGGB, four atomic operations on the evolution of a single Grouped Graph are proposed to construct six basic sampling operations. Furthermore, the change in the likelihood of a single Grouped Graph caused by each sampling operation is derived as the foundation for an efficient sampling procedure. This procedure consists of 11 steps that apply the six sampling operations in a specific order, under which the generated Markov chains can be proved to satisfy detailed balance and to be homogeneous. That is, the sampling procedure converges to the theoretical solution of the MGGBF under the Multinomial and Dirichlet assumptions, yielding the so-called MGGB classifier.
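The abstract does not reproduce the MGGB's formulas, but the Dirichlet-Multinomial conjugacy it relies on can be sketched in a few lines: with symmetric Dirichlet priors on the class-conditional Multinomial parameters of discrete features, the parameters integrate out analytically, giving closed-form posterior predictive class probabilities. The sketch below is a minimal illustration under that assumption only (independent features, no grouped-graph structure learning, no noise/redundancy groups); all function names, priors, and data are hypothetical.

```python
# Minimal Dirichlet-Multinomial sketch (not the dissertation's MGGB): discrete features,
# symmetric Dirichlet(alpha) priors, Multinomial parameters integrated out analytically.
import numpy as np

def fit_counts(X, y, n_classes, n_values):
    """Count co-occurrences of each feature value with each class label."""
    n_features = X.shape[1]
    counts = np.zeros((n_classes, n_features, n_values))
    for c in range(n_classes):
        Xc = X[y == c]
        for j in range(n_features):
            counts[c, j] = np.bincount(Xc[:, j], minlength=n_values)
    return counts

def predict_proba(x, counts, class_counts, alpha=1.0):
    """Posterior predictive P(y = c | x) with the class-conditional Multinomial
    parameters integrated out under a symmetric Dirichlet(alpha) prior
    (features treated as mutually independent)."""
    n_classes, n_features, n_values = counts.shape
    log_post = np.log(class_counts + alpha) - np.log(class_counts.sum() + alpha * n_classes)
    for c in range(n_classes):
        for j in range(n_features):
            num = counts[c, j, x[j]] + alpha
            den = counts[c, j].sum() + alpha * n_values
            log_post[c] += np.log(num) - np.log(den)
    post = np.exp(log_post - log_post.max())
    return post / post.sum()

# Toy usage: 3 discrete features with values in {0, 1, 2}, 2 classes.
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(200, 3))
y = (X[:, 0] + rng.integers(0, 2, size=200) > 1).astype(int)
counts = fit_counts(X, y, n_classes=2, n_values=3)
class_counts = np.bincount(y, minlength=2)
print(predict_proba(np.array([2, 0, 1]), counts, class_counts))
```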
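The convergence argument for the sampling procedure rests on detailed balance of the constructed Markov chain. The toy Metropolis chain below (standard MCMC over a small discrete space, not the dissertation's six grouped-graph operations or its 11-step procedure) shows how a reversible accept/reject rule drives the empirical distribution toward the target; the state space and target weights are arbitrary illustrative assumptions.

```python
# Toy detailed-balance illustration: a Metropolis chain whose accept/reject rule
# enforces pi(x) P(x -> y) = pi(y) P(y -> x), so visit frequencies approach the target.
import numpy as np

rng = np.random.default_rng(42)
weights = np.array([1.0, 2.0, 4.0, 8.0])        # unnormalized target pi over 4 states
target = weights / weights.sum()

state = 0
visits = np.zeros(len(weights))
for _ in range(50_000):
    proposal = rng.integers(0, len(weights))    # symmetric proposal: uniform over states
    # Metropolis acceptance ratio; proposal symmetry makes the chain reversible.
    if rng.random() < min(1.0, weights[proposal] / weights[state]):
        state = proposal
    visits[state] += 1

print("empirical:", np.round(visits / visits.sum(), 3))
print("target:   ", np.round(target, 3))
```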
Finally, the MGGB is tested on five different simulated datasets and 11 real datasets from the UCI Machine Learning Repository. Among these experiments, the first simulated data model shares the same assumptions as the MGGB, and the results show that the MGGB converges quickly, within 50 steps of the sampling procedure. Similarly, simulated data generated from network-feature, hidden-structure, and heterogeneous models are used in the experiments; all of them show that the MGGB converges within at most 80 steps. Moreover, data of any of these modes are suitable for the MGGB to classify. Compared with 13 other classical classification algorithms, the MGGB achieves the highest CV-5 prediction accuracy in almost all cases, and its advantage is especially marked when the training data contain many mixed noise and redundant dimensions. On the 11 real datasets, the MGGB also shows excellent classification and prediction performance, achieving the highest accuracy on seven of them. In addition, we apply the MGGB to two examples drawn from civil and military applications. In these cases, the MGGB is used both as a preprocessing model for dimension reduction and as a predictor for classification. Besides its high accuracy, the grouped graphs it outputs help decision makers see the relationships among dimensions and understand the rules underlying the data.
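The CV-5 comparison protocol itself is standard; the sketch below shows how such an evaluation might be set up with off-the-shelf baselines on a UCI-derived benchmark. The MGGB is not reproduced here, so the models listed are stand-in baselines and the dataset choice is an assumption, not one of the dissertation's 11 datasets.

```python
# Hedged sketch of a CV-5 accuracy comparison using scikit-learn baselines on a
# UCI-derived dataset; the MGGB itself is the dissertation's contribution and is
# not included, so these models only illustrate the evaluation protocol.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
baselines = {
    "naive_bayes": GaussianNB(),
    "decision_tree": DecisionTreeClassifier(random_state=0),
}
for name, model in baselines.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    print(f"{name}: CV-5 accuracy = {scores.mean():.3f} +/- {scores.std():.3f}")
```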
Keywords/Search Tags: Bayesian Estimation, Classification, Bayesian Regression Tree, Data Mining