An Improved Method Of Monte Carlo Bayesian Classification

Posted on:2005-06-25

Degree:Master

Type:Thesis

Country:China

Candidate:X Qin

Full Text:PDF

GTID:2168360122491534

Subject:Circuits and Systems

Abstract/Summary:

With the development of information technology and the widespread use of databases, ever more information is being accumulated, and extracting interesting knowledge from it has become a serious problem. Knowledge discovery technology emerged in response and has become an active research area. KDD (knowledge discovery in databases) finds valid, novel, potentially useful, and understandable information. Data mining is the key step of KDD and draws on databases, artificial intelligence, statistics, and other fields.
Classification is an important part of data mining: it assigns data items in a database to a particular class by constructing a classification function or model (also called a classifier). The classification model can then be used to predict the classes of unlabelled objects. Unlike other classification methods, Bayesian classification is grounded in mathematics and statistics; its foundation is Bayes' theorem, which yields the posterior probability. In theory, it gives the optimal solution when its assumptions are satisfied.
Monte Carlo methods approximately solve mathematical or physical problems by statistical sampling. Applied to Bayesian classification, the method first obtains the conditional probability distribution of the unlabelled classes from the known prior probability. It then uses a sampler to draw, one at a time, random samples that follow this distribution. Finally, it obtains the posterior probability distribution of each unlabelled class by analysing these samples. Because a random sample from a given distribution is easy to obtain by running a suitable Markov chain, MCMC (Markov chain Monte Carlo) is the most common Monte Carlo Bayesian method. MCMC can reduce the time and space costs of data mining, but it is impractical for computation on massive datasets.
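
The sampling step described above can be illustrated with a minimal random-walk Metropolis sketch. It is not the sampler used in the thesis: the Bernoulli model with a uniform prior, the function names, and the step size are all assumptions chosen so that the chain's output can be compared with a known analytic posterior.

```python
import math
import random


def log_posterior(theta, successes, trials):
    """Unnormalised log posterior of a Bernoulli rate under a uniform prior (toy model)."""
    if not 0.0 < theta < 1.0:
        return -math.inf
    return successes * math.log(theta) + (trials - successes) * math.log(1.0 - theta)


def metropolis(successes, trials, n_draws=20000, step=0.1, seed=0):
    """Run a random-walk Metropolis chain and return every state it visits."""
    rng = random.Random(seed)
    theta = 0.5
    current = log_posterior(theta, successes, trials)
    draws = []
    for _ in range(n_draws):
        proposal = theta + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, successes, trials)
        # Accept with probability min(1, posterior ratio); proposals outside (0, 1) are never accepted.
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
        draws.append(theta)
    return draws


draws = metropolis(successes=30, trials=100)
kept = draws[2000:]  # discard an illustrative burn-in period
print("posterior mean estimate:", sum(kept) / len(kept))
print("analytic posterior mean:", (30 + 1) / (100 + 2))
```

Each step of the chain evaluates the posterior, which in a real classifier requires a pass over the data. That cost per draw is what makes plain MCMC expensive on massive datasets.
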
This thesis improves the MCMC method so that it can be applied to data mining on massive datasets. The proposed approach splits the dataset sample into two parts and restructures the scanning of the dataset into two loops, an inner loop and an outer loop: the scan of the dataset becomes the outer loop, and the scan of the draws from the posterior distribution becomes the inner loop. Furthermore, the thesis evaluates the sampling efficiency and the effective sample size, and uses particle filtering to make data mining on massive datasets more practical.
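
A rough sketch of the loop ordering described above is given below. It is an illustration under stated assumptions, not the thesis's algorithm: the Bernoulli model, the function names, and the use of exact Beta draws in place of MCMC output on the first part of the data are all hypothetical. Draws obtained from the first part are reweighted in a single pass over the second part, and the effective sample size is computed from the resulting importance weights. The resampling and rejuvenation steps of a full particle filter are omitted.

```python
import math
import random


def bernoulli_log_lik(x, theta):
    """Log-likelihood of one 0/1 observation under rate theta (toy model)."""
    return math.log(theta) if x == 1 else math.log(1.0 - theta)


def reweight_draws(draws, remaining_data, log_lik):
    """Accumulate log importance weights with the dataset scan as the outer loop."""
    log_w = [0.0] * len(draws)
    for x in remaining_data:               # outer loop: a single pass over the data
        for i, theta in enumerate(draws):  # inner loop: over the posterior draws
            log_w[i] += log_lik(x, theta)
    return log_w


def normalised_weights(log_w):
    """Convert log weights to weights that sum to one, guarding against underflow."""
    m = max(log_w)
    w = [math.exp(lw - m) for lw in log_w]
    total = sum(w)
    return [wi / total for wi in w]


def effective_sample_size(log_w):
    """ESS = 1 / sum(w_i^2) for normalised importance weights w_i."""
    return 1.0 / sum(wi * wi for wi in normalised_weights(log_w))


rng = random.Random(1)
# First part: 12 successes in 40 trials; with a uniform prior its posterior is Beta(13, 29).
draws = [rng.betavariate(13, 29) for _ in range(5000)]
# Second part: 18 successes in 60 trials, scanned only once.
second_part = [1] * 18 + [0] * 42

log_w = reweight_draws(draws, second_part, bernoulli_log_lik)
w = normalised_weights(log_w)
print("weighted posterior mean:", sum(wi * d for wi, d in zip(w, draws)))
print("effective sample size:", effective_sample_size(log_w))
```

With this ordering each observation is read once and can then be discarded, which is the property that matters when the dataset does not fit in memory. A low effective sample size signals that the weights have degenerated and the draws need to be refreshed.
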

Keywords/Search Tags:

Data mining, KDD, Bayesian classification, Monte Carlo

Related items

1	Bayesian Inference for Complex and Large-Scale Engineering Systems
2	Bayesian Monte Carlo signal processing and its applications in communications
3	Research On Educational Administration Data Mining Based On Bayesian K-nearest Neighbour Algorithm And Principal Component Analysis
4	Research On Sequential Monte Carlo Methods For Nonlinear Filtering Techniques
5	Study On The New Methods Of High-resolution Direction Finding Based On Bayesian Principle And Monte Carlo Method
6	Bayesian Monte Carlo signal processing for wireless communication
7	Non-linear Filtering Based On Monte Carlo Method
8	Bayesian calibration for Monte Carlo localization
9	Research On Imbalanced Data Classification Based On Monte Carlo Neural Network Algorithm
10	Spatial applications of Markov chain Monte Carlo for Bayesian inference