
Statistical Machine Learning Algorithm Study On Distributed Heterogeneous Data

Posted on: 2007-03-06 | Degree: Master | Type: Thesis
Country: China | Candidate: Y L Xia
GTID: 2178360212485372 | Subject: Control Science and Engineering
Abstract/Summary:
Rapid advances in computing and communication technology have led to the widespread emergence of pervasive distributed computing environments, such as the Internet, intranets and sensor networks. How to make efficient use of the computing and storage resources scattered across these environments has become an active topic in machine learning and data mining. Distributed machine learning covers both homogeneous learning and heterogeneous learning. This thesis focuses on heterogeneous learning in distributed computing environments.

First, the thesis presents a principal component analysis (PCA) method for distributed heterogeneous datasets based on covariance matrix decomposition. It achieves higher efficiency and accuracy than existing algorithms. The statistic estimation methods derived for it can also benefit other multivariate statistical machine learning algorithms.

Second, two feature selection approaches are proposed, one for supervised and one for semi-supervised distributed learning. Both are rooted in multiple random walk and Markov blanket identification techniques, which reduce the feature space to a subspace that is optimal in the sense of maximum mutual information.

Finally, because mixture models are widely used in machine learning algorithms, the thesis presents an EM-based parameter estimation method for mixture models on distributed heterogeneous data. In addition, for meta-data, a special kind of heterogeneous data, it proposes a Monte Carlo sampling based method for estimating mixture model parameters.
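The covariance decomposition idea behind the distributed PCA can be sketched under one assumption: "heterogeneous" data here means vertically partitioned data, where every site holds the same n samples but a different subset of feature columns. Each column mean depends only on that column, so each site can center its own block. The diagonal blocks of the global covariance matrix can then be computed locally, and only the off-diagonal (cross-site) blocks need communication. The abstract does not describe the thesis's estimators for those cross-site blocks. This minimal NumPy sketch therefore computes them exactly, as if the sites exchanged their centered blocks. It is not the author's algorithm, and the function names `center_local_block` and `distributed_pca` are hypothetical.

```python
import numpy as np


def center_local_block(X_site):
    # Runs at one site: the column means depend only on that site's own features.
    return X_site - X_site.mean(axis=0)


def distributed_pca(sites, n_components):
    """PCA over vertically partitioned data (hypothetical layout assumption).

    sites: list of arrays, each of shape (n_samples, d_k), holding disjoint
    feature blocks for the same n_samples rows.
    Returns the top eigenvalues, the matching eigenvectors, and the assembled
    global covariance matrix.
    """
    centered = [center_local_block(X) for X in sites]
    n = centered[0].shape[0]
    offsets = np.cumsum([0] + [Xc.shape[1] for Xc in centered])
    d = int(offsets[-1])
    cov = np.zeros((d, d))
    for a in range(len(centered)):
        for b in range(a, len(centered)):
            # a == b: purely local block. a != b: cross-site block, computed
            # exactly here; this is the step a real protocol would estimate.
            block = centered[a].T @ centered[b] / (n - 1)
            rows = slice(offsets[a], offsets[a + 1])
            cols = slice(offsets[b], offsets[b + 1])
            cov[rows, cols] = block
            cov[cols, rows] = block.T
    # eigh returns eigenvalues of a symmetric matrix in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    top = np.argsort(eigvals)[::-1][:n_components]
    return eigvals[top], eigvecs[:, top], cov


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    site_a, site_b = X[:, :2], X[:, 2:]  # two sites observing different features
    vals, vecs, cov = distributed_pca([site_a, site_b], n_components=2)
    print(np.allclose(cov, np.cov(X, rowvar=False)))  # prints True
```

Because the assembled matrix equals the centralized sample covariance, the principal components match those of centralized PCA. A communication-efficient variant would replace only the cross-site products.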
Keywords/Search Tags: distributed machine learning, heterogeneous data analysis, Gaussian mixture model, meta-analysis