High-dimensional Graphical Learning And Application Based On Sparse Data

Posted on: 2018-08-18
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X J Xu
GTID: 1360330596964387
Subject: Software engineering
Abstract/Summary:
For applications in high-dimensional space, the limited number of data samples and their non-Gaussian distribution are the bottleneck of data analysis, which makes it difficult to mine hidden valuable information. Graphical models use a graph to represent the joint probability distribution of variables by combining probability theory and graph theory, and they have broad application prospects in data utilization. To address high-dimensional problems driven by sparse data, we propose hierarchical clustering based on dynamic time warping (DTW) and DTWD-BDMCMC graphical modeling based on the DTW-D measure, both aimed at discovering hidden but valuable information. Furthermore, Copula DTWD-BDMCMC, an extension of DTWD-BDMCMC, is proposed for the analysis of non-Gaussian data. Our work and contributions comprise the following four parts.

(1) We first present structure learning methods for graphical models, which mainly include frequentist graphical models and Bayesian graphical models. Among frequentist models, the graphical lasso (glasso) and Meinshausen-Bühlmann graph estimation (mb) are the usual methods for constructing the graph structure. For regularization parameter selection in frequentist methods, we introduce StARS, a stability-based method for choosing the regularization parameter in high-dimensional inference for undirected graphs. A convenient Bayesian strategy focuses on Gaussian graphical model determination by trans-dimensional Markov chain Monte Carlo (MCMC) methods, which are designed for sampling over the joint space of graph structure and precision matrix, including reversible-jump MCMC (RJMCMC) and birth-death MCMC (BDMCMC). In comparison with the other frequentist and Bayesian methods, BDMCMC proves efficient at learning the true graph structure.

(2) A novel graphical algorithm, DTWD-BDMCMC, is proposed for modeling the intrinsic correlations buried in data. The principle behind it is to embed the DTW-D measure into Bayesian structure learning in sparse graphical models in order to calibrate warped observation sequences. In detail, a modified DTW-D distance matrix is first developed to construct a weighted covariance that replaces the traditional covariance calculated with Euclidean distance. We then build Bayesian Gaussian models on the weighted covariance so that they are robust against sequence distortion. Moreover, the weighted covariance is used as limited prior information to produce an initial graph structure, on which we finally employ BDMCMC to determine the reconstructed Gaussian graphical model. This initialization improves the convergence of BDMCMC sampling. To assess the performance of our algorithm, we compare it with two rivals, the BDMCMC and RJMCMC approaches, on estimating the graph structure. The experimental results show that the proposed DTWD-BDMCMC algorithm outperforms both alternatives.

(3) Copula DTWD-BDMCMC is further proposed by combining DTWD-BDMCMC with a copula model for the analysis of non-Gaussian data. The method contains three main steps. First, the variables are marginally transformed using constrained monotone functions so that the transformed data follow a Gaussian distribution. Second, a Gaussian graphical model is built on the transformed data. Finally, the proposed DTWD-BDMCMC is employed to estimate the graph structure and parameters. For the network attack data, our method reveals two cliques and four hub departments. It also provides evidence that network attacks on some departments probably occur in an organized way. We also apply Copula DTWD-BDMCMC to gene expression data. To evaluate the performance of our approach, we run both our DTWD-BDMCMC and the BDMCMC algorithm on the transformed data for comparison. We conclude that DTWD-BDMCMC probably reduces false positive edges and successfully discloses the real associations among genes.

(4) Hierarchical clustering based on the dynamic time warping distance measure is developed to classify temporal network security data, with the aim of learning the patterns of temporal network attacks in an unsupervised way. Dynamic time warping is capable of comparing time series by stretching or compressing them locally so that one resembles the other as closely as possible. This work helps to identify the network attack patterns of different departments.
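
The idea in part (4) can be illustrated with a minimal sketch. It assumes the plain textbook DTW recurrence with an absolute-difference local cost and average-linkage agglomerative clustering; it is not the dissertation's implementation and does not include the DTW-D variant. The function names and the toy series are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def dtw_distance(a, b):
    """Textbook dynamic-programming DTW with |x - y| as the local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: stretch a, stretch b, or step both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def average_linkage_clusters(series, k):
    """Merge the two closest clusters (average DTW distance) until k remain."""
    dist = {
        (i, j): dtw_distance(series[i], series[j])
        for i, j in combinations(range(len(series)), 2)
    }
    clusters = [[i] for i in range(len(series))]

    def linkage(c1, c2):
        pairs = [(min(i, j), max(i, j)) for i in c1 for j in c2]
        return sum(dist[p] for p in pairs) / len(pairs)

    while len(clusters) > k:
        # Pick the pair of clusters with the smallest average distance.
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [sorted(c) for c in clusters]


# Hypothetical toy series: two "bump" shapes shifted in time, two flat ones.
series = [
    [0, 0, 1, 2, 1, 0, 0],
    [0, 1, 2, 1, 0, 0, 0],  # same bump as the first series, one step earlier
    [5, 5, 5, 5, 5, 5, 5],
    [5, 5, 6, 5, 5, 5, 5],
]
clusters = average_linkage_clusters(series, 2)
print(clusters)
```

In this toy example the two shifted bumps have a DTW distance of zero because local stretching aligns them exactly, whereas a point-by-point Euclidean comparison would report a nonzero distance. That is the property that makes DTW attractive for attack time series that share a shape but not a timing.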
Keywords/Search Tags:DTWD-BDMCMC graphical model, Graphical models, Gaussian graphical models, non-Gaussian graphical models, Bayesian graphical models, DTW measure