Some Statistical Inference Methods Of Graphical Models And Multiple Change-point Detection In High-dimensional Situation

Posted on: 2022-07-27
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J Zhou
Full Text: PDF
GTID: 1487306611455444
Subject: Statistics
Abstract/Summary:
With advances in data mining and storage technology, ever more data are available. On the one hand, the surge in data scale causes the number of variables to grow rapidly, and the relationships among the variables become increasingly complex, forming a complex network structure. Exploring the network structure hidden among many variables through graphical models can provide more useful information, which is of practical significance in many applications. On the other hand, for large-scale data it is unreasonable to assume that all observations follow the same distribution or share the same structure. In practice, for a set of large-scale ordered observations, the data structure often changes abruptly at unknown points, called change-points, so the data before and after a change-point do not follow the same model. In this case, accurately locating where the data structure changes, that is, change-point detection, is very important for the accuracy of the subsequent analysis. This dissertation therefore carries out research on these two aspects.

For large-scale graph recovery, this dissertation first proposes a scalable inference method for the high-dimensional Gaussian graphical model. The method has rigorous theoretical guarantees, and we verify its feasibility through numerical simulation. Compared with existing methods, it is more computationally efficient and can handle Gaussian graphs of higher dimension. Second, to ensure that most of the identified edges are indeed true, we propose a high-dimensional Gaussian graphical knockoff filter that recovers the graph with a false discovery rate (FDR) control guarantee. The method constructs knockoffs and statistics for each node locally, and then solves a global optimization problem to determine a threshold for each node. The neighborhood of each node is estimated by comparing the statistics of that node with its threshold, and the graph is recovered from these neighborhoods. We prove that this method achieves asymptotic FDR control for Gaussian graph recovery. Extensive simulations show that the method is effective and that it attains higher power than existing methods.

For multiple change-point detection in a linear regression model, this dissertation proposes a two-stage detection method based on a change-point knockoff filter. The first stage cuts the data sequence into several segments and transforms the change-point detection problem into a variable selection problem. Unlike the existing two-stage method, we use the idea of control variables to select segments with an FDR control guarantee. Based on the selected segments, a refining stage then finds the specific locations of the change-points, completing the identification of the multiple change-points. Compared with the existing two-stage method, we allow the number of variables and the number of change-points to diverge with the sample size, which gives our method wider applicability. We also prove that our method achieves asymptotic FDR control for segment selection, which provides a theoretical guarantee for the accuracy of change-point identification in the interval sense. Extensive simulations confirm the superior performance of our method.
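As a rough illustration of how a knockoff filter turns statistics into an FDR-controlled selection, the sketch below computes the standard knockoff+ threshold for a single vector of knockoff statistics. It is not the dissertation's procedure: the per-node thresholds described above come from a global optimization problem that the abstract does not specify. The function names `knockoff_plus_threshold` and `select_features`, the example vector `W`, and the target level `q` are all assumptions made for this example.

```python
import numpy as np


def knockoff_plus_threshold(W, q):
    """Standard knockoff+ threshold (illustrative, not the dissertation's rule).

    Returns the smallest t among the nonzero |W_j| such that
        (1 + #{j : W_j <= -t}) / max(1, #{j : W_j >= t}) <= q,
    or infinity if no such t exists.
    """
    W = np.asarray(W, dtype=float)
    # Candidate thresholds: the distinct nonzero magnitudes, in increasing order.
    candidates = np.unique(np.abs(W[W != 0]))
    for t in candidates:
        # Large negative statistics act as a proxy count of false discoveries.
        fdp_estimate = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_estimate <= q:
            return float(t)
    return float("inf")


def select_features(W, q):
    """Indices whose statistic is at or above the knockoff+ threshold."""
    W = np.asarray(W, dtype=float)
    return np.flatnonzero(W >= knockoff_plus_threshold(W, q))


# Hypothetical statistics: a large positive W_j suggests a true signal.
W = [5, 4, 3, 2.5, 2, -1, 0.5, 1.5, 6, 7]
print(knockoff_plus_threshold(W, q=0.2))
print(select_features(W, q=0.2))
```

In a graph-recovery or segment-selection setting, the same comparison of statistics against a data-dependent threshold would be applied to candidate edges or candidate segments rather than to regression coefficients.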
Keywords/Search Tags: Gaussian graphical models, Multiple change-point detection, Statistical inference, False discovery rate (FDR), High dimension, Linear regression