Statistical Methodologies And Applications Of Graphical Models And Dictionary Models

Posted on: 2009-03-07
Degree: Doctor
Type: Dissertation
Country: China
Candidate: K Deng
GTID: 1100360245957212
Subject: Probability and Statistics
Abstract/Summary:
In modern science, an important research strategy is to learn about a system by accumulating and analyzing observational or experimental data. In situations where knowledge about the system is very limited, data mining is often the only practical way to detect its underlying rules and patterns, which in turn help us analyze, predict, or even control the system's behavior. This dissertation presents results we have obtained in this area. First, we develop new methods based on graphical models, which have been widely used in data mining and multivariate analysis. Then we turn to a new type of data mining problem, in which the underlying patterns in the system may overlap heavily, and propose a method based on the generalized dictionary model to solve it.

As important tools for data mining and multivariate analysis, graphical models have received much attention in recent years and have been applied in many areas. Research on graphical models has so far followed two main directions: learning the unknown structure of a graphical model, and statistical inference when the structure is known. Although the field has been studied for a long time and many methods have been developed, several important problems remain open, for example structural learning from small samples and statistical inference on very large graphs. This dissertation addresses each of these two problems in turn. For the first, we develop a novel heuristic method, based on mutual information, for learning graph structure. By finding the neighbors of each node separately, one node at a time, the new method recovers the graph structure efficiently even when the sample size is quite small. For the second, starting from delay tomography, a practical problem in computer communication, we arrive at an interesting statistical inference problem on graphical models: large-scale deconvolution in trees.
From a statistical point of view, this is a typical missing data problem on a special graph structure. To solve it, we introduce a recursive partial imputation strategy for imputing the missing values, which greatly reduces the computation required by the EM algorithm. We call the EM algorithm equipped with this imputation strategy sequential imputation EM (SIEM). The method offers new ideas for missing data analysis and may also influence research on the decomposition of large graphs.

Graphical models describe the relationships among variables clearly and intuitively. However, once the underlying patterns in a system overlap heavily, neither graphical models nor other data mining methods offer an efficient way to describe and detect them. To solve this kind of problem, we propose a new method based on the generalized dictionary model. In this method, we organize the underlying patterns of the system into a dictionary and build a probabilistic model that describes the behavior of the dictionary. In practice, we use the EM algorithm to estimate the parameters and then apply model selection techniques to detect new patterns and update the model structure accordingly. The new method is sensitive to small, weak patterns, gives accurate results, and has a number of desirable properties. From a theoretical point of view, it is closely related to graphical models, biclustering, and independent component analysis, and its ideas may also benefit these related methods. In practice, it can be applied in many important areas, such as traditional Chinese medicine, sociology, biology, text mining, and web search.
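
The abstract gives no details of the mutual-information heuristic for structural learning, so the following is only a minimal sketch of the general idea, not the dissertation's algorithm. It estimates the empirical mutual information between discrete variables and ranks the other variables as candidate neighbors of one target node. The function names `mutual_information` and `rank_candidate_neighbors` and the toy data are illustrative assumptions.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) of two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))   # counts of (x, y) pairs
    px = Counter(xs)               # marginal counts of x
    py = Counter(ys)               # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log( p(x,y) / (p(x) * p(y)) ), written in terms of counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


def rank_candidate_neighbors(data, target):
    """Rank all other variables by mutual information with `target`, highest first.

    `data` maps a variable name to its list of observed values.
    This is a hypothetical helper: a real structure learner would also need
    a rule for deciding which of the ranked candidates are true neighbors.
    """
    scores = {
        name: mutual_information(data[target], column)
        for name, column in data.items()
        if name != target
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data (assumed for illustration): Y copies X, Z is empirically independent of X.
toy = {
    "X": [0, 0, 1, 1, 0, 1, 0, 1],
    "Y": [0, 0, 1, 1, 0, 1, 0, 1],
    "Z": [0, 1, 0, 1, 1, 0, 0, 1],
}
ranking = rank_candidate_neighbors(toy, "X")
```

On the toy data, `Y` ranks above `Z` as a candidate neighbor of `X`. With small samples the plug-in estimate used here is biased upward, which is one reason a small-sample method would need something beyond a plain ranking.
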
Keywords/Search Tags: graphical models, structural learning of undirected graph, delay tomography, imputation methodology, EM algorithm, decomposing graphical models, dictionary model, pattern identification, text mining, traditional Chinese medicines