Probabilistic Reasoning And Statistical Learning Based On Geometrical Property Of Data

Posted on: 2009-05-15
Degree: Doctor
Type: Dissertation
Country: China
Candidate: K J Wang
Full Text: PDF
GTID: 1118360245968511
Subject: Computer application technology
Abstract/Summary:
Probabilistic reasoning and statistical learning are important tools for exploring the inner relations among objects. This thesis systematically studies several key techniques of probabilistic reasoning and statistical learning, focusing on two aspects: describing the geometrical properties of data with geometrical methods, and methods for probabilistic reasoning and statistical learning. The distinguishing feature of the work is the combination of these two aspects. The research topics include: linear and support vector regression based on mining geometrical correlations between data; adaptive learning of dynamic Bayesian networks (DBNs) with changing structures, based on detecting geometric structures of time series; dynamic Bayesian networks based on correlations between geometrical patterns; and estimation of the number of clusters (NC) based on a two-cluster geometrical model. The main contributions of this thesis are as follows:

1. Linear regression (LR) and support vector regression (SVR) methods usually neglect mining and using the correlations between the data of a single variable. The geometrical correlation learning method (GcLearn) is proposed to enhance the prediction ability of regression models by using this correlation information. Theoretical analysis shows that GcLearn has better prediction ability than traditional LR and SVR, and gives the conditions under which it applies. Experimental results show that GcLearn is effective. The new methods proposed include: a method for mining geometrical correlations between the data of a variable, a geometrical regression method at the level of curves, and a prediction method that uses geometrical correlations.

2. An adaptive learning method (autoDBN) is proposed to learn DBNs with changing structures from multivariate time series. autoDBN can learn a sequence of accurate model regions and DBNs with changing structures that adapt to the changing relations between multivariate time series. It overcomes two limitations of existing methods: the lack of a dedicated mechanism for detecting model regions, and blind searching. Experimental results show that it performs clearly better than existing methods. The new methods proposed include: segmentation of time series by detecting their geometric structures; search strategies for finding reasonable model regions; and a model revisiting method based on a competition F-test to rectify possible errors in model regions and DBNs.

3. A DBN method based on correlations between geometrical patterns (Gp-DBN) is proposed, which can discover gene regulatory relations based on trend correlations. Experimental results on real gene expression data show that Gp-DBN is effective. The new techniques include: the geometrical pattern of a gene's time series, which describes the varying trends of the gene's expression levels; and the use of tangent vectors to represent the features of geometrical patterns, which yields discrete features of the patterns and is used to estimate potential regulators and time lags.

4. The system evolution method (SE), based on a two-cluster geometrical model, is proposed to estimate NC for the PAM clustering algorithm. SE can estimate NC accurately in difficult cases where small clusters lie near larger clusters and/or clusters overlap slightly. Experimental results show that it outperforms existing methods on NC estimation. SE studies a cluster structure by examining the separability of the two closest clusters among all potential clusters (the twin-clusters), and a two-cluster geometrical model is proposed to analyze the separability of the twin-clusters. Furthermore, SE regards a dataset as a pseudo-thermodynamic system, and evolution rules based on the energy relations of the twin-clusters are proposed to estimate the optimal NC.
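The idea in contribution 3, turning the trend of a time series into discrete features and using them to estimate a time lag, can be illustrated with a small sketch. This is not the thesis's Gp-DBN algorithm, whose details the abstract does not give. It is a simplified stand-in that assumes the "tangent" at each step can be approximated by the difference between consecutive samples, and that a candidate lag can be scored by how often two discretized trends agree. The names `trend_symbols`, `best_lag`, `eps`, and `max_lag` are hypothetical.

```python
def trend_symbols(series, eps=0.1):
    """Discretize consecutive differences into +1 (rising), 0 (flat), -1 (falling).

    Assumption: a finite difference stands in for the tangent direction.
    """
    symbols = []
    for prev, curr in zip(series, series[1:]):
        slope = curr - prev
        if slope > eps:
            symbols.append(1)
        elif slope < -eps:
            symbols.append(-1)
        else:
            symbols.append(0)
    return symbols


def best_lag(regulator, target, max_lag=3, eps=0.1):
    """Return (lag, score) for the lag whose shifted trends agree most often.

    Assumption: agreement rate between discretized trends is used as a
    simple proxy for trend correlation; ties keep the smaller lag.
    """
    reg = trend_symbols(regulator, eps)
    tgt = trend_symbols(target, eps)
    best = (0, -1.0)
    for lag in range(max_lag + 1):
        # Pair the regulator's trend at step t with the target's trend at t + lag.
        pairs = list(zip(reg, tgt[lag:]))
        if not pairs:
            continue
        score = sum(1 for r, t in pairs if r == t) / len(pairs)
        if score > best[1]:
            best = (lag, score)
    return best


if __name__ == "__main__":
    regulator = [0, 1, 2, 3, 2, 1, 0, 1, 2, 3]
    # Toy target: the regulator's profile delayed by two time steps.
    target = [0, 0, 0, 1, 2, 3, 2, 1, 0, 1]
    print(best_lag(regulator, target))
```

On this toy pair the sketch selects a lag of two steps, because the delayed series reproduces the regulator's rise-and-fall pattern exactly once shifted. Real expression data would need noise handling and a proper significance test, neither of which is attempted here.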
Keywords/Search Tags:geometric correlations between data, prediction regression model, geometrical pattern of gene expression, gene regulatory relation based on correlations between varying trends, geometric structures of time series