
Research On Relevant Problems Of Gene Module Identification And Analysis

Posted on: 2011-11-12
Degree: Doctor
Type: Dissertation
Country: China
Candidate: G G Li
Full Text: PDF
GTID: 1118330332487019
Subject: Control Science and Engineering
Abstract/Summary:
With the development of high-throughput biological chip technologies, a large number of biological chip datasets have been produced. Analysis of different kinds of chip data can reveal different kinds of molecular networks, such as protein-protein interaction networks, gene regulatory networks and metabolic networks. Research on these networks shows that they are all composed of modules through which biological functions are carried out. It is therefore necessary to study the problems related to the identification and analysis of gene modules. Focusing on these problems, the main contents of this dissertation are summarized as follows:

1. Research on clustering algorithms. Although many clustering algorithms have been applied to the identification of gene modules, they still have shortcomings. One main problem is that most of them can identify only positively correlated patterns, not negatively correlated patterns or other more complex patterns. Therefore, a linear manifold clustering algorithm based on line manifold searching and fusing (LSAFCLUS) is proposed. Its basic idea is to search for the line clusters embedded in the data and to fuse some of these line clusters into higher-dimensional manifold clusters. The algorithm overcomes the drawback of conventional algorithms, which require the number of clusters to be set in advance, and reduces the influence of noise and outliers. It is thus suitable for clustering noisy high-dimensional data. Experimental results on simulated and real datasets demonstrate that LSAFCLUS is superior to other clustering algorithms in clustering accuracy and runtime. In addition, many gene modules with significant biological functions are obtained by applying LSAFCLUS to two cancer gene expression datasets. Further analysis of these gene modules reveals some meaningful results. For example, C12orf35 and FAM26F may be newly discovered cancer-related genes.

2. Research on similarity measures. Genes belonging to the same gene module usually participate in the same biological process or have the same biological function, so their profiles are usually similar to a great extent. It is therefore necessary to select a suitable similarity measure to represent the similarity between them. Four similarity (distance) measures are compared by studying their application to gene expression data. Based on this comparison, two new similarity measures are constructed. The four existing measures and the two new measures are compared on real gene expression data, and the results confirm that the new measures are more suitable for gene expression data. In addition, a combination framework is developed to construct a combined similarity. Tests on different gene expression datasets demonstrate that combining multiple biological datasets helps to obtain results with more biological meaning.

3. Research on a method for identifying transcription regulatory modules based on ChIP-chip data. ChIP-chip data reflect the binding relations between regulatory factors and gene promoters. We developed a transcription regulatory module identification algorithm that incorporates both gene expression data and ChIP-chip data. The algorithm produces core and coarse gene sets by using two different p-value thresholds, then analyzes the core and coarse gene sets, and finally obtains the transcription regulatory modules through gene extension. Applying the algorithm to two yeast datasets yields several gene transcription regulatory modules with significant biological meaning. Comparison with other algorithms shows that the algorithm can identify not only modules containing more genes, but also modules that other algorithms cannot identify. The identified gene modules have different biological functions, which helps in understanding the yeast transcription regulatory mechanism.

4. Building and analysis of several simple motif models. Models of several simple motifs are built and analyzed from different aspects. For the single-input motif, the model building process is described in detail, and approximate models are constructed under different conditions. For the feed-forward loop motif, a model is built, and the dynamic performance of the different kinds of feed-forward loop motif is discussed. For the auto-regulation motif, a model is built, and the distribution of its equilibrium points and the stability of the motif are discussed.

5. Research on regulatory networks based on modules.
(1) The gene regulatory network construction method based on transcriptional regulatory modules is composed of two main parts: the upper-level construction of the regulatory network from modules and the lower-level analysis of module motifs. A global compendium map of the gene regulatory network is obtained by expanding the modules horizontally and vertically, and a clear, detailed gene regulatory network is then constructed through the lower-level analysis of module motifs.
(2) A principal differential analysis method based on chaotic ant swarm optimization is proposed to address the problems of existing methods for estimating the parameters of differential equations. The method does not need a numerical approximation of the differential equation solution, and it addresses the local-extremum problem of parameter estimation.
(3) The yeast gene regulatory network is constructed with the module-based gene regulatory network construction method, and it is modeled by differential equations whose parameters are estimated by the principal differential analysis method based on chaotic ant swarm optimization. The performance of the gene regulatory network is analyzed through the differential equation model, which helps to further understand the yeast transcriptional regulatory mechanism and the characteristics of gene regulatory networks.
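
As a rough illustration of the kind of equilibrium-point analysis mentioned for the auto-regulation motif, the sketch below uses a common textbook form of negative auto-regulation: Hill-type repressed production minus linear degradation. The abstract does not give the dissertation's actual equations, so the model form, the parameter names (`beta`, `K`, `n`, `alpha`) and the function names are all assumptions made for this example only.

```python
# Hypothetical negative auto-regulation model (NOT taken from the dissertation):
#   dx/dt = beta / (1 + (x / K)**n) - alpha * x
# beta: maximal production rate, K: repression threshold,
# n: Hill coefficient, alpha: degradation rate (all assumed names).

def autoregulation_rate(x, beta=1.0, K=1.0, n=1.0, alpha=1.0):
    """Rate of change of the gene product concentration x."""
    return beta / (1.0 + (x / K) ** n) - alpha * x


def find_equilibrium(beta=1.0, K=1.0, n=1.0, alpha=1.0, tol=1e-12):
    """Locate the unique positive equilibrium by bisection.

    The rate equals beta > 0 at x = 0 and is negative at x = beta / alpha,
    and it decreases monotonically in between, so there is exactly one root.
    """
    lo, hi = 0.0, beta / alpha
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if autoregulation_rate(mid, beta, K, n, alpha) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def simulate(x0, dt=0.01, steps=5000, **params):
    """Integrate the model with the explicit Euler method and return the final x."""
    x = x0
    for _ in range(steps):
        x += dt * autoregulation_rate(x, **params)
    return x


if __name__ == "__main__":
    x_star = find_equilibrium()
    print(round(x_star, 6))         # 0.618034 for the default parameters
    print(round(simulate(0.0), 6))  # trajectory from below settles at the same value
    print(round(simulate(2.0), 6))  # trajectory from above settles there too
```

With the default parameters the equilibrium solves x = 1 / (1 + x), that is x = (sqrt(5) - 1) / 2. Trajectories started below and above this value both converge to it, which is the sense in which this assumed motif is stable.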
Keywords/Search Tags: module identification, similarity measure, linear manifold clustering, gene regulatory network, transcriptional regulatory module, network motif