Font Size: a A A

Graph And Network Based Learning Methods And Its Applications In Systems Biology

Posted on:2010-07-05Degree:DoctorType:Dissertation
Country:ChinaCandidate:Z XiaFull Text:PDF
GTID:1118360302983061Subject:Control Science and Engineering
Abstract/Summary:PDF Full Text Request
With the on-going development of human society and technology, more and more data and information from different scales are accumulated. However, many problems are still unsolved and unclear to us. For example, the dramatic development of molecular biology just provided much knowledge of different parts of cells and does not conquer any complex disease such as cancer. We can not understand complex systems and handle them without integrating so many data and knowledge from the system view and investigating their internal correlation. We also need machine learning and data mining techniques to help us process the huge data automatically. Graph and network are natural ways to describe the correlation between different components from different scale of a complex system. So it is emergent to develop graph and network based learning algorithms to analyze systems from real application such as systems biology.The dissertation targets graph and network based learning algorithms motivated from real classification and systems biology. From the data integration view, it presents in-depth study of link based node classification, link prediction, subnetwork searching and graph match problems employing graph theory, statistics and optimization tools. The main contributions of the dissertation are as follows:1. We surveyed the graph and network based learning algorithms and introduced the definition and current development of systems biology. It was shown that graph and network is the core of systems biology and has huge potential applications in systems biology.2. Link based node classification on the graph is a semi-supervised learning method. In order to construct a semi-supervised kernel by spectral transform on a graph Laplacian matrix, we proposed an graph based algorithm to learn the optimal non-parametric spectral transform and construct a classifier simultaneously. The main ideal is to use the Fisher discriminate ration in the feature space as the common criterion of spectral transform graph kernel learning and classifier construction which can be maximized by a convex semidefinite programming. Compared with kernel alignment method, our algorithm integrated optimal kernel construction and classifiers learning rather than separating into two steps. We compared our method with the kernel alignment method on 7 data sets and find our method is competitive with kernel alignment,and outperforms its competitor in some data sets.3. We considered the drug-protein interaction prediction. Aiming at extracting information from node properties, link and unlabeled samples, we applied a manifold regularized semi-supervised learning method as well as constructing a new kernel which integrated the chemical structure of drugs, sequence information of proteins and the topology information of drug-protein interaction network. The proposed method improved the prediction accuracy on 4 data sets. Furthermore, some predicted drug-protein interactions by our method are confirmed by latest drug bank data set.4. From the view of systems biology, it is important to integrate protein-DNA and protein-protein interaction into the microarray data analysis. To identify the disease related gene functional module, we put forward a new network based regularization term to encourage the smoothness of the absolute values of the coefficients on the network. This new term was combined with (\ norm term which imposed sparsity to form a graph-based elastic network method. The group effect of the new method is analyzed mathematically. A new whole path optimization algorithm is also proposed to solve the proposed graph-based elastic network. The simulation results and theoretical analysis demonstrated our method can get better prediction accuracy. Finally we applied the new method to a gene expression data set of Alzheimer's disease and identified 4 gene functional modules which are related with the progress of Alzheimer's disease.5. Systems biology intrinsically required the integration of data from different scale. We considered the registration of three dimensional fluorescent molecular tomography revealing molecular functions and the three dimensional CT images containing anatomical structure information to get more information. However it is difficult to directly co-register the 3D FMT (Fluorescence Molecular Tomography, FMT) image of a small tumor in a mouse whose maximal diameter is only a few mm with a larger CT image of the entire animal that spans about ten cm. The exact coordinates of 3D FMT in the 2D flat images is known. So we proposed to co-register the two dimensional flat image and 3D CT images as the initial position for the final 3D registration between FMT and CT. which bridged the gap between the 3D FMT image and 3D CT image of the animal. During registration, two data points were obtained from the segmentation of the two objects. So the problem was casted as a data points matching which made the registration of images from totally different modalities feasible. We proposed two matching optimization methods by combining the global and local searching methods. One improved sequential monter carlo sampling method by incorporating least squares as a local search. Another combined differential evolution and improved simplex method endowed with least squares as a new local searching. A number of simulation results verified the advantages of combining local search and global search in reducing iteration number and finding better solutions. The visualization of the alignments of the 3D FMT and CT images through 2D pre-registration on two mice data shows promising results.
Keywords/Search Tags:Systems biology, Semi-supervised learning, Laplacian Matrix of a graph, Kernel methods, Drug-protein interaction network, Protein-protein Interaction network, Gene functional module, Biological image registration
PDF Full Text Request
Related items