
Computational Systems Biology Study Of Cancer Biomarkers And The Development Of Biological Network Visualization Software

Posted on: 2011-12-03
Degree: Doctor
Type: Dissertation
Country: China
Candidate: T X Huan
GTID: 1118360305450180
Subject: Developmental Biology
Abstract/Summary:
Background and Purpose

In recent years, large-scale biomolecular interaction data, especially protein-protein interaction (PPI) data produced by high-throughput techniques, have been published, accelerating disease-related biological research. In cancer research in particular, biologists have begun to focus on the dynamic changes in networks disturbed by pathological factors. Meanwhile, characterizing diseases by the molecular underpinnings of disorders has been recommended as a new perspective for human disease classification.

Effective cancer protein biomarkers can help improve cancer diagnosis and the selection of treatment strategies according to the response to different drugs, and can also help classify disease subtypes or facilitate prognosis. Furthermore, biomarkers offer a new perspective for disease classification. However, developing cancer biomarkers has been a century-old grand research challenge. In the years since 1847, only 9 proteins have been approved by the United States Food and Drug Administration (FDA) for cancer diagnostic applications. In recent years, against the background of systems biology, the hope for cancer protein biomarker discovery rests on the fact that cancer signaling, especially that of cancer invasion and metastasis, can trigger profound immunological and inflammatory responses with unique cancer molecular footprints.

In this research, we focused on computational systems biology methods for cancer protein biomarker discovery.
1) For the first time, we systematically surveyed a set of curated candidate cancer protein biomarkers and evaluated their characteristics in the context of the global relationships among cancer phenomics. Our study provides heuristics both to guide future computational models for cancer biomarker prediction and to screen biomarkers for potential clinical use.
2) We discussed algorithms for cancer protein biomarker discovery.
We proposed a new method for prioritizing cancer biomarker candidates that uses cancer phenotype association information to guide the random walk steps in a global human PPI network. Compared with traditional ranking methods, our method achieved better sensitivity and accuracy.
3) Furthermore, we studied AFP, one of the most important biomarkers of hepatocellular carcinoma (HCC). By comparing the gene expression profiles of AFP-positive and AFP-negative (AFP+/-) HCC clinical samples, we identified 196 differentially expressed genes and inferred possible mechanistic differences among the HCC subtypes related to AFP.
4) We developed ProteoLens, a visual analytic software tool for creating, annotating and exploring multi-scale biological networks. Its support for direct database connectivity and full SQL makes it suitable for bioinformatics expert data analysts who are experienced with relational database management to perform large-scale integrated network visual explorations.

Our overall objective is to foster, within the biomedical research community, the efficient integration of large functional genomics data sets for cancer biomarker discovery, and to design a systems biology data mining platform that lets computational experts join cancer research efficiently.

Methods

1 Data collection
1) Disease genes, cancer genes, cancer protein biomarkers, essential genes, drug target proteins;
2) PPI network, TF-gene regulation network, pathways, gene co-expression network;
3) Tissue-specific genes, tissue gene expression data, Gene Ontology data, human plasma protein/peptide data;
4) Molecular ID mapping data, official disease terms, molecule annotation data.

2 Analysis of cancer protein biomarker characteristics
1) Comparison of PPI network topological characteristics;
2) Comparison of tissue expression specificity;
3) Comparison of expression range;
4) Comparison of Gene Ontology keywords;
5) Co-pathway statistics;
6) Co-expression statistics.

3 Construction of cancer phenotype-specific molecular networks
1) PPI sub-network;
2) TF-gene regulation sub-network.

4 Definition and comparison of cancer phenotype association networks (CAN)
1) CAN based on cancer disease genes;
2) CAN based on cancer protein biomarkers;
3) CAN based on disease-associated loci from GWAS;
4) Construction of an adjacency matrix and similarity score to compare different CANs.

5 The SW-RWR algorithm and its assessment
1) Coverage;
2) Sensitivity;
3) Accuracy.

6 Analysis of AFP molecular evolution
1) Building a molecular evolutionary tree;
2) ALB-like domain analysis;
3) Analysis of intergenic regulatory regions.

7 Analysis of the expression profiles of AFP+/- HCC clinical samples
1) Collection of HCC gene expression profiles;
2) Meta-analysis;
3) Comparison of the Bayesian network and the PPI network;
4) Gene functional analysis.

8 Design and implementation of ProteoLens, a network visualization tool
1) Fundamental framework;
2) Java implementation;
3) Systems biology case studies.

Results

The first part

Analyzing the characteristics of cancer protein biomarkers
To evaluate the global characteristics of cancer protein biomarkers, we chose disease-related aberrant genes (disease genes for short), cancer-related genetically abnormal genes (cancer core genes for short), drug target proteins and essential proteins as references.
1) There was little overlap among these data sets;
2) On average, candidate cancer protein biomarkers had fewer PPIs than cancer genes and essential proteins, but more than common disease genes and drug target proteins;
3) Cancer protein biomarkers included more tissue-specific genes than common proteins;
4) Cancer protein biomarkers tended to be widely expressed genes; on average they were expressed in even more tissues than the essential genes;
5) A total of 385 GO terms, including 26 cellular component terms, 0 molecular function terms and 359 biological process terms, accounted for a higher percentage in the cancer biomarker set than in the common protein set (details not shown here);
6) Density of cancer protein biomarkers surrounding the genetically aberrant genes: the coverage density of cancer protein biomarkers rose sharply when more mechanism-specific molecular data sets were chosen. Compared with the distribution of all proteins, biomarkers tended to lie fewer than 3 steps from their corresponding disease genes;
7) On average, cancer protein biomarkers and their corresponding cancer core genes were twice as likely to be in the same pathway as random sets.

Building a cancer disease association network based on cancer protein biomarkers (DBN)
The network nodes are cancer subtypes, and an edge links two diseases if and only if they share at least one candidate cancer protein biomarker. There are 762 cancer protein biomarkers mapped to 59 cancer phenotypes, comprising 820 "biomarker-disease" associations. On average, 8 protein biomarkers are shared by two cancer phenotypes. All 59 nodes form one connected network, with no orphan nodes or separate sub-networks.

Comparison of DBN and DAN (disease association network based on disease genes)
There are visible similarities between the two association networks. Protein biomarkers can substitute for genetically aberrant genes in describing the associations among cancer phenotypes.

The second part

SW-RWR algorithm
SW-RWR is based on the Random Walks Ranking (RWR) algorithm. It improves RWR by using cancer phenotype association information to assign each gene a disease-specific weight, which guides the RWR algorithm in a global human PPI network.
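As an illustration only, the idea of guiding a random walk with disease-specific gene weights can be sketched in Python as follows. The abstract does not give the update rule, so the restart-style iteration, the restart probability of 0.3, the function name seed_weighted_walk and the toy proteins P1 to P5 are all assumptions made for this sketch; it is not the SW-RWR implementation itself.

```python
def seed_weighted_walk(edges, seed_weights, restart=0.3, tol=1e-12, max_iter=10_000):
    """Score nodes of an undirected network by a random walk that restarts
    at seed nodes in proportion to their (disease-specific) weights.

    Assumed formulation: p_next = (1 - restart) * W * p + restart * p0,
    where W spreads each node's score evenly over its neighbours and p0 is
    the normalised seed-weight vector.
    """
    # Build undirected adjacency sets from the edge list.
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    for node in seed_weights:
        neighbors.setdefault(node, set())

    total = sum(seed_weights.values())
    if total <= 0:
        raise ValueError("seed weights must sum to a positive value")
    start = {n: seed_weights.get(n, 0.0) / total for n in neighbors}

    score = dict(start)
    for _ in range(max_iter):
        nxt = {n: restart * start[n] for n in neighbors}
        for n, nbrs in neighbors.items():
            moving = (1 - restart) * score[n]
            if nbrs:
                share = moving / len(nbrs)
                for m in nbrs:
                    nxt[m] += share
            else:
                # An isolated node has no edge to follow, so its score
                # is returned to the seeds to keep the total at 1.
                for m in neighbors:
                    nxt[m] += moving * start[m]
        change = sum(abs(nxt[n] - score[n]) for n in neighbors)
        score = nxt
        if change < tol:
            break
    return score


# Hypothetical toy PPI network: a simple chain P1 - P2 - P3 - P4 - P5.
toy_edges = [("P1", "P2"), ("P2", "P3"), ("P3", "P4"), ("P4", "P5")]

if __name__ == "__main__":
    # P1 carries most of the seed weight, so its neighbourhood ranks higher.
    scores = seed_weighted_walk(toy_edges, {"P1": 0.9, "P5": 0.1})
    for node in sorted(scores, key=scores.get, reverse=True):
        print(node, round(scores[node], 4))
```

In this toy chain, shifting the weight from P1 to P5 reverses the ranking of the non-seed neighbours P2 and P4, which is the effect the seed weighting is meant to have on candidate prioritization.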
SW-RWR is especially suitable for predicting and prioritizing cancer protein biomarkers.

Prioritizing leukemia protein biomarkers by SW-RWR
1) Leukemia is connected to 44 cancer phenotypes in the DAN, and we chose the top 5 cancers by similarity score for further computational model building;
2) Leukemia and these top 5 cancers are related to similar local network disturbances in the PPI network;
3) The top 20 candidate protein biomarkers selected by SW-RWR show associations with the molecular mechanisms of leukemia;
4) SW-RWR outperformed a typical local network-based analysis in coverage, and showed better accuracy and sensitivity than the original RWR method in leukemia protein biomarker discovery.

Prioritizing lung cancer protein biomarkers by SW-RWR
1) Building disease association networks based on GWAS: we created 3 different disease association networks (DAN) by defining associations as sharing common DASs (DAN1), common genes (DAN2) or common chromosome regions harboring DASs (DAN3).
2) Retrieving a lung cancer-centered sub-network of the DAN: five diseases were connected to lung cancer, namely idiopathic pulmonary fibrosis, chronic obstructive pulmonary disease, glioma, coronary disease and cardiovascular diseases. The sub-network showed that these diseases are in some way related to lung cancer.
These disease genes were connected by direct or indirect PPIs, which implies that the pathological mechanisms of those diseases are associated.
3) The top 20 candidate protein biomarkers selected by SW-RWR show associations with the molecular mechanisms of lung cancer;
4) SW-RWR outperformed a typical local network-based analysis in coverage, and showed better accuracy and sensitivity than the original RWR method in lung cancer protein biomarker discovery.

The third part

Molecular evolution of AFP and the ALB-like gene family
1) The evolutionary tree shows the concurrent evolution of the species and the ALB-like protein family. On the AFP branch, chick AFP is less similar to human, dog, rat and mouse AFP than to its paralog, chick ALB. Rat AFP and mouse AFP are highly similar; human AFP is more similar to dog AFP. There is 1 ALB-like protein family member in fish, 2 in frog, 3 in bird and 4 in mammals.
2) Protein domain analysis. Lamprey has an ALB-like protein with seven ALB domains. ALB domains from the same protein are less similar than those from paralogous proteins, which suggests that duplication occurred at the level of proteins rather than domains. The first and second domains of ALB-like protein family members are more similar than the third domain, which suggests that the family may have originated from two-domain proteins and later produced the three-domain protein through a partial duplication event.
3) Intergenic sequence analysis. In five model species, the AFP enhancers E1, E2 and E3 are present and highly similar to those of human.

Identifying differences between AFP+ and AFP- hepatocellular carcinoma clinical samples
We first selected 196 highly differentially expressed genes from these gene expression profiles by meta-analysis. By overlaying each gene onto a human PPI network, we found that the selected genes had more interacting proteins than average genes did.
We then used a Bayesian network (BN) algorithm to infer probable causal relationships among those genes. The result shows that the genes closer to AFP in the BN are all linked together and form a connected sub-network in the human PPI network. Both Gene Ontology (GO) analysis and pathway analysis showed that AFP+ and AFP- HCC clinical samples differ significantly in cell proliferation, cell mobility, immune response, WNT and some other signaling pathways.

The fourth part

ProteoLens framework
ProteoLens is a standalone software tool written in the Java programming language. Its software architecture consists of two separate functional layers, a data processing layer at the back end and a data visualization layer at the front end, connected by a network data association engine. The data processing layer is where biological data from different sources, including flat files, XML data and tabular data in relational databases, are managed. The data visualization layer is where specified network data attributes and data association rules are converted to network layouts and network visual properties. An association rule establishes the mapping between data in the data processing layer and data in the data visualization layer.
This design enables users to navigate between data management and data visualization iteratively until useful insights are established from a proper visualization.

New features and core functionalities of ProteoLens
The core functionalities of ProteoLens are:
1) Support for several types of physical data sources: tab-delimited text files on the local file system, and tables/views in relational databases managed by Oracle 10g or PostgreSQL 8.x database management systems;
2) SQL-based visual data analysis: support for full SQL statements, including Data Definition Language (DDL) and Data Manipulation Language (DML);
3) Flexible visual annotation of network data;
4) Sub-network manipulation;
5) Multiple network layout methods.

Case studies
1) Human cancer association network drawn with ProteoLens;
2) Compound-target interaction network drawn with ProteoLens;
3) Peptide-protein mapping networks drawn with ProteoLens.

Conclusions

1. We collected diseasome, genome, proteome and interactome data as comprehensively as we could, in order to build a framework for analyzing the system-level characteristics of more than 1000 potentially useful cancer protein biomarkers. The analysis results will support computational methods for biomarker discovery in future research.
2. We built a disease phenotype association network in which two phenotypes are associated if they share at least one protein biomarker. We quantified the relationship between cancer protein biomarkers and cancer disease genes, and linked biomarkers to cancer pathological mechanisms.
3. We were the first to apply the concept of a "disease association network" to cancer protein biomarker discovery, and we designed a new algorithm, SW-RWR, for prioritizing candidate biomarkers. SW-RWR outperformed RWR in both sensitivity and accuracy.
4. We were the first to use GWAS information for the prediction and prioritization of cancer protein biomarkers.
The framework of this study will foster cooperation among researchers in different fields of biomedicine.
5. We identified and validated significant molecular differences between AFP-positive (AFP+) and AFP-negative (AFP-) HCC clinical samples. We selected 196 highly differentially expressed genes, which had more interacting proteins than average genes did and play important roles in cell proliferation, cell mobility, immune response, WNT and some other signaling pathways.
6. We designed and developed ProteoLens, a Java-based visual analytic software tool for creating, annotating and exploring multi-scale biological networks. Compared with traditional biological network visualization software, ProteoLens offers the following new features:
1) Full support for relational databases, such as Oracle and PostgreSQL, and for embedded SQL;
2) Support for graph/network data represented in standard Graph Modeling Language (GML);
3) An architectural design that decouples complex network data visualization tasks into two distinct phases.
This design makes ProteoLens suitable for bioinformatics expert data analysts who are experienced with relational database management to perform large-scale integrated network visual explorations.
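To make the SQL-driven workflow more concrete, the sketch below derives a small disease association network from a relational table and writes it out as GML, linking two diseases when they share at least one biomarker. It is a hedged illustration, not ProteoLens code: Python's built-in sqlite3 module stands in for Oracle or PostgreSQL, and the table biomarker_disease, its columns, the function names and the sample rows (BM1 to BM4, cancer_A to cancer_C) are all hypothetical.

```python
import sqlite3


def build_demo_connection():
    """Create an in-memory database with a hypothetical biomarker-disease table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE biomarker_disease (biomarker TEXT, disease TEXT)")
    conn.executemany(
        "INSERT INTO biomarker_disease VALUES (?, ?)",
        [
            ("BM1", "cancer_A"), ("BM1", "cancer_B"),
            ("BM2", "cancer_A"), ("BM2", "cancer_B"),
            ("BM3", "cancer_B"), ("BM3", "cancer_C"),
            ("BM4", "cancer_C"),
        ],
    )
    return conn


def shared_biomarker_edges(conn):
    """Return (disease_a, disease_b, shared_count) for every disease pair
    that shares at least one biomarker, using a self-join in SQL."""
    query = """
        SELECT a.disease, b.disease, COUNT(*) AS shared
        FROM biomarker_disease AS a
        JOIN biomarker_disease AS b
          ON a.biomarker = b.biomarker AND a.disease < b.disease
        GROUP BY a.disease, b.disease
        ORDER BY a.disease, b.disease
    """
    return conn.execute(query).fetchall()


def to_gml(edges):
    """Serialise the edge list as a minimal undirected GML graph.
    Diseases that share no biomarker with any other disease do not appear."""
    nodes = sorted({d for a, b, _ in edges for d in (a, b)})
    ids = {name: i for i, name in enumerate(nodes)}
    lines = ["graph [", "  directed 0"]
    for name in nodes:
        lines.append(f'  node [ id {ids[name]} label "{name}" ]')
    for a, b, shared in edges:
        lines.append(f"  edge [ source {ids[a]} target {ids[b]} value {shared} ]")
    lines.append("]")
    return "\n".join(lines)


if __name__ == "__main__":
    connection = build_demo_connection()
    print(to_gml(shared_biomarker_edges(connection)))
```

Keeping the network definition in the SQL query and the rendering in a separate step mirrors, in a very reduced form, the separation between data processing and data visualization described above.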
Keywords/Search Tags: Cancer Biomarker, Systems Biology, Biological network visualization software, Hepatocellular carcinoma, Alpha-fetoprotein