Networks Based Research Of Pathogenesis And Classification Of Human Disease

Posted on: 2012-11-10
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H Guo
Full Text: PDF
GTID: 1114330371462908
Subject: Biochemistry and Molecular Biology
Abstract/Summary:
Diseases have threatened human health and daily life since before humans began walking upright. Disease may cause pain, dysfunction, or death in the person afflicted. Because of this threat, the prevention, control, diagnosis, and treatment of disease, and research into pathogenic mechanisms, are of vital importance. History has shown that the extraordinary increase in average life expectancy is due mainly to medical progress.

For many years, genetic studies of disease played only a minimal role in the clinical control of disease. In the genomics and proteomics era, many studies have produced huge amounts of biomedical data from -omics experiments on different diseases. These data have been used to study the pathogenesis, development, and possible treatment of particular diseases. Systems biology analyses that mine the knowledge underlying biological data have found a large number of genes, proteins, and biological pathways associated with disease phenotypes. These studies have provided clues to disease mechanisms and possible methods for clinical diagnosis.

Although great progress has been made in genomics and proteomics, some problems remain. For example, some studies focus only on the pathological process of a specific disease, and their findings are difficult to extend to other diseases. The biological significance of the many diverse genes and proteins identified as over-expressed in large-scale expression experiments is hard to interpret. Different methods of analyzing biological data often give different results with little overlap. Many biological experimental data carry biases that can cause high false-positive rates.
Cross-platform methods and analysis software need to be combined and integrated according to common standards.

To address the problems listed above, we designed and established a series of mathematical models and analysis strategies based on gene expression microarray data, protein interaction network data, gene regulatory network data, biological pathway gene data, and prior expert knowledge, and we effectively identified a number of disease-associated pathways and protein sub-networks. We used the sub-networks to predict the outcome of breast cancer with high accuracy. The method can be applied to mechanism research, diagnosis, and prognosis for different diseases.

First, to study the regulatory relationships between genes under specific conditions, we used gene microarray data together with the data reduction algorithm principal component analysis (PCA) and a co-expression measure, the Pearson correlation coefficient (PCC), to establish a new parameter, FAB, for measuring gene regulatory relations. We then used this parameter as the feature for a support vector machine (SVM) classifier and predicted the regulatory relationships between genes. We chose a data reduction algorithm to extract features from gene chips because many existing methods mine regulatory relationships from the raw data, ignoring both the noise in microarray data and the interactions between genes. Data reduction lets us extract the key information from gene chips and reduce the influence of noise, and combining it with the Pearson correlation coefficient lets us take the relationships between genes into account. Finally, we input the feature parameter to the SVM classifier to predict the regulatory relationships. The prediction results showed that, with an appropriate number of features, we could accurately predict relationships between transcription factors and target genes.
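As a rough illustration of the feature construction described above, the following Python sketch combines PCA scores with the Pearson correlation coefficient for a gene pair. The abstract does not give the definition of FAB, so the feature layout here (the PCC followed by each gene's top principal-component scores) is an assumption made for illustration only, as are the function names `pca_scores` and `pair_features` and the toy data. In practice, rows of the resulting feature matrix for labelled regulator/target pairs would be passed to an SVM classifier.

```python
import numpy as np

def pca_scores(expr, n_components):
    """Project each gene (row of expr) onto the top principal axes.

    expr has shape (n_genes, n_samples); genes are treated as observations.
    """
    centered = expr - expr.mean(axis=0, keepdims=True)
    # Rows of vt are the principal axes in sample space.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T  # shape: (n_genes, n_components)

def pair_features(expr, scores, a, b):
    """Hypothetical feature vector for the gene pair (a, b).

    Assumed layout: [PCC(a, b), PCA scores of a, PCA scores of b].
    """
    pcc = np.corrcoef(expr[a], expr[b])[0, 1]
    return np.concatenate(([pcc], scores[a], scores[b]))

# Toy data (not from the dissertation): 6 genes x 10 samples.
rng = np.random.default_rng(0)
expr = rng.normal(size=(6, 10))
expr[1] = 2.0 * expr[0] + 0.01 * rng.normal(size=10)  # gene 1 tracks gene 0

scores = pca_scores(expr, n_components=3)
pairs = [(0, 1), (0, 2)]
features = np.array([pair_features(expr, scores, a, b) for a, b in pairs])
print(features.shape)
```

The number of retained components plays the role of the "number of features" that the abstract says must be chosen appropriately.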
This research could suggest clues for studying regulatory relationships between genes under specific disease conditions.

Second, to study the mechanisms of disease development, we propose a novel strategy for identifying disease-related biological pathways and genes. The strategy was successfully applied to gene expression profiles of type II diabetes patients and of smokers. By integrating microarray expression data with biological pathway databases, we used non-negative matrix factorization (NMF) to analyze differences in pathway activity levels between healthy people and patients, and then used a statistical test to identify pathways that are significantly differentially expressed in patients. By ranking the contribution weights of each pathway's member genes, we can identify the genes that most affect the pathway's activity. The significant pathways and important genes identified by NMF could provide information related to the disease phenotype and suggest new clues for understanding the disease mechanism.

Finally, we introduced a new strategy for disease diagnosis and prognosis and tested it on breast cancer metastasis datasets. We first collected breast cancer related genes from the OMIM and Cancer Gene Census databases, then applied a random walk algorithm to the human protein-protein interaction network to identify protein sub-networks potentially related to breast cancer, and then used the aggregate expression of these sub-networks to predict breast cancer metastasis with a support vector machine classifier. The results showed that this strategy finds related sub-networks effectively and has a great advantage in prediction sensitivity and specificity.

In this paper, we systematically studied the molecules and molecular relationships related to human diseases at the levels of genes, gene regulatory relationships, biological pathways, and protein-protein interaction sub-networks.
By integrating biological expert knowledge, gene expression information, gene regulation information, structural information on biological pathways, and large-scale human protein interaction information, we used data reduction algorithms, machine learning classification algorithms, a network propagation algorithm, and other data mining algorithms to identify genes, proteins, and protein-protein interactions potentially related to disease, and to diagnose disease using sub-networks. Our methods outperform similar methods and improve the understanding of human disease pathogenesis at the levels of biological molecules, biological networks, and even the whole biological system under disease conditions. Our methods can also be easily applied to other diseases.

This paper makes three main contributions. First, we were the first to define a feature coefficient that combines the Pearson correlation coefficient with the principal components extracted by PCA. Used with an SVM classifier, this feature coefficient could improve prediction accuracy, specificity, and sensitivity. Second, we were the first to apply a non-negative matrix factorization (NMF) analysis strategy to identify disease-related biological pathways and genes. Third, we were the first to apply a random walk algorithm to analyze disease-related protein sub-networks and to use the aggregate expression of these sub-networks to predict disease prognosis. The results show that this method could improve prediction performance in terms of sensitivity and specificity. These parts support one another and are versatile and scalable. The strategies can be applied to different diseases and provide important information to help discover novel disease markers and drug targets.
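To make the second contribution more concrete, here is a minimal Python sketch of how NMF could yield a pathway activity level and per-gene contribution weights. The abstract does not specify the factorization rank, the update rule, or the statistical test, so everything here is an assumption: a rank-1 factorization with standard multiplicative updates, a Welch t statistic standing in for the unspecified test, the function name `nmf`, and the toy expression values.

```python
import numpy as np

def nmf(v, rank, n_iter=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix v (genes x samples) as w @ h
    using multiplicative updates for the squared-error objective."""
    rng = np.random.default_rng(seed)
    w = rng.random((v.shape[0], rank)) + eps
    h = rng.random((rank, v.shape[1])) + eps
    for _ in range(n_iter):
        h *= (w.T @ v) / (w.T @ w @ h + eps)
        w *= (v @ h.T) / (w @ h @ h.T + eps)
    return w, h

# Toy pathway (not from the dissertation): 4 member genes, 6 samples.
# Samples 0-2 are "healthy", samples 3-5 are "patients".
gene_weights = np.array([1.0, 2.0, 0.5, 3.0])
true_activity = np.array([1.0, 1.1, 0.9, 3.0, 3.2, 2.8])
v = np.outer(gene_weights, true_activity)

w, h = nmf(v, rank=1)
scale = w.sum(axis=0)        # fix the scale ambiguity between w and h
w, h = w / scale, h * scale[:, None]

activity = h[0]              # pathway activity level per sample
healthy, patients = activity[:3], activity[3:]

# Welch t statistic as a stand-in for the unspecified statistical test.
se = np.sqrt(healthy.var(ddof=1) / 3 + patients.var(ddof=1) / 3)
t_stat = (patients.mean() - healthy.mean()) / se

# Rank member genes by contribution weight, largest first.
gene_rank = np.argsort(-w[:, 0])
print(t_stat, gene_rank)
```

With real data, this would be repeated per pathway using the expression sub-matrix of that pathway's member genes, followed by multiple-testing correction across pathways.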
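The third contribution can be sketched in the same spirit. The abstract says only that a random walk algorithm was applied to the protein-protein interaction network; the random walk with restart variant used below, the restart probability, the top-k cut-off for choosing a sub-network, and the use of the mean as the aggregate expression are all assumptions for illustration. The six-protein network and its labels are invented toy data.

```python
import numpy as np

def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Return steady-state visiting probabilities for a walk that restarts
    at the seed nodes with probability `restart` at every step.

    adj is a symmetric 0/1 adjacency matrix with no isolated nodes.
    """
    w = adj / adj.sum(axis=0)            # column-normalised transition matrix
    p0 = np.zeros(adj.shape[0])
    p0[seeds] = 1.0 / len(seeds)
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1.0 - restart) * (w @ p) + restart * p0
        converged = np.abs(p_next - p).sum() < tol
        p = p_next
        if converged:
            break
    return p

# Toy interaction network (not from the dissertation).
names = ["A", "B", "C", "D", "E", "F"]
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (4, 5)]
adj = np.zeros((6, 6))
for i, j in edges:
    adj[i, j] = adj[j, i] = 1.0

# Protein A plays the role of a known disease gene (the seed).
prob = random_walk_with_restart(adj, seeds=[0])
top = np.argsort(-prob)[:3]
subnetwork = {names[i] for i in top}

# Aggregate expression of the sub-network: one value per sample.
rng = np.random.default_rng(1)
expr = rng.normal(size=(6, 4))           # 6 proteins x 4 samples
aggregate = expr[top].mean(axis=0)
print(sorted(subnetwork), aggregate.shape)
```

One aggregate value per sub-network and sample would then form the feature vector given to a classifier such as an SVM.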
Keywords/Search Tags: Disease related gene, Gene regulatory network, Biological pathway, Protein-protein interaction network, Prognosis