Knowledge-Based Disease Association Analysis and Auxiliary Mechanism Deciphering

Posted on: 2019-09-04
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W J Xu
Full Text: PDF
GTID: 1360330542497363
Subject: Bioinformatics
Abstract/Summary:
The disease association network is a global map that objectively describes the relationships between diseases and can assist drug repositioning, etiology inference, complication analysis, and the inference of disease mechanisms. In recent years, studies of disease association analysis based on biomedical knowledge bases have continued to emerge, and the knowledge source chosen for disease properties determines the observational view from which the association analysis is conducted. Existing disease-relation network data can therefore be divided into multiple research views. To obtain a comprehensive and objective disease-relation network, the disease networks of the different views need to be integrated into a unified network. In addition, crosstalk between disease pathways also reflects the association between diseases, but disease association studies based on pathway crosstalk models have not been reported. Finally, the disease-related gene network is a network representation of the disease mechanism model, but there is currently no easy-to-use tool for interpreting its function at the level of pathway knowledge.

The study in this thesis is divided into two parts. The first part explores a multi-dimensional data fusion method for existing human disease-disease networks and proposes a method for constructing a disease association network based on pathway crosstalk. The second part builds a pathway-level disease gene network analysis tool based on the network fingerprint framework and pathway knowledge sources, and uses it to decipher the mechanisms of disease gene networks.

To explore a multi-dimensional data fusion method for existing disease association networks, we first recast the integration of multiple disease networks as the equivalent problem of fusing multiple similarity networks. By analogy with patient similarity networks, a disease network integration method based on similarity network fusion (SNF) is proposed as the solution to multidimensional network fusion. From the set of existing disease association networks, three similarity networks covering 223 MeSH diseases were then obtained: a molecular-mechanism-based HDN (M-HDN), a biological-process-based HDN (B-HDN), and a symptom-based HDN (S-HDN). The three networks were preprocessed and fused with the SNF method to generate a multi-view fused disease similarity network, named mvHDN. Clustering on mvHDN yielded 22 disease communities, and the community structure of the mvHDN similarity matrix differs from that of the three input similarity matrices. In a joint analysis of community and edge contribution, the number and proportion of the seven edge contribution types within the communities were compared, and then the number and proportion of the various edge contribution types in the communities were compared. The results show that edges contributed by B-HDN account for the largest proportion of the intra-cluster edges, while intra-cluster edges account for the larger proportion of all edges contributed by S-HDN. Next, the cluster assignments were compared with the disease classification labels: the mvHDN disease communities were generally consistent with the MeSH disease classification, although 2/3 of the communities contained several different MeSH codes. The distribution of each disease category across clusters shows that certain disease categories can monopolize mvHDN disease communities, whereas other disease categories are scattered among different communities. Finally, some new disease relationships were discussed based on mvHDN.

To build a method for constructing disease association networks based on pathway crosstalk, a new definition of the pathway crosstalk model was proposed. The "Human Diseases" data set was extracted from the KEGG pathway database. Data preprocessing yielded 67 important disease pathways together with 102 basic biological pathways. A search for crosstalk edges in the basic biological pathways suggested that 1/5 of the edges are crosstalk edges. Four topological indices then indicated that the topological properties of crosstalk edges in the basic biological pathways differ significantly from those of non-crosstalk edges. An analysis of the distribution of crosstalk edges across pathways showed that crosstalk edges and crosstalk genes tend to be concentrated in a few of the most heavily crosstalking pathways. The crosstalk of the disease pathways was analyzed in the same way. More than 1/4 of the edges were found to be crosstalk edges; the topological properties of crosstalk and non-crosstalk edges are significantly different; and crosstalk edges and crosstalk genes are more concentrated in a few crosstalk pathways than is the case for the basic biological pathways. The distributions of crosstalk edge types in the basic biological pathways and the disease pathways were then analyzed. Finally, the crosstalk network of the disease pathway categories was discussed through crosstalk analysis between the pathway categories.

To provide biologists with tools for network-fingerprint-based pathway analysis of disease gene networks, the network fingerprint framework was first extended: network alignment algorithms were added, four network alignment scoring methods were included, and the reference network database was expanded to include 766 reference pathways and 49 reference pathway datasets. NFPscanner, a visualization and gene network analysis tool, was designed and developed with Java and R. It allows users to perform fingerprint analysis of gene networks through a browser, and provides simultaneous comparison of multiple network fingerprints as well as a network comparison analysis function. Using the KEGG disease pathways as experimental data, the results of the network fingerprint analysis algorithm were shown to be broadly consistent with the pathway enrichment results of the standard tool KOBAS. The neonatal sepsis related gene network and the obesity-related metabolic gene network were then analyzed using NFPscanner. Placing the fingerprint results side by side revealed that the fingerprints of the neonatal sepsis gene network and the obesity-associated metabolic gene network differ significantly in their pathway correlation levels. An open-source R package, NFP, was also developed to give bioinformaticians a suitable tool for large-scale network fingerprint analysis. The functions of NFP can be called to calculate the fingerprint of a query network and to generate a plot of the network fingerprint. Finally, as a case study, the breast cancer-related FOXM1 pathway network was analyzed to demonstrate the usage of NFP.

The main innovations of this thesis are as follows. Firstly, a multi-dimensional disease association network fusion solution based on the multi-view fusion method is proposed, and the similarities and differences between the disease association network and the MeSH disease classification system are compared. Secondly, a disease association analysis network is constructed based on pathway crosstalk theory and the pathway knowledge source KEGG, and the crosstalk relationships between disease categories are discussed. Thirdly, the network fingerprint framework is developed further: network alignment algorithms are added to the basic network fingerprint framework, giving mature network alignment algorithms a new use. Finally, a pathway analysis tool for disease mechanisms is designed and developed on the basis of the network fingerprint framework, and it can be applied to the online analysis of disease gene network data.
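
The abstract does not give implementation details for the SNF-based fusion of M-HDN, B-HDN and S-HDN, so the following is only a minimal sketch of the general similarity network fusion idea, written in Python with NumPy. All function names (`full_kernel`, `local_kernel`, `snf`, `toy_view`) and the 4-disease toy matrices are hypothetical illustrations, not the thesis's code or data, and the renormalisation step is a simplification of the published SNF procedure. Each view keeps a dense status matrix and a sparse k-nearest-neighbour matrix; in every iteration a view's status matrix is replaced by the average status of the other views, diffused through that view's own neighbour structure.

```python
import numpy as np

def full_kernel(W):
    """Dense status matrix: diagonal 1/2, off-diagonal row mass 1/2."""
    W = np.asarray(W, dtype=float)
    off = W - np.diag(np.diag(W))            # drop self-similarity
    row_sums = off.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0             # guard isolated nodes
    P = off / (2.0 * row_sums)
    P[np.diag_indices(W.shape[0])] = 0.5
    return P

def local_kernel(W, k):
    """Sparse matrix keeping each row's k strongest neighbours, row-normalised."""
    W = np.asarray(W, dtype=float)
    off = W - np.diag(np.diag(W))
    S = np.zeros_like(off)
    for i in range(off.shape[0]):
        nearest = np.argsort(off[i])[::-1][:k]
        total = off[i, nearest].sum()
        if total > 0:
            S[i, nearest] = off[i, nearest] / total
    return S

def snf(networks, k=2, iterations=20):
    """Fuse two or more similarity matrices defined over the same node set."""
    if len(networks) < 2:
        raise ValueError("need at least two views to fuse")
    m = len(networks)
    P = [full_kernel(W) for W in networks]
    S = [local_kernel(W, k) for W in networks]
    for _ in range(iterations):
        updated = []
        for v in range(m):
            # average status of all other views
            others = sum(P[u] for u in range(m) if u != v) / (m - 1)
            # diffuse it through this view's local neighbourhood structure
            updated.append(S[v] @ others @ S[v].T)
        # simplified renormalisation (an assumption of this sketch)
        P = [full_kernel(p) for p in updated]
    fused = sum(P) / m
    return (fused + fused.T) / 2.0            # enforce symmetry

def toy_view(within, between):
    """Hypothetical 4-disease view with two communities {0, 1} and {2, 3}."""
    W = np.full((4, 4), between)
    W[:2, :2] = within
    W[2:, 2:] = within
    np.fill_diagonal(W, 1.0)
    return W

# Three made-up views standing in for M-HDN, B-HDN and S-HDN.
views = [toy_view(0.9, 0.1), toy_view(0.8, 0.2), toy_view(0.7, 0.3)]
fused = snf(views, k=2)
```

The fused matrix can then be passed to any clustering method to obtain disease communities. The neighbour count `k` and the number of iterations are free parameters here, since the abstract does not state the values used for mvHDN.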
Keywords/Search Tags:Disease network, similarity network fusion, disease pathways, pathway crosstalk, network fingerprint