Biological Network Inference Based On Data Fusion

Posted on: 2017-08-11
Degree: Master
Type: Thesis
Country: China
Candidate: B Teng
Full Text: PDF
GTID: 2348330488459950
Subject: Software engineering
Abstract/Summary:
Rapid technological advances have led to the production of high-throughput biological data of many different types. Each of these data types provides independent and complementary information from a different view, which enables the construction of complex networks via data fusion methods. In the post-genomic era, protein-protein interaction networks and disease-gene association networks are two key representatives of biological networks. Protein-protein interactions are of primary importance for understanding protein functions, and with the advance of the Precision Medicine plan, the construction of disease-gene association networks is becoming increasingly important. This thesis proposes two data fusion methods, one for each of these two problems.

To combine the results obtained from multiple affinity purification and mass spectrometry data sets, this thesis proposes a method named Reinforce, which is based on rank aggregation and false discovery rate control. Under the null hypothesis that each ranking list is random, Reinforce combines the original results from different sources in three steps. First, it addresses the problem of sample bias through data preprocessing. Second, it applies rank aggregation methods to make the combined results more stable. Finally, it estimates the false discovery rate and reports the high-quality protein-protein interactions. The experimental results show that Reinforce produces more stable and accurate results than existing methods applied to a single data set.

For the construction of the disease-gene association network, this thesis presents the IGA algorithm. IGA assumes that disease genes can be divided into two categories: genes shared by multiple diseases and genes specific to an individual disease. The structure of the shared disease genes is modeled as a low-rank matrix, while the structure of the disease-specific genes is modeled as a sparse matrix. The problem of mining disease-gene relationships is thus transformed into a matrix factorization problem. IGA employs two tuning parameters, which control the size of the shared genetic pattern and the number of individual signals. The experimental results show that IGA can effectively explore the relationships between diseases and genes.
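
The abstract does not give the details of Reinforce, so the following is only a minimal sketch of the general idea it names: aggregating several ranked lists and controlling the false discovery rate under the null hypothesis that each list is random. The function names (`normalized_ranks`, `aggregate_with_fdr`), the mean-rank statistic, the Monte Carlo null and the Benjamini-Hochberg adjustment are all assumptions chosen for illustration, not the procedure used in the thesis. The preprocessing step for sample bias is not shown.

```python
import numpy as np


def normalized_ranks(scores):
    """Rank scores in descending order and scale to (0, 1]; the best item gets 1/n."""
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores, kind="stable")
    ranks = np.empty(len(scores), dtype=float)
    ranks[order] = np.arange(1, len(scores) + 1)
    return ranks / len(scores)


def aggregate_with_fdr(score_lists, n_perm=2000, seed=0):
    """Aggregate score lists by mean normalized rank and attach BH q-values.

    Hypothetical helper: every list must score the same candidate
    interactions in the same order.
    """
    rng = np.random.default_rng(seed)
    ranks = np.vstack([normalized_ranks(s) for s in score_lists])
    n_lists, n_items = ranks.shape

    # Small mean rank = the candidate is consistently near the top.
    stat = ranks.mean(axis=0)

    # Null hypothesis: each list is a random ordering, so a candidate's rank
    # in each list is uniform on 1..n and independent across lists.
    null = rng.integers(1, n_items + 1, size=(n_perm, n_lists)).mean(axis=1)
    null_sorted = np.sort(null / n_items)

    # Empirical p-value: fraction of null statistics at least as small.
    counts = np.searchsorted(null_sorted, stat, side="right")
    pvals = (counts + 1) / (n_perm + 1)

    # Benjamini-Hochberg adjustment to obtain q-values.
    order = np.argsort(pvals)
    scaled = pvals[order] * n_items / np.arange(1, n_items + 1)
    q_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    qvals = np.empty(n_items)
    qvals[order] = np.minimum(q_sorted, 1.0)
    return stat, pvals, qvals
```

In such a sketch, candidate interactions whose q-value falls below a chosen threshold would be the ones reported. Averaging ranks rather than raw scores is what makes lists from different scoring methods comparable.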
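
The low-rank plus sparse model described for IGA can be illustrated with a generic decomposition. The abstract does not state the IGA objective, so the sketch below assumes a standard convex surrogate: minimize 0.5 * ||M - L - S||_F^2 + lam_low_rank * ||L||_* + lam_sparse * ||S||_1 by alternating exact updates of L and S. The function names and the two parameters `lam_low_rank` and `lam_sparse` are hypothetical stand-ins for the two tuning parameters the abstract mentions.

```python
import numpy as np


def soft_threshold(x, tau):
    """Shrink every entry of x toward zero by tau (proximal map of the L1 norm)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def svd_threshold(x, tau):
    """Shrink the singular values of x by tau (proximal map of the nuclear norm)."""
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    return (u * soft_threshold(s, tau)) @ vt


def low_rank_plus_sparse(m, lam_low_rank, lam_sparse, n_iter=200, tol=1e-8):
    """Split m into a low-rank part and a sparse part by alternating minimization.

    Assumed objective (not taken from the thesis):
        0.5 * ||m - low - sparse||_F^2
        + lam_low_rank * ||low||_* + lam_sparse * ||sparse||_1
    """
    m = np.asarray(m, dtype=float)
    low = np.zeros_like(m)
    sparse = np.zeros_like(m)
    for _ in range(n_iter):
        # Exact minimizer over the low-rank block with the sparse block fixed.
        low_new = svd_threshold(m - sparse, lam_low_rank)
        # Exact minimizer over the sparse block with the low-rank block fixed.
        sparse_new = soft_threshold(m - low_new, lam_sparse)
        change = np.linalg.norm(low_new - low) + np.linalg.norm(sparse_new - sparse)
        low, sparse = low_new, sparse_new
        if change < tol:
            break
    return low, sparse
```

Here `m` would be a disease-by-gene association matrix. A larger `lam_low_rank` shrinks the shared (low-rank) pattern, and a larger `lam_sparse` leaves fewer nonzero disease-specific entries, which mirrors the role of the two tuning parameters in the description above.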
Keywords/Search Tags: Data Fusion, Biological Network, Rank Aggregation, Matrix Decomposition