
Integrative Biological Network Clustering Using Semi-supervised Graph Clustering

Posted on: 2019-06-15
Degree: Master
Type: Thesis
Country: China
Candidate: H S Xue
Full Text: PDF
GTID: 2428330566998861
Subject: Computer technology
Abstract/Summary:
With the rapid development of high-throughput sequencing technology, large-scale biological and medical data are being generated at an exponential rate, and bioinformatics is experiencing a data revolution. In the post-genome era, biological data are interconnected and can be described as complex association networks, which makes it difficult for scientists to locate valuable information quickly and accurately across multiple networks. The goal of this project is to mine existing large-scale biological networks and identify the useful information in them. The integration and partition of multiple biological networks is not a well-studied topic, and only a few publications address it. Unlike general networks, large-scale biological networks may share mutual associations, so the relationships between different networks cannot be ignored during integration and partition. At the same time, because biological networks are large, it is essential to reduce their dimensions effectively while preserving the relationships between them.

In this thesis, we design an integrative biological network clustering framework based on semi-supervised graph clustering, motivated by these two features: complex relationships between networks and the networks' large dimensions. The framework is a deep neural network (DNN) whose building blocks are the Sparse Auto Encoder (SAE) and the Semi-Supervised Sparse Auto Encoder (semi SAE). The first layer of the DNN is an SAE, which is used to extract constraints; semi SAE is used from the second layer to the last. The input of each semi SAE consists of a sparse matrix and constraints. As the iterative model is extended layer by layer, the dimension of the input networks decreases steadily and the multiple biological networks tend to converge under the effect of the constraints, so the model as a whole reduces the dimension of the multiple networks. The key component of the framework is semi SAE. It produces a new lower-dimensional representation of the input networks and extracts constraints that serve as input to the next layer. During training it combines the loss function with the input constraints, yielding a representation that merges the prior information from the previous layer. Finally, for the output networks carrying prior information, the model uses Clusterer Ensemble to obtain a single network and applies the k-means algorithm to obtain the final clustering results.

To verify the effectiveness of the clustering framework and semi SAE, we use nine different gene expression networks of Arabidopsis thaliana as experimental datasets. The evaluation criteria are Matrix Similarity, Silhouette Coefficient, Gene Ontology Enrichment Analysis and KEGG Pathway Enrichment Analysis. The experimental results demonstrate that the clustering framework and semi SAE have clear advantages in learning network features and in reducing dimensions based on prior information.
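The abstract says that semi SAE combines its loss function with the input constraints, but it does not give the formula. The following is only a minimal sketch of what such an objective could look like, not the thesis's actual method. It assumes a single sigmoid layer, a KL-divergence sparsity penalty, and constraints given as must-link pairs whose hidden codes are pulled together. The names `semi_sae_loss`, `must_link`, `rho`, `beta` and `lam` are hypothetical and do not come from the thesis.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def layer(x, W, b):
    # One dense sigmoid layer: sigmoid(W x + b) for a single input row x.
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + bi)
            for row, bi in zip(W, b)]

def sq_dist(u, v):
    # Squared Euclidean distance between two vectors.
    return sum((a - c) ** 2 for a, c in zip(u, v))

def kl_sparsity(rho, rho_hat):
    # KL divergence between Bernoulli(rho) and Bernoulli(rho_hat).
    rho_hat = min(max(rho_hat, 1e-8), 1.0 - 1e-8)  # keep the logs finite
    return (rho * math.log(rho / rho_hat)
            + (1 - rho) * math.log((1 - rho) / (1 - rho_hat)))

def semi_sae_loss(X, params, must_link, rho=0.05, beta=1.0, lam=1.0):
    """Assumed objective: reconstruction + sparsity + constraint penalty.

    X         -- list of input rows (e.g. rows of a network's adjacency matrix)
    params    -- (W, b, W2, b2): encoder and decoder weights and biases
    must_link -- hypothetical constraint format: (i, j) index pairs whose
                 hidden codes should stay close
    """
    W, b, W2, b2 = params
    H = [layer(x, W, b) for x in X]          # lower-dimensional codes
    X_hat = [layer(h, W2, b2) for h in H]    # reconstructions
    n = len(X)

    # Mean squared reconstruction error over all rows.
    recon = sum(sq_dist(x, xh) for x, xh in zip(X, X_hat)) / n

    # Sparsity: push each hidden unit's mean activation towards rho.
    rho_hat = [sum(h[j] for h in H) / n for j in range(len(b))]
    sparsity = sum(kl_sparsity(rho, r) for r in rho_hat)

    # Constraint term: penalise distance between codes of linked rows.
    constraint = sum(sq_dist(H[i], H[j]) for i, j in must_link)

    return recon + beta * sparsity + lam * constraint
```

In this sketch, setting `lam=0` reduces the objective to a plain sparse autoencoder loss, which matches the abstract's description of the first layer as an SAE without constraints. How the thesis actually encodes and weights its constraints may well differ.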
Keywords/Search Tags: bio-network, semi-supervised algorithm, graph clustering, autoencoder