Font Size: a A A

Parallel Association Search Algorithm For Biological Networks

Posted on:2019-04-25Degree:MasterType:Thesis
Country:ChinaCandidate:Y LiFull Text:PDF
GTID:2370330566998621Subject:Computer technology
Abstract/Summary:PDF Full Text Request
With the rapid growth of biomedical data,the demands for searching biomedical data increases dramatically.More specifically,a lot of queries request not only the return of biological terms but also how these terms are correlated with each other.However,such specific needs cannot be well addressed using traditional search engines.Supported by National 863 project "the Key Technology Research and Development of Biological Data Expression Index,Search and Storage Access",we propose to address this issue with parallel programming and graph mining.Given a set of keywords that matches to a user's query,the multi-keyword association search attempts to search upon a biological network for the association of multiple keywords,and to return the relevant knowledge that can be used to assist tasks such as disease diagnosis and drug discovery.The core technology of this project is to build a multi-keyword association search algorithm and apply it to biological networks in which biological semantic relationships are curated.The association search task can be reformatted as the “Steiner problem in networks”(SPN)algorithm.However,the time complexity of SPN is NP-hard and thus no longer suitable for large-scale biological network search.In literature,efforts have been spent to significantly reduce the running time by dividing the task into an offline phase and online phase.In this project,we take advantage of the previous work and design a Parallel Association Search algorithm based on Center Node(PAS-CN algorithm)based on the Spark computing framework to further increase the efficiency of association search.Biological networks are constructed based on the relationships between biological entities.Constructing an approximated minimal Steiner tree facilities exploring the associations among a set of keywords in a biological network.Our solution has an offline phase and an online phase.In the offline phase,a hierarchical clustering is applied on a biological network,followed with network division,selection of sub-network center node,and construction of the lowest common ancestor matrix of every node.In the online phase,a parallel association search algorithm named “parallel association search algorithm based on center node”(PASCN)is developed to greatly reduce the time complexity of SP N.Besides,PAS-CN successfully limits the size of the Steiner tree by introducing the network center node.PAS-CN consists of three parts: sub-network division,parallel Steiner tree search,and local Steiner tree merger.Among them,sub-network division and local Steiner tree merge part are completed by Spark platform serial calculation,and the parallel Steiner search part is solved in parallel on Spark cluster.Experiments show that the PAS-CN algorithm has achieved a great improvement in the computing time and the limit of the size of the Steiner tree.
Keywords/Search Tags:association search, parallel, steiner tree, biological network, hierarchical clustering
PDF Full Text Request
Related items