Font Size: a A A

Research And Implementation For Similarity Search Algorithm Of Biological Sequences

Posted on:2017-05-18Degree:MasterType:Thesis
Country:ChinaCandidate:J LiFull Text:PDF
GTID:2308330503968501Subject:Software engineering
Abstract/Summary:PDF Full Text Request
Similarity search of biological sequences is the basis for the study of genomics and proteomics and its performance depends on the double sequence alignment. Based on heuristic method of local sequence alignment, FASTA and BLAST algorithm is fast and often used to solve the problem of similarity search, but it can not get the global correlation of the two sequences.This paper studies the optimization and parallelization of ant colony algorithm to solve the problem of double sequence alignment and then applies it in the biological sequence similarity searches globally.The main contributions of this paper are as follows:1, this paper proposes the optimization scheme of the SA-ACO algorithm.The optimization scheme includes optimization of the state transfer rule, optimization of mapping rule and the cache optimization. The scheme improves the sensitivity and speed of comparison. When the sequence’s length is 1050, the time ratio for the SA-ACO algorithm and ISA-ACO algorithm is 1.92; and the time ratio for Needleman-Wunsch alogorithm and ISA-ACO algorithm is 5.72; and the higher the sequence is,the higher time ratio is.2, this paper proposes the parallel scheme for the ISA-ACO algorithm.a) proposes a parallel scheme based on message passing model, designing the size of sub ant colony, migration period and migration content that can get better effectivity. By the exchange of the optimal solution, this model can not only get higher speedup, but also improve the quality of solution. The experimental results show that the average quality of the solution is close to the best and the parallel efficiency is 90.9%, when the average sequence length is 1050, the number of nodes is 2 and the size of sub ant colony is 9.b) proposes a parallel scheme based on shared memory model. Analysing the speed bottleneck of ISA-ACO algorithm, the paper proposes a parallel scheme based on shared memory to parallel two steps, “the search of comparison path” and “the update of pheromone”. The experimental results show that with eight threads,the speed ratio is 5.09,and the longer the sequence, the higher the speed ratio.3, this paper applys the ISA-ACO alogorithm to sovle the global similarity search of biological sequences and proposes the parallel scheme.a) proposes filter rules, which can screen out the sequences that are possible to be higher similar to the search sequence, and called effective sequences. So it can decrease the search time.b) proposes serial search-parallel comparison scheme, which is the combination of the two parallel schemes of ISA-ACO algorithm. The scheme can not noly get high speed ratio but also can improve the quality of the solution, so it is suitable to get higher sensitivity.c) proposes parallel search-serial comparison scheme, its performance depends on the distribution of different similarity sequences in the effective sequence.Based on the characteristics of ISA-ACO algorithm, the scheme introduces the dynamic load balancing strategy with the idea of evolutionary generation. This schema can effectively balance the load of each node, and improve the speed ratio and efficiency. With the search sequence’s length 750, the effective sequence size 500, the number of node 4, the standard deviation of time before introducing load balancing strategy is 19.82, and decreases to 4.85 after introducing; the speed ratio is increased from 1.67 to 2.67; and the efficiency is improved from 41.8% to 66.8%.
Keywords/Search Tags:similarity search of biological sequences, global sequence alignment, ant colony optimization, MPI, OpenMP
PDF Full Text Request
Related items