Font Size: a A A

Approximate Longest Common Subsequence Query Processing And Optimization On Biological Sequence

Posted on:2016-01-19Degree:MasterType:Thesis
Country:ChinaCandidate:P K ZhengFull Text:PDF
GTID:2428330542955398Subject:Computer technology
Abstract/Summary:PDF Full Text Request
With the rapid development of data storage and information technology,the data volume of biological sequence database is growing rapidly,and the difficulty of approximate query is growing.Although there are a variety of methods to measure the similarity of two sequences,the common method is to measure the similarity of two biological sequences with the longest common subsequence's length.However,the time and space cost of the longest common sequence of the two sequences are relatively large,especially in the massive biological sequences.Therefore,the direct application of the existing algorithm will inevitably affect the query performance,then designs performance superior and filtration effect excellent filtering algorithm is the best choice for approximate query processing of the longest common subsequence in biological sequence.This thesis summarizes and analyzes the longest common subsequence algorithms of two sequences,and according to the characteristics of these algorithms,they are applied to the verification process of the longest common subsequence query processing.Firstly,for the long sequence query processing problem,the basic filtering algorithm is designed.By using reverse-filtering and counting-filtering optimization strategy,the optimized BTC algorithm's filtration effect has significantly increased.The performance of the BTC algorithm is obviously improved after the application of the bit-parallel technology,and the time of the longest common subsequence query processing is reduced.Secondly,for the short sequence set query processing problem,the LCSIndex index structure is built on the short sequence set,and the filtration algorithm BRD is based on this index.By using right-shifting-control strategy and two-way simultaneous filtering strategy makes the BRD algorithm's filtration effect reached 90%,effectively reduces the size of filtering result set,and shortens the time of approximate longest common subsequence query processing on the short sequence set.Finally,through a lot of experiment on real data sets,the experimental results show that:for long sequences of approximate longest common subsequence query processing,the query processing of time by using the filtration algorithm based on BTC was significantly less than that of directly using the longest common subsequence algorithm;BTC filter algorithm increased ssignificantly through bit-parallel technology;for the short sequences set of the longest common subsequence query processing,the speed of using the BRD filter algorithm to conduct query processing is faster than using the longest common subsequence algorithm directly.
Keywords/Search Tags:biological sequence, approximate query, longest common subsequence, bitparallel, filtering algorithm
PDF Full Text Request
Related items