The Research On The Longest Common Subsequence Query Algorithm

Posted on:2019-12-15

Degree:Master

Type:Thesis

Country:China

Candidate:L S Zhai

Full Text:PDF

GTID:2428330566989122

Subject:Computer Science and Technology

Abstract/Summary:

PDF Full Text Request

The longest common subsequence problem is a classic problem in the field of computer science,which is used to return the longest common subsequence of multiple sequences,called MLCS.MLCS has important applications in gene detection,sequence similarity comparison,pattern recognition,data mining,code clone detection,document duplication and web clustering.At present,the algorithm based on dynamic programming and the algorithm based on the dominating point are usually used to solve the MLCS problem.With the increase in number and length of character sequences,the existing algorithms can not meet the actual requirements from the perspective of efficiency and scalability.This paper studies the MLCS problem,and the research content is as follows.Firstly,a new algorithm based on single hash is proposed,which is the longest common subsequence algorithm,SF_MLCS,which first generates successive table of each sequence according to the successive relationship between characters,and then represents successive table of all sequences using directed acyclic graph.On this basis,the subsequent relationship between characters is quickly accessed through hash.Compared with existing double hashing methods,the proposed single hash method improves the efficiency while reducing the memory space.through the deeply analysis of the existing algorithms,we find the problem of low efficiency and poor expansibility of the existing algorithms.Secondly,the longest common subsequence algorithm based on divide and conquer strategy,DC_MLCS algorithm,is proposed to solve the problem that the single hash method has high spatial cost and can not deal with long sequence.Representation of directed acyclic graph all subsequent sequence tables based on DC_MLCS algorithm firstly generates k layer vertex subgraph based on the subgraph,then each vertex with zero degree of output is processed one by one.in the treatment of each vertex,a vertex is deleted before treatment in order to reduce the space occupied space,low cost the system can deal with longer sequences.on this basis,the concept of the longest common subsequence of top-k is proposed,and the corresponding algorithm is designed and implemented to solve the longest common subsequence of all k sequences in a given n sequence.Finally,Experiments are carried out based on real data sets to verify the efficiency of the algorithm.

Keywords/Search Tags:

The longest common subsequence, divide and conquer, single Hash, top-k

PDF Full Text Request

Related items

1	The Research On Algorithms For The Longest Common Subsequence Problem And Variants
2	Explorations on the longest common increasing subsequence problem
3	Study On Parallel Algorithms For Longest Common Subsequence On Heterogeneous Cluster Computing Systems
4	Approximate Longest Common Subsequence Query Processing And Optimization On Biological Sequence
5	Parallel Algorithm For Multiple Longest Common Subsequence And Application Research On Hadoop Platform
6	LSS-DCA: A Fast Multiple Sequence Alignment With Divide-and-conquer Algorithm
7	Research And Implementation Of Ray Tracing Software With Divide And Conquer Accelerating Algorithm
8	Algorithms For Sorting By Short Swap And Longest Common Exemplar Subsequence
9	Divide And Conquer For SAT Solving
10	Divide And Conquer Strategy In Association Rule Mining