Font Size: a A A

Research On LCS' Level Index Construction Algorithm

Posted on:2019-10-08Degree:MasterType:Thesis
Country:ChinaCandidate:S P YangFull Text:PDF
GTID:2428330566488929Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
The longest common subsequence(LCS)is the longest sequence of the same elements that is obtained by removing zero or more elements from two sequences.The application of LCS includes similarity text detection,gene sequence alignment,etc.In this paper,we study how to obtain the longest common subsequences of two sequences,and return their length quickly.The specific research are as follows.First of all,by analyzing the existing algorithms,we find that they all have problems of low efficiency and poor scalability.Aiming at these problems,a hierarchical index is proposed and the corresponding index construction algorithm named LEVEL-LCS is proposed.The algorithm firstly builds inverted index for the longer one of the two sequence,then constructs hierarchical index with the shorter sequence.By traversing the index from the bottom up,we can enumerate all the longest common subsequences and their lengths of the two given sequences.Compared with the traditional dynamic programming method,the calculation and storage of non-essential matching characters pair are reduced,whitch make the algorithm of LCS more efficient and make the storage space more economize.Secondly,in order to further reduce the index space,an efficient pruning strategy was proposed and the corresponding index construction algorithm named LEVEL-LCS+ was designed.The optimization algorithm replaced the repetition nodes whitch is adjacent in the hierarchical index with a unique node,whitch further reduced the storage space and improves the efficiency of the enumeration process.Sometimes,we only want the length of the longest common subsequence,so algorithm named LENGTH-LCS is proposed,and that's reduced the spatial complexity of the algorithm to linear.Finally,we made multi-group experiments based on both real data sets and artificial data sets,the experimental results are compared in terms of building index time,index occupation space size,and whitch verify the efficiency and expansibility of the method proposed in this paper.
Keywords/Search Tags:the longest common subsequence, string matching, hierarchical index
PDF Full Text Request
Related items