Font Size: a A A

Engineering Implementation And Performance Evaluation Of An External Memory Suffix Array Construction

Posted on:2015-01-30Degree:MasterType:Thesis
Country:ChinaCandidate:Z H LiFull Text:PDF
GTID:2298330422477198Subject:Software engineering
Abstract/Summary:PDF Full Text Request
The suffix array is originally proposed as a space-efficient alternative to thesuffix tree and have been widely used as an important index data structure in thedomains of bioinformatics, full-text indexing, string matching, frequent string mining,sequence analysis, cluster analysis, etc. So far, the researches on the suffix arrayconstruction algorithm have been fruitful. With the development of informationtechnology, the dataset’s size keeps ever-growing in many fields. The space requiredby constructing suffix arrays for a very big dataset dramatically exceed the RAMcapacity of the commodity computer. In this case, these RAM-based suffix arrayconstruction algorithm is no longer applicable and the scope of application of thesuffix array has been greatly limited.In this paper, we discuss an external-memory suffix array construction algorithmthat is based on SA-IS algorithm. We analyze the main idea of the algorithm andtheoretically prove its correctness and feasibility. On the basis of in-depthunderstanding of algorithms, we use process-oriented programming methods toimplement this algorithm. Besides, we streamline the auxiliary data structure andimprove external sorting process to ensure the performance of the algorithm duringthe specific implementation process. At last, we evaluate the performance of thealgorithm through experiments. As a result, this algorithm is a linear-time algorithmand its performance is closed to EM-SA-DS algorithm.
Keywords/Search Tags:suffix array, external me mory, construction, SA-IS
PDF Full Text Request
Related items