Font Size: a A A

An Optimal Implementation Of SA-IS Algorithm In External Memory

Posted on:2015-04-21Degree:MasterType:Thesis
Country:ChinaCandidate:W DengFull Text:PDF
GTID:2298330422977197Subject:Software engineering
Abstract/Summary:PDF Full Text Request
Suffix array is a powerful tool to structure full-text index, which is consisting ofall the suffixs in a string in their lexicographic order. Compared to the suffix tree, theindex structured by suffix array always have a smaller occupied in their constructionwith a high speed. It has wide application in the field of Internet information searchand the study of bioinformatics.Since1990U.Manber and G.Myers[1]proposed the concept of the suffix array, inrecent years, a great progress has been made in research on suffix arrayconstruction algorithm. Now there are a lot of suffix array construction algorithms,which always have a linear time complexity. KSP[2]、KA[3]and SA-IS[4]are classicalgorithm of them. However, with the increase of the data size, traditional suffix arrayconstruciton algorithms always have the limitations by the memory, which is meansthat the traditional suffix array construction algorithm can’t sort the large-scale of datawhich is beyond the memory capacity. So the study of technology which structure thesuffix array in external memory is necessary.In this paper, first, we introduce an Implementation technology of SA-ISalgorithm, which makes SA-IS algorithm can run in external memory. But thisImplementation is not good enough for its large I/O in external memory. For this, inthe rest of this paper we analysis the problem which cause the large I/O in formerImplementation, and give our optimaization for these problem. We found that there isa mount of redundancy sorting when the Implementation program handle the externalblock. So we proposed a new Induced-sorting of former Implementation by divide SAblock into smaller area. We also change the former process of reference PCI, which isthe bottleneck of former Implementation. Through our optimaization, the program hasa faster speed which cause even smaller I/O. We prove our optimatization has a largeimprovement in I/O and runing time by expriments intruduced in chapter4.
Keywords/Search Tags:Suffix array, External memory, linear complexity, Optimaltion
PDF Full Text Request
Related items