Font Size: a A A

Research On String Dictionary Succinct Index And Algorithms

Posted on:2015-02-01Degree:MasterType:Thesis
Country:ChinaCandidate:G Q ZhangFull Text:PDF
GTID:2268330428498411Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
With the rapid development of the Internet and the wide use of mobile devices, weare in the Age of Big Data. In this new age, more and more data need to be processed andstring text data becomes a significant part of it. How to efficiently store and index thelarge scale of string data becomes a new challenge. There are two categories of methodsto cope with the challenge. The first category is to compress the string data, making itpossible to store and process more data with the same amount of memory resources. Thesecond category is to design more efficient external data structures, improving thelocality of reference. All the string data is stored in external memory, and only selectivelyfetch a small fragment of string data into internal memory to support data access andsearch operations and the input/output operations between internal and external memoryshould be efficient.As the foundation of string index, string dictionary index is ubiquitously applied infields like geographic information systems, web search engines, information retrievalsystems etc. Large-scale of string data also become a problem for string dictionary index.This paper conducts research on string dictionary indexing problem at spacecompression and external indexing aspects. The main research works are as follow:(1) The existing string dictionary indexes cost too much space. To cope with thisproblem, a new string dictionary succinct index S-trie is proposed. S-trie notonly provides index for string dictionary to facilitate quick search operation,but also reduces the space consumption of string dictionary. Experiments showthat S-trie is competitive on space performance, the compression ratio can be30%of the original string dictionary.(2) The existing string dictionary succinct index can not efficiently work externalmemory environement, to cope with this problem. we adapted S-trie for external environment and proposed string dictioanary succinct external indexSB-trie. SB-trie has improved locality of reference and can work efficiently onthe disk environment. Just like S-trie in internal memory, SB-trie showscompetitive space performance. Experiments show that SB-trie not onlyconsumes much less space, but also has better search performance when thedata scale is relatively large.
Keywords/Search Tags:String Dictionary, Succinct Index, External Algorithm, Big Data Processing
PDF Full Text Request
Related items