The Research Of Distributed Index Technology Based On Self-indexed Compressed Full-text

Posted on:2016-02-11

Degree:Master

Type:Thesis

Country:China

Candidate:Y T Liu

Full Text:PDF

GTID:2308330467982279

Subject:Computer software and theory

Abstract/Summary:


Distributed full-text indexing, a core technology in the field of information processing, is widely used in competitive intelligence (CI), information retrieval (IR), search engines (SE), information filtering (IF) and other fields. An in-depth study of campus distributed full-text index technology therefore has both theoretical and commercial value. With the growing popularity of the Internet, all kinds of data are being generated at an ever faster rate, and the total amount is expanding exponentially. As the data grows, the corresponding index files keep increasing in size. A traditional single-node index system can no longer meet the requirements of indexing massive data, whereas a distributed index system can. The core technologies of a distributed index system are index creation, data distribution and load balancing across the distributed index, and index querying. In this paper, the compressed full-text index, a text processing technology that has become very popular in recent years, is applied to a distributed index system, and the query strategy under this index structure is discussed as well.

The contents and innovations of the distributed full-text index technology studied in this paper include:

(1) At present, most distributed index systems are built on the inverted index, because its query response time can reach the millisecond level when running on high-performance clusters. The inverted index, however, has to store not only the index itself but also additional information that search engines need for functions such as extraction of stored segments, ordering, positional information, and query caching. As a result, its storage space utilization is relatively low. As an original contribution, this paper integrates the compressed full-text self-index, a hotspot of text index research, into a distributed index system, and proposes a wavelet tree compression algorithm based on improved Huffman coding in combination with the suffix array. This adapts the compressed self-index structure to a distributed environment and provides a corresponding efficient construction algorithm.

(2) The index system of a search engine plays two roles: first, it creates an index for web files according to certain rules so that they can be queried later; second, it retrieves the indexed files that match a user's query, ranks them according to certain rules, and returns the result to the user. A query strategy for the distributed environment is proposed on the basis of the improved compressed self-index structure.

(3) A framework for the distributed full-text index system is presented, taking into account the above contents and relevant research achievements. The system supports distributed indexing of unstructured data of different types and can therefore index and query massive amounts of unstructured data. This paper also describes in detail the design of the system's index cluster and query cluster. Finally, the query processing efficiency of the distributed index system is tested.
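As a rough illustration of the ideas named in the abstract, the following Python sketch builds a plain suffix array for each text shard and answers a substring query by searching every shard and merging the results. It is not the thesis's algorithm: the suffix arrays here are uncompressed (no wavelet tree or Huffman coding), the shards are simply strings in one process rather than cluster nodes, and the names build_suffix_array, find_occurrences and search_shards are hypothetical, chosen only for this example.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes of text in lexicographic order.

    Simple sort-based construction, adequate for small inputs only.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """Return the sorted start positions of pattern in text using suffix array sa."""
    m = len(pattern)

    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(sa[start:lo])


def search_shards(shards, pattern):
    """Query every shard independently and merge the hits.

    shards is a list of strings (an assumption standing in for index nodes).
    Returns a list of (shard_id, position) pairs.
    """
    hits = []
    for shard_id, text in enumerate(shards):
        sa = build_suffix_array(text)  # in practice built once, at index time
        for pos in find_occurrences(text, sa, pattern):
            hits.append((shard_id, pos))
    return hits


if __name__ == "__main__":
    print(search_shards(["banana", "bandana"], "ana"))
```

In a real deployment each shard's index would be built once and kept on its own node, and a compressed self-index would replace both the stored text and the plain suffix array; the scatter-and-merge shape of the query is the part this sketch is meant to show.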

Keywords/Search Tags:

distributed full-text index, compressed self-indexed text, wavelet tree, suffix array, query strategy


Related items

1	Study On Algorithms For Compressed Full-text Self-indexes
2	A Full-text Indexing Model Based On Suffix Array And Posting List
3	Study On The Algorithm For Full-Text Self-indexes Compression
4	The Research On Distributed Index Technology Based On Compressed Mixed Model
5	The Compaction Of Full-text Indexing Structures And Its Applications
6	Research On Key Technology Of Distributed Full-Text Index For Web Information
7	Research On Compressed Full-Text Indexes
8	Massive Data Storage And Full-text Search
9	Research On Full-Text Index Model Based On Full-Text Database
10	Research And Implementation Of Real-time Compressed Text Index Technology