The Research On Distributed Index Technology Based On Compressed Mixed Model

Posted on: 2017-02-08    Degree: Master    Type: Thesis
Country: China    Candidate: L Wu
GTID: 2348330482986864    Subject: Computer technology
Abstract/Summary:
The inverted index is a key technology in distributed retrieval systems and effectively supports result ranking. However, it adapts poorly to phrase and sentence retrieval, because all of its lexical items are obtained by segmenting the original text collection. In addition, a traditional single index model can hardly sustain highly concurrent retrieval, which seriously degrades retrieval efficiency. Designing a distributed retrieval system on a parallel computation model that is broadly adaptable and delivers high parallel retrieval performance therefore has significant research value. To address these problems, this thesis makes the following contributions.

(1) To remedy the poor adaptability of the inverted list, this thesis proposes a compressed hybrid index model. A hash algorithm links a suffix array to the inverted list, adding the strengths of the suffix array: it supports sentence retrieval, adapts well to fuzzy phrase queries, and handles query modes in different contexts. To reduce index storage, a compression algorithm shrinks the space occupied by the suffix array, and a pruning strategy reduces the number of associations kept in the inverted list. Experimental comparison shows that the compressed hybrid index model adapts effectively to different retrieval modes, needs less storage space than the inverted list, and achieves clearly higher precision than the inverted-list model.

(2) Building on the compressed hybrid index, a distributed retrieval system is designed that handles highly concurrent retrieval requests while maintaining high precision and retrieval speed. The system consists mainly of a buffer module, an index module, a retrieval module, and a file system; the modules are loosely coupled and cooperate with one another. Index construction uses a parallel model to cluster the original text collection efficiently and builds a hybrid index for each category. At retrieval time, searching only the nearest, most similar category narrows the search scope, lowers I/O cost, and raises precision. Under identical experimental conditions, the results show that the system modules work together smoothly and that the system's adaptability and retrieval efficiency are superior to those of an inverted-index system. The distributed index system based on the compressed hybrid model thus offers both high efficiency and adaptability.
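The hybrid idea in contribution (1) can be illustrated with a toy index. This is a minimal sketch under stated assumptions, not the thesis's implementation: a Python dict serves as the hash table holding the inverted list, a naively built, uncompressed suffix array answers exact phrase queries, and the inverted list answers single-term queries ranked by term frequency. The abstract does not say how its hash algorithm links the two structures or what its pruning strategy keeps, so here a simple dispatcher picks one structure per query, and pruning is modeled, hypothetically, as keeping only the `max_postings` highest-frequency postings per term. `HybridIndex` and all of its methods are illustrative names.

```python
from bisect import bisect_right
from collections import defaultdict


class HybridIndex:
    """Toy hybrid index (hypothetical): suffix array for phrases, inverted list for terms."""

    SEP = "\x00"  # document separator; assumed never to occur in the text

    def __init__(self, docs, max_postings=None):
        # Concatenate all documents and record each document's start offset.
        self.text = ""
        self.starts = []
        for doc in docs:
            self.starts.append(len(self.text))
            self.text += doc + self.SEP
        # Naive suffix array: positions sorted by their suffix (demo-sized input only).
        self.sa = sorted(range(len(self.text)), key=lambda i: self.text[i:])
        # Inverted list kept in a hash map: term -> {doc_id: term frequency}.
        self.inverted = defaultdict(dict)
        for doc_id, doc in enumerate(docs):
            for term in doc.split():
                postings = self.inverted[term]
                postings[doc_id] = postings.get(doc_id, 0) + 1
        # Hypothetical pruning: keep only the max_postings highest-frequency docs per term.
        if max_postings is not None:
            for term, postings in list(self.inverted.items()):
                ranked = sorted(postings.items(), key=lambda kv: (-kv[1], kv[0]))
                self.inverted[term] = dict(ranked[:max_postings])

    def _prefix(self, k, length):
        # First `length` characters of the k-th smallest suffix.
        start = self.sa[k]
        return self.text[start:start + length]

    def _sa_range(self, pattern):
        # Two binary searches find the block of suffixes that begin with `pattern`.
        m = len(pattern)
        lo, hi = 0, len(self.sa)
        while lo < hi:
            mid = (lo + hi) // 2
            if self._prefix(mid, m) < pattern:
                lo = mid + 1
            else:
                hi = mid
        first, hi = lo, len(self.sa)
        while lo < hi:
            mid = (lo + hi) // 2
            if self._prefix(mid, m) <= pattern:
                lo = mid + 1
            else:
                hi = mid
        return first, lo

    def phrase_search(self, phrase):
        # Exact substring match across word boundaries, which term postings cannot answer.
        first, last = self._sa_range(phrase)
        return sorted({bisect_right(self.starts, self.sa[k]) - 1 for k in range(first, last)})

    def term_search(self, term):
        # Single-term lookup through the inverted list, ranked by term frequency.
        postings = self.inverted.get(term, {})
        return [d for d, _ in sorted(postings.items(), key=lambda kv: (-kv[1], kv[0]))]

    def search(self, query):
        # Multi-word queries go to the suffix array, single words to the inverted list.
        query = query.strip()
        return self.phrase_search(query) if " " in query else self.term_search(query)


docs = [
    "distributed index on suffix arrays",
    "inverted index for ranking",
    "a suffix array supports phrase search",
]
idx = HybridIndex(docs)
print(idx.search("suffix array"))  # [0, 2]
print(idx.search("index"))         # [0, 1]
```

The phrase query matches "suffix arrays" in document 0 as well as "suffix array" in document 2, a substring match that a term-level inverted list cannot express directly. A real system would build the suffix array in linear or near-linear time and store it compressed, which this sketch does not attempt.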
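Contribution (2) restricts each query to the nearest, most similar category. The abstract does not specify the clustering algorithm or the similarity measure, so the sketch below assumes, purely for illustration, bag-of-words term-frequency vectors, per-cluster centroids, and cosine similarity. The clusters are given up front rather than produced by the thesis's parallel clustering step, and `vectorize`, `cosine`, `centroid`, and `route` are hypothetical names.

```python
import math
from collections import Counter


def vectorize(text):
    # Bag-of-words term-frequency vector.
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity between two sparse term -> weight mappings.
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(texts):
    # Mean term-frequency vector of one (pre-computed) document cluster.
    total = Counter()
    for text in texts:
        total.update(vectorize(text))
    return {t: c / len(texts) for t, c in total.items()}


def route(query, clusters):
    # Pick the cluster whose centroid is most similar to the query;
    # only that cluster's index would then be searched.
    q = vectorize(query)
    centroids = {name: centroid(members) for name, members in clusters.items()}
    return max(centroids, key=lambda name: cosine(q, centroids[name]))


clusters = {
    "sports": ["football match score", "tennis match result"],
    "computing": ["suffix array index", "inverted index search engine"],
}
print(route("index search", clusters))  # computing
print(route("match result", clusters))  # sports
```

For brevity the centroids are recomputed on every call; a deployed system would compute them once at index-build time and keep them alongside each category's index, so routing costs only one similarity computation per category.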
Keywords: hybrid index, compressed suffix array, retrieval system, pruning strategy, precision ratio