Font Size: a A A

A Full-text Indexing Model Based On Suffix Array And Posting List

Posted on:2015-03-31Degree:MasterType:Thesis
Country:ChinaCandidate:P F GuoFull Text:PDF
GTID:2268330425976198Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
Full-text retrieval system supports quick retrieving information from the massive text data, which has important application value. Full-text indexing model is the core of full-text retrieval system. It determines what functionality and performance that the full-text retrieval system can provide. The design of full-text indexing model is an important issue.The performance evaluation criteria includes query time, construction time and storage space; the functional evaluation criteria include self-index, rank query, phrase query and word boundary undetermined language adaptability.The inverted index model has fast query speed and small storage space, support the rank query, but it cannot support the phrase query well, unable to adapt to the boundary undetermined language well such as Chinese. Suffix tree and suffix array indexing model support the phrase query and self-index, also support word boundaries undetermined language, but do not support rank query. The ST-PL and CII indexing model combine with the advantages of suffix tree and inverted index.This paper has proposed SA-PL indexing model which combined suffix array with posting list. It has taken advantage of features that the suffix array supports rank query, phrase query, self-index and boundary uncertain language and has smaller storage space than suffix tree. The model aims to provide performance optimization of time and space compression under the premise of the same function of ST-PL and CII indexing model.The SA-PL-0indexing model designed according to the SA-PL indexing model.On the basis of the SA-PL-0indexing model, the SA-PL-1indexing model which reduced the index space by remove the short posting lists has been proposed.SA-PL-0, SA-PL-1, ST-PL and CII indexing model have been implemented. The experiment shows the SA-PL-0and the SA-PL-1indexing model could provide rank query, phrase query, self-index and supported boundary uncertain language well. They have advantages in time and spance over the ST-PL and CII indexing model.The SA-PL-1indexing model outperforms the other models.
Keywords/Search Tags:Full-text Indexing Model, Phrase Query, Rank Query, Self-index, Inverted Index, Suffix Array
PDF Full Text Request
Related items