The Research Of Word Index Method Based On Inter-Relevant Successive Trees Model

Posted on: 2011-08-17
Degree: Master
Type: Thesis
Country: China
Candidate: J T Chen
GTID: 2178360308990390
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of the Internet, people increasingly want to find information in web pages quickly and accurately, which gave rise to full-text retrieval technology. Building an index over the full text of documents and providing retrieval on it has gradually become the mainstream approach in Web information retrieval. Index construction strategies and index models are the core of full-text retrieval technology, and combining them to improve the performance of a full-text retrieval system is of great significance.

This thesis first introduces the body of knowledge behind full-text retrieval technology. It then studies and compares the existing index construction strategies and chooses the word-based strategy as its research subject. Chinese automatic word segmentation, on which word-based indexing depends, is studied in depth. To address the difficulty of updating the dictionary and its slow lookup speed, an improved PATRICIA tree dictionary structure is proposed. New entries can be added to it easily, and it segments text with the forward minimum matching method, which improves segmentation efficiency.

The popular index models are then analyzed, together with their advantages and disadvantages. Among the existing index models, the IRST (Inter-relevant Successive Trees) model has advantages such as faster index creation and querying and the query forms it supports, so the structure and algorithms of this model are studied in depth. Existing work on the model mostly uses character-based indexing, which suffers from low retrieval precision and a relatively high index expansion ratio. This thesis therefore applies the word-based index construction strategy to a full-text retrieval system that uses IRST as its index model. The IRST index is built after the full text has been segmented into words, which gives higher precision and reduces the expansion ratio of the index.

The tree structure of the word dictionary is then associated with the IRST index files, so that when the query string is segmented during retrieval, the index files can be looked up directly, which greatly improves retrieval efficiency. Finally, the new indexing method is verified and analyzed experimentally. The experimental results show that the method improves precision and effectively reduces the expansion ratio of the index.
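As an illustration of the dictionary-driven segmentation described in the abstract, the following is a minimal sketch of forward minimum matching over a tree-structured dictionary. It is not the thesis's implementation: it uses a plain, uncompressed trie rather than the improved PATRICIA tree, and the names `TrieNode`, `Dictionary`, `shortest_match`, and `segment` are hypothetical. Characters that start no dictionary word are assumed to be emitted as single-character tokens.

```python
class TrieNode:
    """One node of an uncompressed trie (a stand-in for a PATRICIA node)."""

    def __init__(self):
        self.children = {}    # maps a character to the child TrieNode
        self.is_word = False  # True if the path from the root spells a word


class Dictionary:
    """Hypothetical word dictionary that supports adding entries at any time."""

    def __init__(self, words=()):
        self.root = TrieNode()
        for word in words:
            self.add(word)

    def add(self, word):
        # Walk down the trie, creating missing nodes, then mark the end node.
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def shortest_match(self, text, start):
        # Return the shortest dictionary word beginning at text[start], or None.
        node = self.root
        for i in range(start, len(text)):
            node = node.children.get(text[i])
            if node is None:
                return None
            if node.is_word:
                return text[start:i + 1]
        return None


def segment(text, dictionary):
    """Forward minimum matching: scan left to right, taking the shortest word."""
    tokens = []
    i = 0
    while i < len(text):
        word = dictionary.shortest_match(text, i)
        if word is None:
            word = text[i]  # assumption: unknown characters become single tokens
        tokens.append(word)
        i += len(word)
    return tokens
```

For example, with a dictionary holding 全文, 检索, and 技术, `segment("全文检索技术", dictionary)` splits the string into those three words, and a later `dictionary.add("索引")` makes the new entry available to subsequent calls without rebuilding the structure.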
Keywords/Search Tags: Full-text Retrieval, Inter-relevant Successive Trees Model, Words-based Index, Chinese Word Segmentation