Research and Implementation of a Full-Text Inverted Index Based on Web Document Comprehension

Posted on: 2011-01-15 | Degree: Master | Type: Thesis
Country: China | Candidate: Y Li | Full Text: PDF
GTID: 2208360302970042 | Subject: Computer application technology
Abstract/Summary:
The enormous volume of information on the Internet has driven the widespread adoption of search engines and, in turn, the rapid development of search engine technology. As that technology matures, academic attention is gradually shifting from the technology itself to the specialization of search results. The emergence of topic search engines has improved both recall and precision. Indexing, as the core technology of a search engine, must keep pace with this progress and continue to innovate.

Against the background of the broad development of topic search engines, this thesis aims to build an inverted index system based on Web document comprehension, in order to compensate for the traditional inverted index's lack of an index that associates related words. The thesis focuses on research into inverted indexing and Web document comprehension, and combines the two technologies in index creation, update, and maintenance so that the index system suits a topic search engine.

The main work implemented by the author is as follows:
(1) The thesis presents the theory and methods of Web document comprehension, analyzing several important methods: PageRank, automatic Chinese word segmentation, the vector space model, and latent semantic analysis.
(2) The thesis proposes an improved inverted index file structure, the related inverted index. Experiments show that a retrieval system using the related inverted index reduces search time and increases retrieval efficiency.
(3) The thesis improves the formula for calculating query similarity. Experiments show that the similarity values between web pages and the search query computed with the improved formula are more accurate.
(4) The thesis analyzes the functions of the inverted index system in detail and gives algorithms for creating and searching the inverted index.
(5) The thesis designs and builds the inverted index system based on Web document comprehension. The system supports creating the inverted index, adding entries to it, and deleting entries from it, as well as related retrieval.
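The operations named in items (4) and (5), creating an inverted index, adding and deleting documents, and searching, can be illustrated with a minimal sketch. This is not the thesis's related inverted index, whose file structure the abstract does not describe; it is a plain in-memory inverted index. The class name InvertedIndex, its method names, and the whitespace tokenizer are all assumptions made for illustration (Chinese text would need a word segmenter instead).

```python
from collections import defaultdict


class InvertedIndex:
    """Toy in-memory inverted index: term -> {doc_id: term frequency}."""

    def __init__(self):
        self.postings = defaultdict(dict)
        # doc_id -> token list, kept so deletion knows which terms to clean up
        self.docs = {}

    @staticmethod
    def tokenize(text):
        # Assumption: lowercase whitespace tokenization, for illustration only.
        return text.lower().split()

    def add(self, doc_id, text):
        """Index a document; re-adding an existing id replaces the old version."""
        if doc_id in self.docs:
            self.delete(doc_id)
        tokens = self.tokenize(text)
        self.docs[doc_id] = tokens
        for term in tokens:
            self.postings[term][doc_id] = self.postings[term].get(doc_id, 0) + 1

    def delete(self, doc_id):
        """Remove a document and drop any posting list that becomes empty."""
        for term in set(self.docs.pop(doc_id, [])):
            entry = self.postings.get(term)
            if entry is None:
                continue
            entry.pop(doc_id, None)
            if not entry:
                del self.postings[term]

    def search(self, query):
        """Return sorted ids of documents containing every query term."""
        terms = self.tokenize(query)
        if not terms:
            return []
        result = None
        for term in terms:
            ids = set(self.postings.get(term, {}))
            result = ids if result is None else result & ids
        return sorted(result)


index = InvertedIndex()
index.add(1, "inverted index for web search")
index.add(2, "latent semantic analysis of web documents")
index.add(3, "dynamic index update")
hits_before = index.search("web index")  # documents containing both terms
index.delete(1)
hits_after = index.search("index")       # document 1 is no longer returned
```

Keeping each document's token list makes deletion straightforward at the cost of extra memory; an on-disk index would more likely mark documents as deleted and merge posting lists later.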
Keywords/Search Tags:Inverted index, Web document comprehending, Latent semantic analysis, Index dynamic update, Correlation