
Research On Key Technology Of Distributed Full-Text Index For Web Information

Posted on: 2007-08-09
Degree: Master
Type: Thesis
Country: China
Candidate: L L Zhang
GTID: 2178360212967034
Subject: Computer Science and Technology
Abstract/Summary:
As computer applications continue to expand, data volumes grow ever larger and search operations become more complicated. Distributed indexing has gradually become an effective means of addressing this problem because of its high performance. With web document collections reaching the terabyte scale, index construction, index updating, and data assignment in a distributed index face stricter requirements. These topics are active research areas, but they are also difficult ones. This thesis focuses on reducing the space and time needed to build an inverted index, speeding up index updates, and accelerating query processing.

First, we propose a new algorithm for building inverted indexes based on document predisposition. It is more space- and time-efficient than traditional algorithms. The algorithm computes the sizes of the inverted files first, which avoids disk-based sorting during index construction. This improves resource utilization and reduces index construction time.

Second, web documents are updated frequently, so the index must be updated as well to stay consistent with the documents. We propose an incremental update strategy for a block-based inverted index. It does not need to move index files while updating them, and it has little impact on retrieval. The method effectively improves search and update times and supports insert and delete operations.

Third, a global inverted file has disadvantages in retrieval speed and scalability. A new strategy was designed and implemented to overcome these disadvantages. This strategy, the local inverted file, provides fast and effective distributed retrieval in large-scale information retrieval systems.

Finally, we implement a competitive intelligence system based on a distributed full-text index. This system, which supports information collection, storage, indexing, and retrieval, can provide users with intelligence quickly, accurately, and in a timely manner.
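
The abstract does not give the details of the construction algorithm, so the following is only a minimal in-memory sketch of the general idea of computing posting-list sizes first so that no sort is needed. The function names `build_index` and `lookup`, the tuple layout, and the use of a flat array in place of on-disk inverted files are all assumptions made for illustration, not the thesis's actual method.

```python
from array import array
from collections import Counter


def build_index(docs):
    """Build an inverted index in two passes (illustrative sketch only).

    docs is a list of token lists; a document's id is its list position.
    """
    # Pass 1: count how many documents contain each term, which gives
    # the exact size of every posting list before anything is written.
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))

    # Reserve one contiguous slot per term in a single flat array.
    offsets = {}
    total = 0
    for term in sorted(doc_freq):
        offsets[term] = total
        total += doc_freq[term]
    postings = array("I", [0]) * total

    # Pass 2: write each doc id straight into its term's slot.
    # Documents are visited in id order, so every posting list ends up
    # sorted by doc id without a separate sort step.
    cursor = dict(offsets)
    for doc_id, tokens in enumerate(docs):
        for term in set(tokens):
            postings[cursor[term]] = doc_id
            cursor[term] += 1

    return offsets, doc_freq, postings


def lookup(index, term):
    """Return the list of doc ids containing term (empty if unknown)."""
    offsets, doc_freq, postings = index
    if term not in offsets:
        return []
    start = offsets[term]
    return list(postings[start:start + doc_freq[term]])


if __name__ == "__main__":
    sample_docs = [
        ["web", "index"],
        ["index", "search"],
        ["web", "search", "index"],
    ]
    index = build_index(sample_docs)
    print(lookup(index, "web"))
```

In an on-disk setting the same layout step would, under these assumptions, let each posting be written directly to its final position in the inverted file, at the cost of reading the document collection twice.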
Keywords/Search Tags:full-text index, document predisposition, build index, incremental updating, distributed index