Research On Distributed Full-Text Index System

Posted on: 2011-12-24
Degree: Master
Type: Thesis
Country: China
Candidate: L Sun
Full Text: PDF
GTID: 2178330338979959
Subject: Computer Science and Technology
Abstract/Summary:
Distributed information retrieval is an important technology in modern information processing and is widely used in search engines, competitive intelligence systems, public opinion monitoring systems, and other areas. Research on distributed full-text indexing has both high scientific value and strong commercial prospects. The key techniques in a distributed indexing system include index creation and updating, index partitioning, load balancing in index distribution, and distributed index querying. This paper uses existing, already mature index creation techniques and focuses on index partitioning, because the index allocation strategy is the foundation of the distribution and may affect index creation and updating as well as querying.
The performance of the indexing system is a key factor in the query performance of a web search engine. The indexing system of a traditional web search engine generally runs on a large-scale, high-performance cluster, which is expensive. A distributed indexing system designed to run on several small clusters interconnected by the Internet would reduce hardware costs. Two index partitioning schemes are currently well studied: term partitioning and document partitioning. Each has its advantages and disadvantages. Weighing these and taking the network conditions of the deployment into account, this paper proposes a layered index partitioning scheme: document partitioning is applied among the clusters, and term partitioning is applied within each cluster. An additional update indexing server handles the indexing of new documents.
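The layered scheme described above can be illustrated with a minimal in-memory sketch. It is not the thesis implementation: the class names (`Cluster`, `LayeredIndex`), the CRC32-modulo routing, whitespace tokenization, and AND-only queries are all assumptions made for illustration, and the update indexing server is not modelled. The sketch only shows the two layers: a document is assigned to exactly one cluster (document partitioning), and inside that cluster each term's posting list lives on exactly one server (term partitioning).

```python
import zlib
from collections import defaultdict


def stable_hash(key: str) -> int:
    """Deterministic hash; Python's built-in hash() is salted per process for str."""
    return zlib.crc32(key.encode("utf-8"))


class Cluster:
    """Hypothetical cluster whose postings are term-partitioned across its servers."""

    def __init__(self, num_servers: int):
        # servers[i] maps a term to the set of document ids stored on server i.
        self.servers = [defaultdict(set) for _ in range(num_servers)]

    def _server_for(self, term: str):
        # Term partitioning: every term is owned by exactly one server.
        return self.servers[stable_hash(term) % len(self.servers)]

    def add(self, doc_id: str, text: str) -> None:
        for term in set(text.lower().split()):
            self._server_for(term)[term].add(doc_id)

    def search(self, terms):
        # Fetch each posting list from its owning server, then intersect (AND query).
        postings = [self._server_for(t).get(t, set()) for t in terms]
        return set.intersection(*postings) if postings else set()


class LayeredIndex:
    """Document partitioning among clusters, term partitioning within each cluster."""

    def __init__(self, num_clusters: int, servers_per_cluster: int):
        self.clusters = [Cluster(servers_per_cluster) for _ in range(num_clusters)]

    def add(self, doc_id: str, text: str) -> None:
        # Document partitioning: every document lives in exactly one cluster.
        cluster = self.clusters[stable_hash(doc_id) % len(self.clusters)]
        cluster.add(doc_id, text)

    def search(self, query: str):
        # Broadcast the query to all clusters and merge their local results.
        terms = query.lower().split()
        result = set()
        for cluster in self.clusters:
            result |= cluster.search(terms)
        return result


index = LayeredIndex(num_clusters=2, servers_per_cluster=3)
index.add("d1", "distributed full text index")
index.add("d2", "distributed query processing")
index.add("d3", "full text search engine")
print(sorted(index.search("full text")))  # ['d1', 'd3']
```

Because a whole document stays inside one cluster, a multi-term query can be answered by each cluster independently and only the result sets cross the slower inter-cluster links, while the posting-list exchange needed for term partitioning stays inside a cluster.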
Experimental results indicate that the layered indexing system provides higher throughput while consuming fewer resources, and achieves a good level of load balancing.
To achieve better results for public opinion monitoring, information extraction is performed before indexing on the news, blog, and BBS pages that the public opinion monitoring system targets. Only the information relevant to public opinion monitoring is extracted and indexed, which improves both retrieval accuracy and index creation efficiency.
Keywords/Search Tags: search engine, full-text index, distributed index, information extraction, documents partitioning, terms partitioning