Research and Development of a Distributed Real-Time Vertical Search Engine

Posted on: 2012-12-22
Degree: Master
Type: Thesis
Country: China
Candidate: W W Fu
GTID: 2248330395985066
Subject: Computer Science and Technology

Abstract/Summary:
With the growing popularity of social networking services (SNS), online news and similar services, users increasingly expect to obtain up-to-date data within a short time, and real-time search has therefore become a hot topic in information retrieval. Traditional search engines index a massive number of Web pages and emphasize comprehensive coverage. As a result, traditional search engines, including Google, can take several hours to update their indexes for new websites, which can hardly meet customers' real-time requirements. Recent research on real-time search has mainly focused on vertical search engines, which target a specific field and are highly focused. A real-time vertical search engine retrieves information quickly from a specific field and specific data sources, which lays a solid foundation for data mining and has great research and economic value.

The difficulty of real-time search lies in constructing an incremental indexing algorithm and achieving data disaster tolerance in a distributed system. This thesis therefore first studies the fundamental principles of vertical search engines and distributed systems, and then proposes RSearch, a parallel index-construction algorithm based on memory and disk. RSearch writes a global index to disk, which helps keep the index data consistent and complete. The real-time incremental index writes data directly to memory, which preserves timeliness; once memory is full, the data is copied to disk and a disk index is then built. Disaster tolerance for the real-time index is ensured by adopting an M*N distributed model to partition massive data and meet the demand for concurrent access, by introducing a CommitLog that persists real-time index requests, and by setting Checkpoints for rollback.

On the basis of the RSearch algorithm and Solr, this thesis builds a distributed real-time vertical search engine, the RSolr system, and removes bottlenecks in the RSolr search system by improving the efficiency of querying, sorting and index construction. Experimental results show that, compared with Solr, RSolr performs better in index construction, search capability, real-time behavior, data disaster tolerance and distributed performance, which indicates that RSearch is a real-time, stable, efficient and usable algorithm.
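The abstract does not give RSearch's implementation, so the following is only a minimal single-node sketch of the general idea it describes: log each request first, index it in memory so it is searchable at once, flush memory to a disk segment when a limit is reached, and record a checkpoint so that recovery replays only the unflushed part of the log. The class name `TinyRealtimeIndex`, the file names, and the document-count limit (a stand-in for "memory is full") are all assumptions made for illustration, and the M*N distribution is not modeled.

```python
import json
import os
import tempfile
from collections import defaultdict


class TinyRealtimeIndex:
    """Hypothetical toy index: memory first, disk segments, commit log, checkpoint."""

    def __init__(self, directory, max_docs_in_memory=2):
        self.directory = directory
        self.max_docs = max_docs_in_memory      # stand-in for a RAM limit
        self.log_path = os.path.join(directory, "commit.log")
        self.checkpoint_path = os.path.join(directory, "checkpoint.json")
        self.memory = defaultdict(set)          # term -> ids of unflushed docs
        self.docs_in_memory = 0
        self.segments = []                      # paths of flushed disk segments
        self.logged = 0                         # log entries accounted for so far
        self._recover()

    def add(self, doc_id, text):
        # Persist the request before indexing, so a crash cannot lose it.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"id": doc_id, "text": text}) + "\n")
            log.flush()
            os.fsync(log.fileno())
        self.logged += 1
        self._index_in_memory(doc_id, text)     # searchable immediately
        if self.docs_in_memory >= self.max_docs:
            self._flush()

    def _index_in_memory(self, doc_id, text):
        for term in text.lower().split():
            self.memory[term].add(doc_id)
        self.docs_in_memory += 1

    def _flush(self):
        # Copy the in-memory postings to a new disk segment.
        path = os.path.join(self.directory, f"segment_{len(self.segments)}.json")
        with open(path, "w", encoding="utf-8") as f:
            json.dump({t: sorted(ids) for t, ids in self.memory.items()}, f)
        self.segments.append(path)
        # Checkpoint: the first `logged` log entries are now covered by segments.
        tmp = self.checkpoint_path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            json.dump({"segments": self.segments, "log_entries": self.logged}, f)
        os.replace(tmp, self.checkpoint_path)   # atomic swap of the checkpoint
        self.memory.clear()
        self.docs_in_memory = 0

    def _recover(self):
        # Restore the segment list, then replay log entries after the checkpoint.
        if os.path.exists(self.checkpoint_path):
            with open(self.checkpoint_path, encoding="utf-8") as f:
                checkpoint = json.load(f)
            self.segments = checkpoint["segments"]
            self.logged = checkpoint["log_entries"]
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, encoding="utf-8") as log:
            entries = [json.loads(line) for line in log if line.strip()]
        for entry in entries[self.logged:]:
            self._index_in_memory(entry["id"], entry["text"])
        self.logged = len(entries)

    def search(self, term):
        term = term.lower()
        hits = set(self.memory.get(term, ()))
        for path in self.segments:
            with open(path, encoding="utf-8") as f:
                hits.update(json.load(f).get(term, []))
        return sorted(hits)


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    index = TinyRealtimeIndex(workdir)
    index.add(1, "real-time search")
    index.add(2, "vertical engine")          # second doc triggers a flush
    index.add(3, "distributed search")       # stays in memory only
    restarted = TinyRealtimeIndex(workdir)   # simulates a restart after a crash
    print(restarted.search("search"))
```

In this sketch a restart loses nothing: flushed documents are read back from their segment, and the one document that was only in memory is rebuilt from the log entries written after the last checkpoint. A production system would also truncate the log after each checkpoint and merge segments, which is omitted here.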
Keywords/Search Tags: Information Retrieval, Vertical Search Engine, Distributed System, Data Disaster Tolerance, RSearch, RSolr, Real-time Search