
Key Technology Study On The Cloud Computing Platform In The Field Of Search Engine

Posted on: 2012-10-09
Degree: Master
Type: Thesis
Country: China
Candidate: M Y Jiang
GTID: 2178330335478100
Subject: Computer software and theory
Abstract/Summary:
With the development of the Internet, news sites, blogs, forums, microblogs, and other kinds of websites continue to emerge, and the amount of data carried by the Internet keeps growing. Using a search engine has become a common way to find information. As the volume of data grows, however, the traditional centralized search engine architecture can no longer provide fast and efficient retrieval. This paper explores the key technologies for using a cloud computing platform in a massive-data environment to process data, build indexes, and provide distributed search services.

The paper first argues for the necessity and feasibility of building a search engine on a cloud computing platform, and then discusses the Hadoop-based cloud computing platform, the Lucene-based search engine, RMI-based distributed systems, and the Memcached-based cache system. On this basis, we designed a distributed search engine consisting of a distributed indexing module, a distributed search module, and a cache module.

In the distributed indexing module, processed data is stored on the HDFS distributed file system as SequenceFiles and is then used to build indexes with MapReduce under a strategy of index decomposition and index block division. In the distributed search module, we built a distributed search framework of a central node and search nodes using RMI and Lucene. Each search node provides full-text index search with Lucene. The central node receives the user's search requests and distributes them to the search nodes; once the search nodes have finished searching, the central node merges their results and returns them to the user. To avoid the concurrency problems caused by high load, we use Memcached to cache search results, which effectively reduces the load on the search engine.

Experiments were run on a cloud computing platform of 50 nodes. The results show that the method proposed in the paper can index data efficiently and provide efficient and accurate full-text search.
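The request flow described in the abstract (central node checks the cache, fans the query out to the search nodes, merges their partial results, and caches the merged list) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the class names `SearchNode` and `CentralNode` are hypothetical, plain method calls stand in for RMI, a term-count score stands in for Lucene ranking, and an in-process dictionary stands in for Memcached.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


class SearchNode:
    """Hypothetical search node holding one index shard (doc_id -> text)."""

    def __init__(self, docs):
        self.docs = docs

    def search(self, query, k):
        # Toy scoring: count query-term occurrences (a stand-in for Lucene).
        terms = query.lower().split()
        hits = []
        for doc_id, text in self.docs.items():
            words = text.lower().split()
            score = sum(words.count(t) for t in terms)
            if score > 0:
                hits.append((score, doc_id))
        # Each node returns only its own top-k hits.
        return heapq.nlargest(k, hits)


class CentralNode:
    """Hypothetical central node: cache lookup, fan-out, merge, cache store."""

    def __init__(self, nodes, cache=None):
        self.nodes = nodes
        # A dict stands in for Memcached in this sketch.
        self.cache = {} if cache is None else cache

    def search(self, query, k=10):
        key = (query, k)
        if key in self.cache:
            # Cache hit: no search node is contacted.
            return self.cache[key]
        workers = max(1, len(self.nodes))
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # Fan the query out to all search nodes in parallel.
            partials = list(pool.map(lambda n: n.search(query, k), self.nodes))
        # Merge the per-node top-k lists into a global top-k.
        merged = heapq.nlargest(k, (hit for part in partials for hit in part))
        self.cache[key] = merged
        return merged
```

Because every search node already returns its local top-k, the central node only has to merge at most k hits per node, and a repeated query is answered from the cache without touching any search node.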
Keywords/Search Tags: Cloud computing platform, Hadoop, MapReduce, RMI, Lucene, Distributed search engine