
The Research And Design Of Search Engine Based On Distribution

Posted on: 2016-09-21
Degree: Master
Type: Thesis
Country: China
Candidate: Y G Zhang
GTID: 2348330542975738
Subject: Engineering
Abstract/Summary:
With the rapid development of computer and Internet technology, information on the network is growing explosively, which poses a great challenge to traditional search engine technology. Faced with massive data processing and storage requirements, users demand search engines that not only return accurate results but also offer better timeliness, higher scalability, and a lower failure rate. In the era of big data, the demand for data-processing capacity keeps rising. With the development of distributed computing and the growing popularity of cloud computing, distributed search engines will undoubtedly become the trend of future development. Taking a distributed search engine as its subject, this thesis reviews current research on search engines in China and abroad, as well as their future development trends; analyzes how search engines work; and elaborates on the theories underlying distributed search engine technologies. It compares and analyzes several major technical solutions for distributed search engines, studies measures for improving them, and proposes a distributed search engine solution based on Hadoop. According to the characteristics of search engines, the system is divided into three sub-modules: crawler, indexer, and searcher. Each sub-module is designed and implemented in detail, and Hadoop's Map/Reduce model and HDFS distributed file system are applied throughout the search engine. The original PageRank algorithm is optimized by adding a user-access feedback impact factor, yielding a PageRank algorithm based on user-access feedback for scoring web pages. The search sub-module is built with the lightweight web framework SpringMVC, and its view layer uses jetbrick-template, a new-generation Java template engine, instead of JSP, which speeds up page loading and improves the timeliness of the search engine. Finally, the thesis introduces and deploys the experimental environment and sets up a distributed search engine system. The system was tested in terms of functionality, reliability, and scalability, and the experimental results were compared and analyzed, demonstrating the feasibility of the Hadoop-based distributed search engine solution.
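The abstract states that the indexer applies Hadoop's Map/Reduce model but does not describe the job itself. The following is a minimal sketch, assuming a classic inverted-index job: the mapper emits (term, (doc_id, 1)) pairs, the framework groups them by term, and the reducer builds a posting list of per-document term frequencies. It is an in-memory simulation of that data flow in plain Python, not Hadoop API code; the function names, the lowercase alphanumeric tokenizer, and the boolean-AND `search` helper are all hypothetical illustrations.

```python
"""Hypothetical in-memory simulation of a Map/Reduce inverted-index job (not Hadoop API code)."""
import re
from collections import defaultdict

# Assumed tokenizer: lowercase alphanumeric runs only.
TOKEN = re.compile(r"[a-z0-9]+")


def map_phase(doc_id, text):
    """Mapper: emit (term, (doc_id, 1)) for every token in one crawled page."""
    for term in TOKEN.findall(text.lower()):
        yield term, (doc_id, 1)


def shuffle(pairs):
    """Shuffle: group emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(term, values):
    """Reducer: sum term frequencies per document into a posting list sorted by doc_id."""
    tf = defaultdict(int)
    for doc_id, count in values:
        tf[doc_id] += count
    return term, sorted(tf.items())


def build_index(docs):
    """docs: dict doc_id -> page text. Returns dict term -> [(doc_id, term_frequency), ...]."""
    emitted = (pair for doc_id, text in docs.items() for pair in map_phase(doc_id, text))
    return dict(reduce_phase(term, vals) for term, vals in shuffle(emitted).items())


def search(index, query):
    """Hypothetical searcher lookup: doc_ids containing every query term (boolean AND)."""
    postings = [{d for d, _ in index.get(t, [])} for t in TOKEN.findall(query.lower())]
    return sorted(set.intersection(*postings)) if postings else []
```

On a real cluster, the mapper and reducer would run as separate tasks over page files stored in HDFS, and the shuffle would be performed by the framework. Keeping the three phases as separate functions mirrors that split.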
Keywords/Search Tags: Distributed Computing, Search Engine, Crawl, Index, Search, Hadoop, PageRank
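The abstract proposes adding a user-access feedback impact factor to PageRank but does not give the formula. The sketch below shows one plausible way to do this, and it is an assumption, not the thesis's method: it biases the random-jump (teleportation) term toward frequently visited pages, in the style of personalized PageRank. The update rule is PR(p) = (1 − d)·w(p) + d·Σ PR(q)/L(q), where w(p) is a page's smoothed share of user visits. The names `feedback_pagerank`, `visits`, and `smoothing` are hypothetical. d = 0.85 is the damping factor commonly used with PageRank.

```python
"""Hypothetical sketch: PageRank whose random-jump term is weighted by user-access counts."""


def feedback_pagerank(links, visits, d=0.85, smoothing=1.0, iters=100, tol=1e-10):
    """
    links:  dict page -> list of pages it links to
    visits: dict page -> number of user accesses (assumed feedback signal)
    Returns dict page -> score; scores sum to 1.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)

    # Feedback-weighted jump vector; `smoothing` keeps unvisited pages above zero.
    raw = {p: visits.get(p, 0) + smoothing for p in pages}
    total = sum(raw.values())
    jump = {p: raw[p] / total for p in pages}

    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iters):
        # Rank held by pages without out-links is redistributed via the jump vector.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new = {p: (1 - d) * jump[p] + d * dangling * jump[p] for p in pages}
        # Each page passes a damped share of its rank along every out-link.
        for src, targets in links.items():
            if targets:
                share = d * rank[src] / len(targets)
                for dst in targets:
                    new[dst] += share
        delta = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if delta < tol:
            break
    return rank
```

With an empty `visits` dict, the jump vector is uniform and the function reduces to standard PageRank. Recording more accesses to a page raises its score even when it has no in-links. A linear blend of plain PageRank with normalized visit counts would be an equally plausible alternative reading of "impact factor".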