Design And Implementation Of Vertical Search Engine Based On Hadoop

Posted on:2017-07-13

Degree:Master

Type:Thesis

Country:China

Candidate:L Cheng

Full Text:PDF

GTID:2348330485450611

Subject:Control Science and Engineering

Abstract/Summary:

With the rapid development of Internet technologies, information on the network is being updated and growing at an extremely high rate. As a result, a wide variety of search engines have been designed over the past decade. When dealing with massive data, centralized search engines cannot support large-scale searching because of server overload, unstable system performance, low efficiency, and similar problems. On the other hand, widely used general-purpose search engines cannot provide professional and accurate results because their retrieval scope is too broad. For these reasons, there is a gap between search engine technology and the demands of information retrieval in specific fields. To fill this gap, a vertical search engine system based on Hadoop is proposed.

A Hadoop cloud computing platform is set up to store files in a distributed way and to process data in parallel. With the support of the MapReduce programming model, the system implements all the functional modules of a search engine in a distributed cluster environment, so that it processes data efficiently while ensuring secure data storage and stable operation of the whole system. In addition, a topic-oriented web crawler algorithm, VPCRAW, is designed to fetch web information on specified topics. The algorithm retains the advantages of the vector space model (VSM) and PageRank, takes both content relevance and link authority into account, and supplies more professional source files to the subsequent modules, which improves the accuracy of information retrieval.

The experimental results show that, compared with traditional centralized search engines, the Hadoop-based vertical search engine improves search efficiency remarkably when dealing with massive data. Compared with general search engines, the system retrieves information with higher precision and authority. Moreover, users can crawl different web information for different retrieval requirements by adjusting the damping coefficient p in the VPCRAW algorithm.
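The abstract states that the search engine modules run on MapReduce but does not show any job. A typical MapReduce task in a search engine is building an inverted index, so the following is a minimal sketch of that data flow, assuming a simple word tokenizer. It simulates the map, shuffle, and reduce phases in memory with plain Python; it does not use the Hadoop API, and the function names (map_phase, shuffle, reduce_phase, build_inverted_index) are hypothetical. A real Hadoop job would express the same mapper and reducer as Java Mapper/Reducer classes or as Hadoop Streaming scripts.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Posting = Tuple[str, int]  # (document id, term frequency)


def map_phase(doc_id: str, text: str) -> Iterator[Tuple[str, Posting]]:
    """Hypothetical mapper: emit (term, (doc_id, tf)) once per distinct term."""
    counts: Dict[str, int] = defaultdict(int)
    for term in re.findall(r"[a-z0-9]+", text.lower()):
        counts[term] += 1
    for term, tf in counts.items():
        yield term, (doc_id, tf)


def shuffle(pairs: Iterable[Tuple[str, Posting]]) -> Dict[str, List[Posting]]:
    """Group values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[Posting]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(term: str, postings: List[Posting]) -> Tuple[str, List[Posting]]:
    """Hypothetical reducer: return the postings list sorted by document id."""
    return term, sorted(postings)


def build_inverted_index(docs: Dict[str, str]) -> Dict[str, List[Posting]]:
    """Run map -> shuffle -> reduce over an in-memory document collection."""
    mapped = (pair for doc_id, text in docs.items() for pair in map_phase(doc_id, text))
    return dict(reduce_phase(term, postings) for term, postings in shuffle(mapped).items())


if __name__ == "__main__":
    docs = {"d1": "Hadoop stores files in HDFS", "d2": "MapReduce runs on Hadoop"}
    print(build_inverted_index(docs)["hadoop"])  # [('d1', 1), ('d2', 1)]
```

On a cluster, each mapper would process one split of the crawled pages stored in HDFS, and the framework, not the `shuffle` function, would route all postings for a term to the same reducer.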
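VPCRAW is said to use the vector space model (VSM) to measure content relevance, but the abstract does not give its term weighting. The following minimal sketch assumes raw term-frequency vectors and cosine similarity between a page and a set of topic terms. The helper names (tf_vector, cosine, topic_relevance) are hypothetical, and a production crawler would more likely use TF-IDF weights.

```python
import math
import re
from collections import Counter


def tf_vector(text: str) -> Counter:
    """Bag-of-words term-frequency vector (a simple VSM representation)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def topic_relevance(page_text: str, topic_terms: str) -> float:
    """Hypothetical relevance score of a fetched page with respect to the crawl topic."""
    return cosine(tf_vector(page_text), tf_vector(topic_terms))


if __name__ == "__main__":
    topic = "hadoop mapreduce"
    for page in ["Hadoop cluster setup guide", "Cooking recipes for beginners"]:
        print(page, round(topic_relevance(page, topic), 3))
```

A score near 1 means the page's vocabulary closely matches the topic, and 0 means no shared terms. This covers only the content-relevance side; the link-authority side would come from a PageRank-style score.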
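The abstract does not define how VPCRAW combines VSM relevance with PageRank, nor exactly what the damping coefficient p controls. One well-known way to combine the two is a personalized (topic-sensitive) PageRank. In this variant, with probability p the random surfer follows an outlink, and with probability 1 - p it jumps to a page chosen in proportion to its content relevance. The following is a minimal sketch of that swapped-in technique, not the thesis's formula. The function name personalized_pagerank and its parameters are hypothetical.

```python
from typing import Dict, List


def personalized_pagerank(
    links: Dict[str, List[str]],
    relevance: Dict[str, float],
    p: float = 0.85,
    iterations: int = 100,
) -> Dict[str, float]:
    """Power iteration for PageRank with a relevance-weighted teleport vector.

    Assumption (not from the source): p is the damping coefficient, i.e. the
    probability of following a link; 1 - p is the probability of jumping to a
    page in proportion to its topic relevance.
    """
    pages = sorted(set(links) | {q for outs in links.values() for q in outs} | set(relevance))
    if not pages:
        return {}
    total_rel = sum(relevance.get(q, 0.0) for q in pages)
    if total_rel > 0:
        teleport = {q: relevance.get(q, 0.0) / total_rel for q in pages}
    else:  # no relevance information: fall back to classic uniform teleport
        teleport = {q: 1.0 / len(pages) for q in pages}

    rank = dict(teleport)
    for _ in range(iterations):
        new_rank = {q: (1.0 - p) * teleport[q] for q in pages}
        for q in pages:
            outs = links.get(q, [])
            if outs:  # spread this page's rank evenly over its outlinks
                share = p * rank[q] / len(outs)
                for target in outs:
                    new_rank[target] += share
            else:  # dangling page: redistribute its rank along the teleport vector
                for target in pages:
                    new_rank[target] += p * rank[q] * teleport[target]
        rank = new_rank
    return rank


if __name__ == "__main__":
    links = {"a": ["b"], "b": ["a"], "c": ["a"]}
    relevance = {"c": 1.0}  # only page c matches the topic
    for p in (0.1, 0.9):
        scores = personalized_pagerank(links, relevance, p=p)
        print(p, {k: round(v, 3) for k, v in scores.items()})
```

Under this assumption, a small p keeps the scores close to content relevance, and a large p lets link authority dominate. This is one plausible reading of the abstract's claim that adjusting p changes which pages are crawled. A focused crawler would then typically fetch frontier URLs in descending score order.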

Keywords/Search Tags:

Hadoop, Vertical Search Engine, Web Crawler Algorithm, MapReduce, Damping Coefficient

Related items

1	Design And Implementation Of Vertical Search Engine Based On Web Crawler
2	Research On Focused Crawler Technology Of Vertical Search Engine
3	Research And Design Of Vertical Search Engine Web Crawler
4	Research On An Algorithm Of Focused Crawler In Vertical Search Engine
5	The Research On Focused Crawling Algorithm In Vertical Search Engine
6	The Design And Implementation Of Vertical Search Engine For Resold House
7	Research And Implementation Of Tax Vertical Search Engine And Improved PageRank Algorithm
8	Research And Implementation Of Web Crawler On Vertical Search Engine
9	Research And Application Of Focusing Crawler Which Faced Vertical Search Engine
10	Design And Implementation Of A Vertical Search In The Life Service Industry