Design And Implementation Of Vertical Search Engine Based On Hadoop

Posted on: 2017-07-13
Degree: Master
Type: Thesis
Country: China
Candidate: L Cheng
Full Text: PDF
GTID: 2348330485450611
Subject: Control Science and Engineering
Abstract/Summary:
With the rapid development of Internet technologies, information on the network is being updated and is growing at an extremely high speed. As a result, a wide variety of search engines have been designed over the past decade. Centralized search engines cannot support searching over massive data because of server overload, unstable system performance, and low efficiency. Meanwhile, the widely used general-purpose search engines cannot provide professional and accurate results because their retrieval range is too wide. For these reasons, there is a gap between search engine technology and the demands of information retrieval in specific fields. To fill this gap, a vertical search engine system based on Hadoop is proposed. A Hadoop cloud computing platform is set up to store files in a distributed way and to process data in parallel. With the support of the MapReduce programming model, the system implements all the functional modules of a search engine in a distributed cluster environment, which allows it to process data efficiently while ensuring safe data storage and stable operation of the whole system. In addition, a topic-oriented web crawler algorithm, VPCRAW, is designed to fetch web information on specified topics. The algorithm keeps the advantages of the vector space model (VSM) and PageRank, takes both content relevance and link authority into account, and provides more specialized source files for the subsequent modules, thereby improving the accuracy of information retrieval. The experimental results show that, compared with traditional centralized search engines, the Hadoop-based vertical search engine system improves search efficiency remarkably when dealing with massive data. Compared with general search engines, the system retrieves information with higher precision and authority. Moreover, users can crawl different web information according to different retrieval requirements by adjusting the damping coefficient p in the VPCRAW algorithm.
Keywords/Search Tags: Hadoop, Vertical Search Engine, Web Crawler Algorithm, MapReduce, Damping Coefficient
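
The abstract says that VPCRAW weighs content relevance (VSM) against link authority (PageRank) and that the coefficient p shifts the balance, but it does not give the formula. The sketch below is therefore only an illustration under stated assumptions: p is treated as a linear weight between the two scores, relevance is the cosine similarity of raw term-frequency vectors, and the names `cosine_similarity`, `pagerank`, `combined_score`, and `rank_frontier` are hypothetical helpers, not taken from the thesis. The factor `d` inside `pagerank` is the standard PageRank damping factor and is kept separate from p.

```python
import math
from collections import Counter


def cosine_similarity(text_a: str, text_b: str) -> float:
    """VSM relevance: cosine between raw term-frequency vectors."""
    vec_a = Counter(text_a.lower().split())
    vec_b = Counter(text_b.lower().split())
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def pagerank(links: dict[str, list[str]], d: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Plain iterative PageRank; d is the standard PageRank damping factor."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - d) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = d * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                share = d * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


def combined_score(relevance: float, authority: float, p: float) -> float:
    """ASSUMPTION: a linear blend; the thesis's actual formula is not given."""
    return p * relevance + (1.0 - p) * authority


def rank_frontier(topic: str, pages: dict[str, str],
                  links: dict[str, list[str]], p: float) -> list[str]:
    """Order candidate URLs by the blended score, best first."""
    ranks = pagerank(links)
    top = max(ranks.values())
    scored = {
        url: combined_score(cosine_similarity(topic, text),
                            ranks.get(url, 0.0) / top, p)
        for url, text in pages.items()
    }
    return sorted(scored, key=scored.get, reverse=True)


# Toy data, invented for illustration only.
pages = {
    "a": "hadoop mapreduce cluster",
    "b": "cooking recipes",
    "c": "hadoop news",
}
links = {"a": ["b"], "c": ["b"], "b": []}

print(rank_frontier("hadoop mapreduce", pages, links, p=1.0))
print(rank_frontier("hadoop mapreduce", pages, links, p=0.0))
```

Under this assumed blend, p = 1 orders the crawl frontier purely by topical relevance, p = 0 orders it purely by link authority, and intermediate values trade one off against the other, which matches the abstract's claim that adjusting p changes which pages are collected.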