Research on Vertical Search Engine of Bidding Information Based on Hadoop

Posted on: 2017-02-26 | Degree: Master | Type: Thesis
Country: China | Candidate: Y Li | Full Text: PDF
GTID: 2348330512987463 | Subject: Computer technology
Abstract/Summary:
Bidding information is one category of Internet information, and it plays an important role in the bidding decisions of enterprise sales staff; obtaining it quickly, effectively, and accurately improves their work efficiency. General-purpose search engines serve this need poorly: they return many results but little useful information, the results are incomplete, and queries are prone to ambiguity. A vertical search engine is a specialized search solution, so a vertical search engine for the bidding-information domain has become an important need for enterprise sales staff.

The key problems of massive data processing in a search engine are storage and efficient computation. Traditional centralized or distributed architectures can solve these problems, but their high economic cost is hard for enterprises to accept. The Hadoop distributed computing platform offers low cluster cost, an open-source platform, the powerful data storage system HDFS, and the efficient distributed programming model MapReduce, so applying Hadoop to a vertical search engine has significant research value.

In implementing the complete search system, this thesis does the following work:

(1) The business and functional requirements of the system are analyzed, and the overall functionality and the overall Hadoop-based architecture of the system are proposed. A component framework is also put forward for each submodule of the search engine: the web crawler, the indexer, the searcher, and the user interface.

(2) The implementation of a Hadoop-based topic crawler for bidding information is studied. A basic method for building a topic thesaurus is proposed, and on this basis a topic model based on the bidding-information thesaurus is implemented. Combining the open-source Nutch framework with the thesaurus-based topic model, the analysis and implementation of the entire crawling process are completed.

(3) Web page parsing, Chinese word segmentation, the Hadoop-based distributed indexer and searcher, and the user interface are completed. For the distributed indexer, the final goal and the implementation process are analyzed and implemented; for the searcher, the basic workflow and its implementation are likewise analyzed and implemented.

(4) A prototype of the search system is developed and tested. Tests of the thesaurus-based topic model show that it achieves higher recall and precision. Scalability tests of the Hadoop-based web crawler show that crawling speed improves significantly as the number of nodes increases.
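
The abstract does not describe how the distributed indexer is built, so the following is only a minimal sketch of the general MapReduce pattern for building an inverted index, simulated locally in plain Python rather than run on Hadoop. The function names (`map_phase`, `shuffle`, `reduce_phase`, `build_index`) are hypothetical, and whitespace splitting stands in for the Chinese word segmentation step mentioned in item (3).

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(doc_id: str, text: str) -> Iterator[Tuple[str, str]]:
    """Mapper: emit one (term, doc_id) pair per token.

    Assumption: whitespace tokenization replaces a real Chinese segmenter.
    """
    for token in text.lower().split():
        yield token, doc_id


def shuffle(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group mapper output by term, as the framework's shuffle step would."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for term, doc_id in pairs:
        grouped[term].append(doc_id)
    return grouped


def reduce_phase(term: str, doc_ids: List[str]) -> Tuple[str, Dict[str, int]]:
    """Reducer: collapse the doc_id list into a posting map doc_id -> term count."""
    postings: Dict[str, int] = defaultdict(int)
    for doc_id in doc_ids:
        postings[doc_id] += 1
    return term, dict(postings)


def build_index(docs: Dict[str, str]) -> Dict[str, Dict[str, int]]:
    """Run map, shuffle, and reduce in sequence over an in-memory corpus."""
    pairs = (pair for doc_id, text in docs.items()
             for pair in map_phase(doc_id, text))
    return dict(reduce_phase(term, ids) for term, ids in shuffle(pairs).items())


if __name__ == "__main__":
    # Hypothetical sample documents, not data from the thesis.
    sample = {
        "d1": "tender notice for road construction",
        "d2": "construction tender tender",
    }
    for term, postings in sorted(build_index(sample).items()):
        print(term, postings)
```

On a real cluster the mapper and reducer would run on different nodes and the framework would perform the grouping; the local `shuffle` function only imitates that step so the data flow can be followed end to end.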
Keywords/Search Tags: vertical search engine, bidding information, topic model, Hadoop