
A Distributed Search Engine Of E-Business Topic

Posted on: 2017-03-19
Degree: Master
Type: Thesis
Country: China
Candidate: S X Zhu
GTID: 2308330503468522
Subject: Computer technology
Abstract/Summary:
The amount of information on the Internet keeps growing, and the Internet has entered the era of big data. It is becoming ever harder for people to find the information they need in such a vast amount of data. Nowadays, people usually rely on search engines to find information, and in most cases they search within a particular field. Most search engines on the Internet, however, are general-purpose: their results are imprecise and spread across a broad range of topics. This makes them poorly suited to users, who typically look for information on one specific theme. As Internet data grows ever faster, topic-specific, intelligent and personalized search engines have gradually become the direction of development, and search engines built around a specific topic or theme are one of the most active research areas. Consumers who want e-commerce product information currently have to rely on e-commerce portals, because they cannot find that product information through general search engines, and research on e-business search engines that provide product information search is still insufficient. To address the inability of general search engines to serve e-business product searches, I designed and developed a search engine for the e-business topic that lets users query commodity-related information quickly and accurately.
This thesis focuses on e-commerce search and, guided by practical application needs, studies the techniques and principles behind distributed topic search engines. First, I analyze in depth the technical principles of search engines, namely web crawling, data indexing, Chinese word segmentation and web page classification, which form the foundation of a topic search engine. Studying these technologies showed how a topic search engine works. Next, I study the distributed computing framework MapReduce and the Hadoop Distributed File System (HDFS). I then present a design scheme based on the MapReduce programming model, covering the system architecture, the division of system functions, and the analysis and design of the crawling, indexing and search processes on the Hadoop platform. In the implementation, Nutch serves as the basic framework for the web crawler, Solr serves as the search framework, and the Chinese word segmentation tool IK-Analyzer is introduced to process Chinese content. I also propose solutions to problems frequently encountered when crawling e-business websites. Finally, I developed and deployed a distributed search engine system with 4 nodes and tested and evaluated it with data from JingDong and Tmall.
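To make the MapReduce-based indexing step more concrete, here is a minimal single-process sketch of how an inverted index can be built in map, shuffle and reduce phases. It is not the thesis's code, nor Nutch's, Solr's or Hadoop's API; the phases are simulated with plain Python functions. As an assumption, Chinese text is tokenized with simple forward maximum matching over a tiny hypothetical dictionary, standing in for IK-Analyzer (whose own algorithm and lexicon are not reproduced here). The document IDs and page texts are made up for illustration.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical toy dictionary; a real segmenter such as IK-Analyzer ships its own lexicon.
DICTIONARY = {"手机", "华为", "笔记本", "电脑", "苹果"}
MAX_WORD_LEN = max(len(w) for w in DICTIONARY)


def segment(text):
    """Forward maximum matching (a stand-in, not IK-Analyzer's algorithm):
    at each position take the longest dictionary word, else a single character."""
    tokens = []
    i = 0
    while i < len(text):
        if text[i].isspace():
            i += 1
            continue
        for size in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in DICTIONARY:
                tokens.append(candidate)
                i += size
                break
    return tokens


def map_phase(doc_id, text):
    """Map: emit one (term, doc_id) pair per distinct term in a page."""
    for term in set(segment(text)):
        yield term, doc_id


def shuffle(pairs):
    """Shuffle: group mapper output by key (Hadoop does this between map and reduce)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(term, doc_ids):
    """Reduce: turn the grouped doc IDs into a sorted posting list."""
    return term, sorted(doc_ids)


def build_inverted_index(docs):
    mapped = chain.from_iterable(map_phase(d, t) for d, t in docs.items())
    grouped = shuffle(mapped)
    return dict(reduce_phase(term, ids) for term, ids in grouped.items())


if __name__ == "__main__":
    # Made-up product-page titles keyed by hypothetical document IDs.
    pages = {"jd-1": "华为手机", "tmall-7": "苹果手机", "jd-2": "华为笔记本电脑"}
    index = build_inverted_index(pages)
    print(index["手机"])  # ['jd-1', 'tmall-7']
    print(index["华为"])  # ['jd-1', 'jd-2']
```

On a real cluster, each mapper would process a split of the crawled pages stored in HDFS and the framework would perform the shuffle; the dictionary returned by `shuffle` only plays that role locally.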
Keywords/Search Tags: E-business, theme, search engine, Nutch, Solr, Hadoop, MapReduce