The Design And Implementation Of Topic Web Crawler About Mining Equipment

Posted on: 2014-06-23
Degree: Master
Type: Thesis
Country: China
Candidate: Y X Huang
Full Text: PDF
GTID: 2298330467466515
Subject: Computer technology
Abstract/Summary:
With the rapid development of society and Internet technology, the way people obtain information has gradually shifted from traditional channels to Internet search engines. Faced with the vast amount of information on the Internet, people have begun to pay more attention to topic-specific (theme) search engines, which can retrieve useful information quickly and accurately. A theme search engine focuses on a specific field, and the topic web crawler is its most important component. The quality of the topic crawler directly affects the quality of the search results: a good topic crawler can gather useful information from the Internet quickly and accurately. This thesis takes the topic crawler as its object of study and analyzes it in depth, with the goal of building a topic crawler system for the field of mining equipment.

First, the thesis introduces the principles and development of search engines and analyzes the main technologies of topic web crawlers, using the working process of a web crawler as the main thread of the study. Second, it studies web page denoising and simplification, and analyzes and implements extraction of a page's main content. Finally, it summarizes the advantages and disadvantages of three word segmentation methods. For relevance calculation, it mainly introduces the vector space model, which measures text similarity, and the PageRank algorithm, which ranks pages by their link structure; computing the vector space model involves term weight calculation and feature selection.

The thesis then describes how the topic crawler system for mining equipment was implemented. This work mainly includes analyzing the theoretical background of topic crawlers, designing the workflow and architecture of the system, choosing the initial (seed) URLs according to the requirements, and designing the database, among other tasks.
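PageRank, which the thesis surveys alongside the vector space model, scores a page by the links pointing to it rather than by its text. The following is a minimal power-iteration sketch in Python, not the thesis's own implementation; the link-graph representation (a dict mapping each page to the pages it links to), the damping factor of 0.85, and the convergence tolerance are assumptions.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=100):
    """PageRank by power iteration.

    links maps each page to the list of pages it links to (an assumed
    representation). Returns a dict page -> score; the scores sum to 1.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    if not pages:
        return {}
    pages = sorted(pages)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Rank held by dangling pages (no out-links) is spread over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {p: base for p in pages}
        for src, targets in links.items():
            if targets:
                # Each page splits its damped rank evenly across its out-links.
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
        delta = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta < tol:  # stop once the scores have stabilized
            break
    return rank


if __name__ == "__main__":
    graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
    for page, score in sorted(pagerank(graph).items()):
        print(page, round(score, 3))
```

In the demo graph, page C receives links from both A and B and ends up with the highest score. On its own, PageRank measures link popularity and says nothing about whether a page concerns mining equipment, which is why a topic crawler also needs a text-relevance measure.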
The system calculates topic correlation with the vector space model, which makes it more accurate. The thesis also presents the corresponding user interface and describes the system's implementation in detail.
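The topic correlation step can be illustrated with a small vector space model sketch in Python. It assumes pages have already been word-segmented into token lists, weights terms by smoothed TF-IDF (a common scheme; the abstract does not say which weighting the thesis uses), uses top-k pruning as a simple stand-in for feature selection, and compares a page with a topic vector by cosine similarity. The topic keywords and weights, the 0.3 relevance threshold, the blending rule in `link_priority`, and the `Frontier` class are hypothetical illustrations of URL valuation, not details taken from the thesis.

```python
import heapq
import math
from collections import Counter

# Hypothetical topic vector for "mining equipment": keywords and weights
# are chosen for illustration only, not taken from the thesis.
TOPIC_VECTOR = {"mining": 1.0, "shearer": 1.0, "conveyor": 0.8,
                "equipment": 0.8, "hydraulic": 0.6}
RELEVANCE_THRESHOLD = 0.3  # assumed cutoff for treating a page as on-topic


def idf_table(corpus):
    """Smoothed inverse document frequency for every term in a tokenized corpus."""
    n = len(corpus)
    df = Counter()
    for tokens in corpus:
        df.update(set(tokens))  # count each term once per document
    return {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}


def tfidf(tokens, idf, top_k=None):
    """TF-IDF weights for one segmented page; top_k keeps only the heaviest terms."""
    tf = Counter(tokens)
    total = sum(tf.values()) or 1
    unseen_idf = max(idf.values(), default=1.0)  # treat unseen terms as rarest
    vec = {t: (c / total) * idf.get(t, unseen_idf) for t, c in tf.items()}
    if top_k is not None:  # simple feature selection by weight
        vec = dict(sorted(vec.items(), key=lambda kv: kv[1], reverse=True)[:top_k])
    return vec


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def is_on_topic(score, threshold=RELEVANCE_THRESHOLD):
    """True if a page's relevance score reaches the (assumed) threshold."""
    return score >= threshold


def link_priority(parent_score, anchor_tokens, idf, topic=TOPIC_VECTOR, alpha=0.5):
    """Hypothetical URL valuation: blend parent-page relevance with anchor-text relevance."""
    anchor_score = cosine(tfidf(anchor_tokens, idf), topic)
    return alpha * parent_score + (1 - alpha) * anchor_score


class Frontier:
    """Best-first URL queue: the highest-valued unseen URL is fetched next."""

    def __init__(self):
        self._heap = []
        self._seen = set()

    def push(self, url, priority):
        if url not in self._seen:  # each URL is queued at most once
            self._seen.add(url)
            heapq.heappush(self._heap, (-priority, url))  # heapq is a min-heap

    def pop(self):
        neg_priority, url = heapq.heappop(self._heap)
        return url, -neg_priority

    def __len__(self):
        return len(self._heap)


if __name__ == "__main__":
    corpus = [["coal", "mining", "shearer", "hydraulic", "support"],
              ["mining", "equipment", "conveyor", "shearer"],
              ["football", "match", "score"]]
    idf = idf_table(corpus)
    for doc in corpus:
        score = cosine(tfidf(doc, idf), TOPIC_VECTOR)
        print(doc[:2], round(score, 2), is_on_topic(score))
```

In the demo, both mining pages clear the threshold, while the football page shares no terms with the topic vector and scores 0. Keeping the frontier as a priority queue means the crawler always expands the most promising link first instead of crawling breadth-first.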
Keywords/Search Tags: topic web crawler, vector space model, valuation of URL, topic correlation, mining equipment