
Design And Implementation Of Network Analysis Based On The Page Crawler

Posted on: 2013-03-04
Degree: Master
Type: Thesis
Country: China
Candidate: Y Z Hao
Full Text: PDF
GTID: 2248330392957247
Subject: Software engineering
Abstract/Summary:
As science and technology continue to advance, network communication has become more developed and information is transferred faster and faster. The demand for data on the Web has grown sharply, roughly exponentially, which makes filtering network information particularly important and continues to drive rapid improvement in web crawler technology. A traditional web crawler applies loose filter conditions and collects a very wide range of information, so topical relevance and timeliness are difficult to guarantee. How to improve the efficiency of a web crawler's searching and information filtering is therefore well worth studying.

The purpose of this study is to improve search efficiency so that the information the user wants can be collected from the Web in the minimum time. The main process comprises web crawling, web filtering, web analysis, and web localization. Web crawling must address how to reach the destination page efficiently; page filtering removes junk pages and filters page content; page analysis splits the page into its parts and recombines them; page localization displays the complete page on the client.

The thesis implements a Spider that can crawl the web automatically, and explains in detail URL address resolution, deduplication, page loading, and page filtering. Improving the runtime performance of the program, expression validation, and the search strategy are also discussed. Page analysis parses HTML tags to give a simple and feasible method for extracting page text, URL links, JS and CSS script files, images, and multimedia files. Web localization concerns how the client saves the original page and displays it. Finally, an example experiment is presented.
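The crawl steps named in the abstract (URL address resolution, deduplication, and HTML tag parsing to pull out text, links, scripts, stylesheets, and images) can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the thesis's actual Spider. The names `ResourceExtractor`, `crawl`, and the `fetch` callable are hypothetical, and the breadth-first visiting order is an assumption, since the abstract does not say which search strategy was chosen.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class ResourceExtractor(HTMLParser):
    """Hypothetical helper: collects links, scripts, stylesheets, images, text."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links, self.scripts, self.styles, self.images = [], [], [], []
        self.text_parts = []
        self._skip = 0  # > 0 while inside <script>/<style>, whose bodies are not page text

    def _absolute(self, url):
        # URL address resolution: make the URL absolute and drop any #fragment.
        return urldefrag(urljoin(self.base_url, url)).url

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("script", "style"):
            self._skip += 1
        if tag == "a" and attrs.get("href"):
            self.links.append(self._absolute(attrs["href"]))
        elif tag == "script" and attrs.get("src"):
            self.scripts.append(self._absolute(attrs["src"]))
        elif tag == "link" and attrs.get("rel") == "stylesheet" and attrs.get("href"):
            self.styles.append(self._absolute(attrs["href"]))
        elif tag == "img" and attrs.get("src"):
            self.images.append(self._absolute(attrs["src"]))

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.text_parts.append(data.strip())


def crawl(seed, fetch, max_pages=10):
    """Breadth-first crawl (assumed strategy) with URL deduplication.

    `fetch` is a caller-supplied function mapping a URL to HTML text,
    or None when the page cannot be loaded.
    """
    seen = {seed}          # deduplication: every URL is queued at most once
    queue = deque([seed])
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        extractor = ResourceExtractor(url)
        extractor.feed(html)
        pages[url] = extractor
        for link in extractor.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

Passing `fetch` in as a parameter keeps the sketch free of network code: a real crawler would supply a function that downloads the page, while an experiment can supply a dictionary lookup over saved pages.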
Keywords/Search Tags: Web crawler, page analysis, the structure of search engines, web application