
Research On Intelligent Web Advertising Crawler System

Posted on: 2014-10-13   Degree: Master   Type: Thesis
Country: China   Candidate: D Li   Full Text: PDF
GTID: 2298330422990432   Subject: Computer Science and Technology
Abstract/Summary:
In recent years, the Internet has had a great influence on people's daily lives, and it has evolved into a very important advertising medium alongside television and newspapers. Because of its wide coverage, rich interactivity, and other characteristics, web advertising has attracted a large number of advertisers to run marketing campaigns on the Internet. The advertising data on the Internet are very rich, and collecting them is meaningful, but at present there is no collector for such data.

We design a crawler system for web advertisements, which is used to collect Internet advertising data. The research consists of three main parts:

(1) Design of the crawling strategy for advertising data. The crawling strategy calculates the weight of each URL seed and arranges the crawling order of the seeds according to their weights. Taking into account the types of web advertising that the ad crawler system crawls and the methods of web ad delivery, we propose a weight calculation method for downloaded pages and a weight calculation method for seeds. A seed's weight is calculated from the downloaded page weights and some global statistical knowledge.

(2) Design of a web advertising information extraction method. Based on observation and analysis of a large variety of web page types, we design a method for extracting ads from web pages. Exploiting the locality and aggregation of ads in web pages, the method uses a clustering algorithm as page segmentation to cluster all hyperlinks in a page into hyperlink blocks, and then uses heuristic rules to determine the class of each hyperlink block; if a block is an advertising block, ads are extracted from it.

(3) Design and implementation of an intelligent web advertising crawler system. Building on the preceding work, the system starts from default URL seeds, automatically downloads web pages, and then extracts ads from these pages.
The experiments show that the crawling strategy of the intelligent web advertising crawler system is more efficient than the breadth-first and depth-first strategies. In addition, the extraction algorithm can extract ads accurately.
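
The weight-ordered crawling described in part (1) can be illustrated with a minimal sketch. The abstract does not give the actual weight formulas, so everything here is an assumption made for illustration: the class name WeightedFrontier, the function seed_weight, the blending parameter alpha, and the example URLs are all hypothetical. The sketch only shows the general idea of visiting seeds in descending weight order, in contrast to the fixed ordering of breadth-first or depth-first crawling.

```python
import heapq
from itertools import count


class WeightedFrontier:
    """Crawl frontier that always yields the highest-weight URL seed next."""

    def __init__(self):
        self._heap = []      # entries are (-weight, tie_breaker, url)
        self._tie = count()  # preserves insertion order among equal weights
        self._seen = set()   # URLs already queued, so duplicates are skipped

    def push(self, url, weight):
        """Queue a seed; return False if the URL was already queued."""
        if url in self._seen:
            return False
        self._seen.add(url)
        # heapq is a min-heap, so the weight is negated to pop the largest first.
        heapq.heappush(self._heap, (-weight, next(self._tie), url))
        return True

    def pop(self):
        """Remove and return (url, weight) for the highest-weight seed."""
        neg_weight, _, url = heapq.heappop(self._heap)
        return url, -neg_weight

    def __len__(self):
        return len(self._heap)


def seed_weight(page_weights, global_prior, alpha=0.5):
    """Hypothetical seed weight (NOT the thesis formula): a blend of the mean
    weight of pages downloaded from the seed and a global statistical prior."""
    if not page_weights:
        return global_prior
    mean_page = sum(page_weights) / len(page_weights)
    return alpha * mean_page + (1 - alpha) * global_prior


# Example with made-up weights: seeds come out in descending weight order.
frontier = WeightedFrontier()
frontier.push("http://news.example/", seed_weight([0.8, 0.6], global_prior=0.5))
frontier.push("http://blog.example/", seed_weight([], global_prior=0.3))
frontier.push("http://shop.example/", seed_weight([0.9], global_prior=0.9))
order = [frontier.pop()[0] for _ in range(len(frontier))]
```

In this sketch the shop seed is crawled first, then the news seed, then the blog seed. A real system would recompute seed weights as more pages are downloaded; how the thesis does so is not stated in the abstract.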
Keywords/Search Tags: web advertisement, crawling strategy, information extraction, page segmentation, clustering