
The Design and Implementation of an Incremental Web Crawler for E-commerce Web Sites

Posted on: 2011-09-24  Degree: Master  Type: Thesis
Country: China  Candidate: S Yang  Full Text: PDF
GTID: 2248330371963394  Subject: Software engineering
Abstract/Summary:
With the rapid development of the Internet and electronic commerce, the number of e-commerce sites keeps growing. To find and compare goods quickly, people often use shopping search engines to search for and compare product information. Shopping search engines, which grew out of comparison shopping websites, help consumers locate exactly the products they need on the Internet and compare services and prices across several e-commerce sites. E-commerce sites change constantly: prices are adjusted, and products are added and removed. On traditional comparison shopping websites, product information lags behind these changes, so the comparisons are not reliable.

To solve this problem, the crawler must adopt incremental crawling techniques. How the crawler crawls directly affects the quality of the information in search results, and how well incremental crawling is applied is key to a search engine's success. An incremental crawler is characterized by tracking how pages change, predicting when a page will change next, and providing a list of URLs to check. A good incremental crawler reduces manual intervention, improves the freshness, precision, and recall of the search engine, and improves network bandwidth utilization.

This thesis proposes a crawling strategy based on URL classification, dividing URLs into four types: Index, Channel, List, and Content. Different crawling methods are used for each type. An e-commerce-oriented crawling model is designed, and the key crawling algorithm is described. Based on the open-source crawler Heritrix, the thesis implements incremental crawling, extraction of e-commerce product pages, and an incremental capture function model. Crawling experiments on e-commerce sites indicate that the designed incremental crawl strategy effectively extracts product information from e-commerce sites and achieves incremental crawling.
Keywords/Search Tags: E-commerce, Search Engine, Incremental Crawler, Heritrix
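
The abstract names the four URL types and the idea of predicting page change times but does not give the rules themselves. The sketch below is only an illustration of how such a strategy could look, written in Python rather than as a Heritrix module. The URL patterns, the per-type revisit intervals, and the halve-or-double scheduling heuristic are all assumptions made for this example and are not taken from the thesis; the names `classify_url`, `new_record`, and `record_fetch` are hypothetical.

```python
import hashlib
import re
from dataclasses import dataclass
from urllib.parse import urlparse

# Hypothetical URL patterns; a real site would need its own rules.
CONTENT_RE = re.compile(r"/(product|item)/\d+")
LIST_RE = re.compile(r"/(list|category|search)\b")

# Assumed starting revisit intervals, in hours, for each URL type.
REVISIT_HOURS = {"index": 24.0, "channel": 12.0, "list": 6.0, "content": 2.0}
MIN_HOURS, MAX_HOURS = 1.0, 168.0


def classify_url(url: str) -> str:
    """Assign a URL to one of the four types named in the abstract."""
    path = urlparse(url).path
    if path in ("", "/"):
        return "index"
    if CONTENT_RE.search(path):
        return "content"
    if LIST_RE.search(path):
        return "list"
    return "channel"  # fallback for section landing pages


@dataclass
class PageRecord:
    url: str
    kind: str
    interval_hours: float
    fingerprint: str = ""  # empty until the first successful fetch


def new_record(url: str) -> PageRecord:
    kind = classify_url(url)
    return PageRecord(url=url, kind=kind, interval_hours=REVISIT_HOURS[kind])


def record_fetch(rec: PageRecord, body: bytes) -> bool:
    """Store the new content hash and adapt the revisit interval.

    Returns True if the content differs from the previous fetch
    (the first fetch always counts as a change).
    """
    digest = hashlib.sha256(body).hexdigest()
    first_visit = rec.fingerprint == ""
    changed = digest != rec.fingerprint
    rec.fingerprint = digest
    if not first_visit:
        if changed:
            # Page is changing: check it more often.
            rec.interval_hours = max(MIN_HOURS, rec.interval_hours / 2)
        else:
            # Page is stable: back off to save bandwidth.
            rec.interval_hours = min(MAX_HOURS, rec.interval_hours * 2)
    return changed
```

In this sketch a product page starts with a short revisit interval because prices change often, while an unchanged page is visited less and less frequently. A production crawler would likely compare only the extracted product fields rather than the raw page bytes, since advertisements and timestamps change on every fetch.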