
Web Page Information Extraction Based on Template-Based Web Crawler Technology

Posted on: 2013-01-21
Degree: Master
Type: Thesis
Country: China
Candidate: F Qiao
Full Text: PDF
GTID: 2248330374986430
Subject: Software engineering
Abstract/Summary:
Web information extraction with template-based web crawler technology is a relatively new topic, and it differs from earlier web crawler technology. Most traditional web crawlers are dedicated crawlers: each site, and each channel within a site, needs code written specially for it. As online content grows and is continuously updated, this approach increases the crawler programmer's workload, makes it easy to write an incorrect crawler, and makes operation difficult for the user.

In this context, web information extraction with template-based web crawler technology emerged. It requires the programmer to write a site template in a configuration file (stored in the database). When the program is called, it matches the template library against certain fixed features of the site, finds the matching template, and then runs the extraction accurately and efficiently. The technology simplifies not only the programmer's work but also the user's actions.

Based on the features of the web site, the technology reads the template library from the configuration stored in the database and automatically matches a template based on "website_url", so that the program runs efficiently.

The major work includes:
1. Proposes web information extraction with template-based web crawler technology and works through the technical process in detail.
2. Analyzes the web pages of the reference sites, introduces the concept and use of regular expressions, and summarizes website templates for 38 pages.
3. Gives the basic structure and operating mode of web information extraction with template-based web crawler technology, and describes the crawling process in detail.
4. Designs and implements web information extraction with template-based web crawler technology, and runs tests on the system.
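The template lookup described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the abstract names only the "website_url" matching key and the use of regular expressions, so the `SiteTemplate` class, the in-memory `TEMPLATE_LIBRARY` list (standing in for the database), the host `news.example.com`, and the field patterns are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse


@dataclass
class SiteTemplate:
    """Hypothetical template record; only 'website_url' is named in the abstract."""
    website_url: str  # host that this template applies to
    patterns: dict    # field name -> regular expression with one capture group


# Assumed in-memory stand-in for the template library stored in the database.
TEMPLATE_LIBRARY = [
    SiteTemplate(
        website_url="news.example.com",
        patterns={
            "title": r"<h1[^>]*>(.*?)</h1>",
            "date": r'<span class="date">(.*?)</span>',
        },
    ),
]


def find_template(page_url: str, library=TEMPLATE_LIBRARY) -> Optional[SiteTemplate]:
    """Return the template whose website_url equals the page's host, if any."""
    host = urlparse(page_url).netloc
    for template in library:
        if template.website_url == host:
            return template
    return None


def extract(page_url: str, html: str, library=TEMPLATE_LIBRARY) -> Optional[dict]:
    """Apply the matching template's regular expressions to the page HTML."""
    template = find_template(page_url, library)
    if template is None:
        return None  # no template registered for this site
    record = {}
    for field_name, pattern in template.patterns.items():
        match = re.search(pattern, html, re.DOTALL)
        record[field_name] = match.group(1).strip() if match else None
    return record
```

Under these assumptions, supporting a new site means adding a template record rather than writing a new crawler, which is the reduction in programmer effort the abstract claims.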
Keywords/Search Tags:Web crawler, Template, Information Extraction, Web