The Design And Implementation Of Deep Web Crawler System Based On Template Configuration

Posted on:2022-04-21

Degree:Master

Type:Thesis

Country:China

Candidate:D J Kong

GTID:2518306725976959

Subject:software engineering direction

Abstract/Summary:

Web crawlers are now widely used in web services such as search engines and personalized recommendation, because these services depend on data extraction and parsing. A hidden database is a data set that an organization exposes on the web only through a search interface that users query. In other words, its data cannot be reached through static hyperlinks; it is obtained by submitting queries to the interface and reading the dynamically generated result pages. This, together with other obstacles (for example, the interface may answer a query only partially), prevents existing search engines from crawling hidden databases effectively.

With the emergence of dynamic web page technology, traditional information extraction methods based on static pages can no longer meet business requirements. On the one hand, dynamic web pages carry far more database-generated data than traditional static pages, and this content is usually focused on particular topics and contains valuable information. On the other hand, traditional crawling methods, such as seed-queue-based, depth-first and breadth-first traversal, cannot effectively obtain the content behind dynamic pages (also called the deep web). Building a crawler for the deep web is therefore worthwhile for both business and research.

This thesis proposes a deep web crawler system based on template configuration to address the problems described above. The crawler extracts hidden data by submitting keywords to the target database through its web form. The system flow has five steps: first, locating the entrance of the deep web database; second, interacting with the search interface automatically; third, evaluating the attributes of the deep web database; fourth, selecting keywords; and finally, obtaining the crawling results. To implement these steps, this thesis studies the design and implementation of a deep web crawler system. The system consists of five main modules: parameter configuration, data crawler, data retrieval, data storage and data analysis. The system has been deployed online and runs stably, and it can effectively crawl most of the information from the target databases. Its design and implementation also offer design ideas and implementation guidance for other research and business applications.
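
The abstract does not give implementation details, so the following is only a minimal sketch of how a template-configured, keyword-driven crawl loop of this kind might look. Every name in it (`SiteTemplate`, its fields, `build_query_url`, `crawl`, the `example.org` URL and the in-memory `FAKE_DB`) is a hypothetical chosen for illustration, and the keyword selection rule (prefer the word seen most often in newly found records) is an assumed heuristic, not the method used in the thesis.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, List
from urllib.parse import parse_qs, urlencode, urlparse


@dataclass
class SiteTemplate:
    """Hypothetical per-site template; all field names are illustrative."""
    search_url: str   # entrance of the hidden database
    query_param: str  # form field that carries the keyword
    id_field: str     # record field that uniquely identifies a result
    text_field: str   # record field mined for new candidate keywords


def build_query_url(template: SiteTemplate, keyword: str) -> str:
    """Fill the search form described by the template with one keyword."""
    return template.search_url + "?" + urlencode({template.query_param: keyword})


def crawl(template: SiteTemplate,
          seed_keywords: List[str],
          fetch: Callable[[str], List[Dict]],
          max_queries: int = 50) -> List[Dict]:
    """Query with keywords, collect new records, and derive further keywords.

    `fetch` takes a query URL and returns parsed result records; it is
    injected so the loop itself performs no network access.
    """
    seen: Dict[object, Dict] = {}
    tried = set()
    candidates = Counter({k: 1 for k in seed_keywords})
    while candidates and len(tried) < max_queries:
        # Assumed heuristic: try the most frequently observed word next.
        keyword, _ = candidates.most_common(1)[0]
        del candidates[keyword]
        tried.add(keyword)
        for record in fetch(build_query_url(template, keyword)):
            record_id = record[template.id_field]
            if record_id in seen:
                continue  # duplicates across queries are stored only once
            seen[record_id] = record
            for word in record[template.text_field].lower().split():
                if word not in tried:
                    candidates[word] += 1
    return list(seen.values())


# In-memory stand-in for a real search interface, used only to run the loop.
FAKE_DB = [
    {"id": 1, "title": "deep web crawler"},
    {"id": 2, "title": "crawler template"},
    {"id": 3, "title": "template configuration"},
    {"id": 4, "title": "unrelated record"},
]


def fake_fetch(url: str) -> List[Dict]:
    keyword = parse_qs(urlparse(url).query)["q"][0]
    return [r for r in FAKE_DB if keyword in r["title"].split()]


template = SiteTemplate("https://example.org/search", "q", "id", "title")
records = crawl(template, ["deep"], fake_fetch)
print([r["id"] for r in records])  # prints [1, 2, 3]
```

In this toy run the record with id 4 is never reached, because no keyword derived from the seed matches it. This illustrates why keyword selection matters when a database can only be reached through its query interface.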

Keywords/Search Tags:

Template Configuration, Deep Web, Crawler

Related items

1	The Design And Development Of Deep-Customizable Crawler Tool System
2	Study And Realization Of Template-based Web Crawler And Editing System
3	Design And Implementation Of Course Websites’ Generation Template Based On XML Configuration
4	Design And Implementation Of A Web Crawler Based On Deep Web Deep Data Acquisition
5	Research On Deep Web Data Acquisition Method
6	Research On Domain-oriented Deep Web Information Extraction
7	Some Technologies And Their Applications Research For Recursive Modeling And Recipe Reconstruction Of Configuration Product
8	Research On Topic Focused Web Crawler And Related Technologies
9	Design And Implementation Of Building Materials Information Oriented Web Crawler System
10	The Design And Implementation Of Mobile Middleware Rendering Engine System