Research And Practice On Key Techniques Of Deep Web Crawl

Posted on:2011-04-05

Degree:Master

Type:Thesis

Country:China

Candidate:M Y Feng

Full Text:PDF

GTID:2178360302974615

Subject:Computer software and theory

Abstract/Summary:

As the Internet continues to develop, the amount of information on the network has grown rapidly. Based on how information is accessed, the Web can be divided into the shallow Web and the deep Web. The shallow Web can be crawled by following hyperlinks, which is what a common search engine does. Deep Web information, by contrast, is hidden behind search forms: the user must submit a query through a web form to obtain it.

Because crawling the deep Web differs so clearly from crawling the shallow Web, a traditional hyperlink-based crawler cannot crawl or index deep Web information. As the amount of useful information in the deep Web grows, access to the deep Web becomes increasingly important for search engines.

We designed a deep Web crawler based on the most efficient queries. Our method addresses two problems in deep Web crawling: a low level of automation and restriction to a single domain. The crawler contains three core algorithms. The first recognizes deep Web entry points; it is trained on the form controls, text context, and depth within the site of a large number of HTML pages. The second calculates the most efficient initial queries: it divides each form page into two spaces, the form space and the text context space, and then runs the K-Means algorithm on the crawled pages to obtain candidate queries. The third submits the most efficient queries to the site and analyzes the result pages to update the candidate queries iteratively.

Finally, we designed a system and tested all the algorithms. The system was implemented on the basis of the theoretical analysis. Experimental results verified the effectiveness of the algorithms.
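The iterative step described in the abstract (submit a query, analyze the result pages, update the candidate queries) can be illustrated with a minimal sketch. This is not the thesis's algorithm: the abstract does not define how query efficiency is measured, so the sketch assumes a simple stand-in, namely picking the unissued term that appears in the most harvested records. The toy `SITE` data and the names `submit_query` and `crawl` are hypothetical.

```python
from collections import Counter

# Toy stand-in for a deep web site: record id -> text hidden behind a search form.
SITE = {
    1: "python web crawler",
    2: "deep web crawler",
    3: "deep learning model",
    4: "learning python basics",
    5: "forestry yellow page",
}


def submit_query(term):
    """Simulate submitting `term` to the search form; return matching records."""
    return {rid: text for rid, text in SITE.items() if term in text.split()}


def crawl(seed, max_queries=10):
    """Harvest records by repeatedly issuing the best-scoring candidate query."""
    harvested = {}          # records retrieved so far
    issued = []             # queries already submitted, in order
    candidates = Counter()  # term -> number of harvested records containing it
    query = seed
    while query is not None and len(issued) < max_queries:
        issued.append(query)
        for rid, text in submit_query(query).items():
            if rid not in harvested:
                harvested[rid] = text
                # Update candidate queries from the newly seen result record.
                candidates.update(set(text.split()))
        # Assumed efficiency heuristic: the unissued term with the highest
        # count among harvested records; ties are broken alphabetically.
        remaining = [(-n, t) for t, n in candidates.items() if t not in issued]
        query = min(remaining)[1] if remaining else None
    return harvested, issued


if __name__ == "__main__":
    harvested, issued = crawl("crawler")
    print(sorted(harvested), issued)
```

Starting from the seed `crawler`, the loop reaches records 1 to 4 through shared terms, while record 5 shares no term with the others and is never retrieved. That limitation is one reason the choice of initial queries matters.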

Keywords/Search Tags:

Deep Web, Deep Web Crawler, Page Cluster, Most Efficient Queries

Related items

1	Design And Implementation Of A Web Crawler Based On Deep Web Deep Data Acquisition
2	Research On Method Of Deep Web Oriented Based On Web-Page Blocking
3	The Research On Deep Web Data Integration Of Forestry Enterprise Yellow Page
4	Research On Deep Web Data Acquisition Method
5	Research And Analysis Of Efficient Web Page Classification Technology Based On Deep Learning
6	Design And Implementation Of An Ajax Supported Deep Web Crawler System
7	Design And Implementation Of Distributed Crawler System Based On Docker Cluster
8	Research On An Ajax Supported Deep Web Crawler Model
9	Study On Schema Recognition Oriented To Response Page Of Deep Web
10	Research On Focused Crawler Based On Page Segmentation