Research On The Deep Web Search Engines Based On The Thematic Domain

Posted on:2009-10-16

Degree:Master

Type:Thesis

Country:China

Candidate:F Wang

GTID:2178360272483389

Subject:Computer application technology

Abstract/Summary:

With the rapid development of the Internet, a tremendous amount of information is available online. Traditional search engines, however, retrieve only a small portion of the World Wide Web. In particular, they ignore the large amount of information hidden behind search forms in large searchable electronic databases. To reach this information, a system must submit forms automatically, extract the relevant information from the returned result pages, and save it to a local database in a uniform format so that users can search it conveniently.

The thesis first introduces the general principles of conventional search engines, then gives an overview of the Deep Web and reviews quantitative surveys of it. It then proposes the key modules of a Deep Web search engine based on a thematic domain: form extraction, query processing, and result extraction. The main research work is as follows:

1. Discuss the definition of the Deep Web, analyze the principles of a Deep Web search engine, and propose an overall design approach.
2. Study methods for discovering Deep Web sites, and propose a method for representing query interfaces.
3. Discuss existing algorithms for query interface extraction. Exploiting the similarity among query interfaces within the same theme, the thesis proposes a concrete process for extracting query interface elements using a competitive classification method; the reported results are quite good.
4. Select relevant data sources by computing the similarity of their interface elements, rank the data sources, and pass those with the higher degree of relevance to the query converter as input.
5. Use the query converter to solve the mapping problem between a user query and a set of Deep Web source query interfaces.
6. Survey techniques for extracting web page layouts. Then, by designing regular expressions and stylesheet templates, extract content from semi-structured result pages and save the results in the local database.

To test the validity of these algorithms, the thesis designs a Deep Web search engine for job-position queries, uses the 51job and chinahr websites as examples, runs extraction tests on them, and returns the results to the user in a uniform format.
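
The data source selection and result extraction steps described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the source names (`site_a`, `site_b`), the HTML snippets, the field names, and the use of Jaccard similarity as the relevance measure are all assumptions made for illustration, and real result pages would need far more robust templates.

```python
import re

# Hypothetical per-source templates (assumed markup, not taken from any real site).
# Every template exposes the same named groups so results share one schema.
TEMPLATES = {
    "site_a": re.compile(
        r"<tr>\s*<td>(?P<title>[^<]+)</td>\s*"
        r"<td>(?P<company>[^<]+)</td>\s*"
        r"<td>(?P<city>[^<]+)</td>\s*</tr>"
    ),
    "site_b": re.compile(
        r"<li>\s*<b>(?P<title>[^<]+)</b>\s*@\s*"
        r"(?P<company>[^<(]+?)\s*\((?P<city>[^)]+)\)\s*</li>"
    ),
}


def jaccard(a, b):
    """Jaccard similarity of two collections of field labels (assumed measure)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_sources(domain_fields, source_fields):
    """Order source names by similarity of their form fields to the domain fields."""
    scored = [
        (jaccard(domain_fields, fields), name)
        for name, fields in source_fields.items()
    ]
    # Highest similarity first; ties broken alphabetically for determinism.
    return [name for _, name in sorted(scored, key=lambda t: (-t[0], t[1]))]


def extract_records(source, html):
    """Apply the source's regex template and return uniform record dicts."""
    records = []
    for match in TEMPLATES[source].finditer(html):
        record = {key: value.strip() for key, value in match.groupdict().items()}
        record["source"] = source
        records.append(record)
    return records


if __name__ == "__main__":
    order = rank_sources(
        {"title", "city", "salary"},
        {"site_a": {"title", "city", "company"}, "site_b": {"keyword"}},
    )
    print(order)
    print(extract_records(
        "site_a",
        "<tr><td>Java Developer</td><td>Acme</td><td>Beijing</td></tr>",
    ))
```

Giving every template the same named groups is what lets records from differently structured pages land in one table of the local database; adding a new source then only requires a new template entry.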

Keywords/Search Tags:

Deep Web Crawler, Form Extracting, DataSource Selecting, Query Results Extracting

Related items

1	The Key Technology Research On Deep Web Information Integration System
2	The Research On Deep Web Interfaces Integration And Query Results Ranking
3	Design And Implementation Of A Web Crawler Based On Deep Web Deep Data Acquisition
4	Research And Implementation Of JDBC And COM Datasource Integration
5	Research On Deep Web Data Acquisition Method
6	The Research And Implementation Of Deep Web Query Results Extraction
7	Research On Source Discovery And Query Results Extraction Of Deep Web
8	The Key Techniques Of Deep Web Search Engine
9	Study On Deep Web Query Interface Pattern Matching And Query Results Annotation
10	The Research And Implementation Of Deep Web Query Result Integration Processing