
Research On The Deep Web Search Engines Based On The Thematic Domain

Posted on: 2009-10-16
Degree: Master
Type: Thesis
Country: China
Candidate: F Wang
GTID: 2178360272483389
Subject: Computer application technology
Abstract/Summary:
With the rapid development of the Internet, a tremendous amount of information has become available. However, traditional search engines retrieve only a small portion of the World Wide Web. In particular, they ignore the large amount of information hidden behind search forms in large searchable electronic databases. To obtain this information, forms must be submitted automatically, the relevant information must be extracted automatically from the returned pages, and the results must be saved to a local database in a uniform format so that users can search them conveniently.

This thesis first introduces the general principles of conventional search engines, then gives an overview of the Deep Web and analyzes surveys of the Deep Web quantitatively. It then proposes the key modules of a Deep Web search engine based on a thematic domain: form extraction, query processing, and result extraction.

The main research work of the thesis is as follows:

1. It discusses the definition of the Deep Web, analyzes the principles of a Deep Web search engine, and proposes an overall design approach.
2. It studies methods for discovering Deep Web sites and proposes a method for representing query interfaces.
3. It discusses current algorithms for query interface extraction. Taking into account the similarity of interfaces within the same theme, it proposes a concrete process for extracting query interface elements using a competitive classification method; the results are shown to be quite satisfactory.
4. It selects relevant data sources by calculating the similarity of their elements, ranks the data sources, and passes those with the higher degree of relevance to the query converter as input.
5. It adopts a query converter to resolve the mapping problem between a user query and a set of Deep Web source query interfaces.
6. It surveys techniques for extracting web page layouts. Then, by designing regular expressions and style templates, it extracts the content of semi-structured result pages and saves the resulting information in the local database.

To test the validity of the algorithms described, the thesis designs a Deep Web search engine for job-position queries, using the 51job and chinahr web sites as examples. It carries out extraction tests on these sites and returns the results to the user in a uniform format.
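The result-extraction step described in the abstract (a regular expression applied to a semi-structured result page, with the records saved to a local database in a uniform format) can be illustrated with a minimal sketch. This is not the thesis's implementation: the HTML fragment, the pattern, the field names (title, company, city), the table name jobs, and the source label "example-site" are all assumptions made up for illustration.

```python
import re
import sqlite3

# Hypothetical result-page fragment; the markup is invented for illustration
# and does not reflect the real 51job or chinahr page structure.
SAMPLE_PAGE = """
<tr class="job"><td class="title">Java Engineer</td><td class="company">Acme Ltd</td><td class="city">Beijing</td></tr>
<tr class="job"><td class="title">Data Analyst</td><td class="company">Foo Inc</td><td class="city">Shanghai</td></tr>
"""

# Assumed design: one regular-expression template per source site,
# with named groups that map onto the uniform record fields.
TEMPLATE = re.compile(
    r'<tr class="job">\s*'
    r'<td class="title">(?P<title>.*?)</td>\s*'
    r'<td class="company">(?P<company>.*?)</td>\s*'
    r'<td class="city">(?P<city>.*?)</td>\s*</tr>',
    re.DOTALL,
)


def extract_records(html, template, source):
    """Apply a site template to a result page and return uniform records."""
    records = []
    for match in template.finditer(html):
        record = {key: value.strip() for key, value in match.groupdict().items()}
        record["source"] = source  # remember which site the record came from
        records.append(record)
    return records


def save_records(conn, records):
    """Store uniform records in a local table (table name is hypothetical)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs (title TEXT, company TEXT, city TEXT, source TEXT)"
    )
    conn.executemany(
        "INSERT INTO jobs VALUES (:title, :company, :city, :source)", records
    )
    conn.commit()


records = extract_records(SAMPLE_PAGE, TEMPLATE, "example-site")
conn = sqlite3.connect(":memory:")  # in-memory database stands in for the local store
save_records(conn, records)
stored = conn.execute("SELECT title, city FROM jobs ORDER BY title").fetchall()
```

Because every site template fills the same named fields, records from different sources land in one table and can be queried together, which is the point of the uniform format.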
Keywords/Search Tags: Deep Web Crawler, Form Extracting, DataSource Selecting, Query Results Extracting