Research On Deep Web Data Acquisition Method

Posted on: 2011-01-07
Degree: Master
Type: Thesis
Country: China
Candidate: X B Cai
Full Text: PDF
GTID: 2178360305476536
Subject: Computer application technology
Abstract/Summary:
With the rapid development of the Internet, the volume of information on the Web keeps growing and offers people information of every kind. Much of this information is stored in Web databases that can be reached only through web query interfaces rather than by following hyperlinks, so traditional search engines cannot index it. This content is called the Deep Web. The fast-growing Deep Web has become a significant resource for information retrieval. Because Deep Web data is heterogeneous and dynamic, integrating it at large scale is very challenging, and crawling Deep Web data in order to integrate web databases on a local host is becoming increasingly important.

This thesis studies Deep Web data acquisition in depth and proposes related algorithms and models. The research issues are as follows:
(1) The characteristics of Deep Web sites and query interfaces are studied. To decide which form inputs to fill when submitting a query to a form, a method is proposed that searches for valid attribute combinations (attribute compounding) based on attribute correlation.
(2) The characteristics of attributes in query interfaces are analyzed, and a machine-learning method is proposed to identify each typed text attribute in a query interface.
(3) Based on the classification of attributes, different methods are used to find appropriate query words for different attribute types. For generic text attributes, the corresponding content is extracted from the query result pages, and an adaptive strategy selects suitable keywords as query words. For typed text attributes, a manually built knowledge base is used.
(4) The update characteristics of pages on Deep Web sites are analyzed, and page update events are modeled with a Poisson model to support incremental crawling of Deep Web data. A system framework for an incremental Deep Web crawler is also designed.
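
The Poisson update model in item (4) can be illustrated with a minimal sketch. The abstract does not give the thesis's actual formulation, so everything here is an assumption made for illustration: the function names, the 0.5 recrawl threshold, and the naive rate estimate (observed changes divided by observation time, which undercounts pages that change more than once between visits). Under a Poisson process with rate λ, the probability that a page has changed at least once in an interval of length t is 1 - exp(-λt).

```python
import math


def estimate_change_rate(changes_observed: int, observation_days: float) -> float:
    """Naive Poisson rate estimate (changes per day). Hypothetical helper."""
    if observation_days <= 0:
        raise ValueError("observation_days must be positive")
    return changes_observed / observation_days


def prob_changed(rate: float, days_since_crawl: float) -> float:
    """Probability of at least one update in the interval under a Poisson process."""
    return 1.0 - math.exp(-rate * days_since_crawl)


def pages_to_recrawl(pages, threshold=0.5):
    """Return page ids whose change probability meets the threshold,
    most likely changed first.

    pages: iterable of (page_id, rate_per_day, days_since_crawl).
    The threshold value is an illustrative assumption.
    """
    scored = [(prob_changed(rate, age), page_id) for page_id, rate, age in pages]
    scored.sort(reverse=True)
    return [page_id for p, page_id in scored if p >= threshold]


if __name__ == "__main__":
    candidates = [("a", 0.1, 10), ("b", 0.01, 10), ("c", 1.0, 5)]
    print(pages_to_recrawl(candidates))
```

An incremental crawler built along these lines would revisit only the pages returned by `pages_to_recrawl` instead of re-submitting every query, which is the saving that incremental crawling aims for.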
Keywords/Search Tags: Deep Web Crawler, Attributes Correlation, Attribute Compounding, Query Selection, Incremental Crawler