The Research On Data Extraction Mechanism In Deep Web Based On Result Pattern

Posted on: 2009-07-02
Degree: Master
Type: Thesis
Country: China
Candidate: P Qi
GTID: 2178360308479252
Subject: Computer application technology
Abstract/Summary:
With the rapid development and popularization of the Internet, Web resources have come to form a huge global information warehouse, and the network has become one of the main means of obtaining information of interest. Faced with this volume of Web information, users find it increasingly difficult to locate what they need quickly and accurately. In response, many automatic and semi-automatic Deep Web data extraction systems have appeared.

This thesis designs a Deep Web information integration system, DWIIS. The system is divided into six parts: Deep Web access interface, query interface integration, query decomposition, result record acquisition, result record integration, and query result display. It is used to integrate and restructure information and to provide value-added services based on that information.

The thesis details the information access mechanism in the Deep Web. The key bottlenecks in improving the efficiency and precision of Deep Web data extraction are repeated label assignment, the generation of extraction patterns, and the existence of nested attributes. The thesis proposes a data extraction mechanism for the Deep Web based on the result pattern. First, after constructing the feature matrix of web page data, we analyze that matrix to generate the attribute set and the extraction symbols of the attributes, which together make up the result pattern. The attribute set is used for entity recognition and for merging results; the extraction symbols are used to extract data from pages of the same kind. Second, data are extracted from pages of the same kind according to the result pattern, which yields a data set with labels already assigned. Experimental results confirm the high efficiency and precision of this method.
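The abstract does not define the feature matrix or the extraction symbols, so the following is only a minimal sketch of the two-step idea under simplifying assumptions, not the thesis's method. It assumes each result record is a list of text lines of the form `Label: value`, treats the feature matrix as a binary record-by-label occurrence table, and treats an attribute's extraction symbol as its label followed by a colon. All function names, the `min_support` threshold, and the sample records are hypothetical. Nested attributes and repeated labels are not handled.

```python
import re

# Assumption: a candidate label is a short run of letters/spaces before a colon.
LABEL_RE = re.compile(r"^\s*([A-Za-z][A-Za-z ]{0,30}?)\s*:\s*(.*)$")


def candidate_labels(record):
    """Return the labels found in one record (a list of text lines)."""
    labels = []
    for line in record:
        match = LABEL_RE.match(line)
        if match:
            labels.append(match.group(1))
    return labels


def build_feature_matrix(records):
    """Build a binary matrix: rows are records, columns are candidate labels."""
    columns = sorted({lab for rec in records for lab in candidate_labels(rec)})
    matrix = [
        [1 if col in candidate_labels(rec) else 0 for col in columns]
        for rec in records
    ]
    return columns, matrix


def derive_result_pattern(records, min_support=0.5):
    """Step 1: keep labels that occur in enough records.

    Returns a dict mapping each attribute to its extraction symbol.
    The keys play the role of the attribute set.
    """
    columns, matrix = build_feature_matrix(records)
    pattern = {}
    for j, col in enumerate(columns):
        support = sum(row[j] for row in matrix) / len(records)
        if support >= min_support:
            pattern[col] = col + ":"  # hypothetical extraction symbol
    return pattern


def extract(record, pattern):
    """Step 2: apply the result pattern to a record from a page of the same kind."""
    labeled = {}
    for line in record:
        stripped = line.strip()
        for attribute, symbol in pattern.items():
            if stripped.startswith(symbol):
                labeled[attribute] = stripped[len(symbol):].strip()
    return labeled


# Hypothetical sample records standing in for one site's result pages.
sample_records = [
    ["Title: Deep Web Mining", "Author: A. Smith", "Price: 30"],
    ["Title: Web Data Extraction", "Author: B. Jones", "Price: 25",
     "Note: signed copy"],
    ["Title: Information Integration", "Author: C. Lee", "Price: 40"],
]

pattern = derive_result_pattern(sample_records)
new_record = ["Title: Query Interfaces", "Author: D. Kim", "Price: 18",
              "Ad: buy now"]
print(extract(new_record, pattern))
```

In this sketch the rare `Note` label falls below the support threshold and is left out of the pattern, and the unlabeled `Ad` line in the new record is ignored, so the output contains only attributes already named by the pattern.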
Keywords/Search Tags:deep web, feature matrix of web page data, result pattern, data extraction, entity recognition