A Research On Key Technologies Of Deep Web Data Integration Based On Result Pattern

Posted on: 2010-09-18
Degree: Doctor
Type: Dissertation
Country: China
Candidate: A X Ma
Full Text: PDF
GTID: 1118360302977797
Subject: Computer application technology
Abstract/Summary:
The quantity and quality of information in the Deep Web are much higher than in the Surface Web, so how to obtain and integrate Deep Web information effectively has attracted much attention. Researchers have designed several typical Deep Web data integration system frameworks and have studied related techniques such as Deep Web data source discovery, data source classification, query interface integration, data source selection, and query translation, with many achievements.
As a core function of a Deep Web data integration system, query result processing must extract, annotate, and merge large amounts of result data in real time. Its performance and effectiveness therefore directly influence the whole integration system. Existing work on query result processing has, to a certain extent, realized automatic data extraction, semantic annotation, and result merging, but problems remain: semantic annotation performance, repeated semantic annotation, data heterogeneity and conflict processing, data extraction performance, and repeated pattern matching. These problems lower the performance and validity of Deep Web data integration.
To realize Deep Web data integration with efficient and accurate query result processing, this dissertation defines the result pattern of a Deep Web data source and proposes a novel approach to Deep Web data integration based on result patterns.
The dissertation studies several aspects of query result processing: result pattern generation, conflict classification and conflict detection between result patterns, a Deep Web data extraction algorithm based on result patterns, and result output pattern generation.
First, by analyzing the workflow and deficiencies of existing Deep Web data integration systems, this dissertation proposes an approach to Deep Web data integration based on result patterns. By analyzing the features of Deep Web result data, it defines the result pattern of a Deep Web data source, which expresses both the structural and the semantic features of result data and lays a sound theoretical basis for efficient and accurate query result processing. The dissertation also proposes a mechanism for Deep Web data integration that takes the result pattern as its core and detects and records the conflicts between result patterns. Based on the result patterns of Deep Web data sources and the conflict records between them, the result output pattern corresponding to each user query can be constructed quickly and accurately in real time. Once established, result patterns and conflict records can be reused during query result processing, which lays a sound foundation for efficient and accurate query result processing.
Second, to address semantic annotation performance and repeated semantic annotation, this dissertation proposes an approach to result pattern generation that supports efficient semantic annotation. For the structural feature of the result pattern, it proposes generating the result pattern structure from a feature matrix of Web page data. The dissertation defines this feature matrix according to how data is organized in Deep Web result pages.
The structural feature of the result pattern is obtained by constructing and analyzing the feature matrix of Web page data. For the semantic feature of the result pattern, considering that a result pattern can be obtained by offline analysis of a large quantity of sample data, this dissertation proposes a method of result pattern semantic annotation based on a CPN network. The basic features of result data are given, and the relation between data semantics and data features is learned by the CPN network. Once established, the semantic annotation rules support fast and accurate real-time semantic annotation of similar result pages and improve annotation performance.
Third, to address the data heterogeneity caused by the high autonomy of Deep Web data sources, this dissertation classifies the conflicts among Deep Web data sources and proposes an approach to detecting conflicts between result patterns. By analyzing the features of query interfaces and query results, it systematically describes the conflicts among Deep Web data sources; each kind of conflict has an explicit description and a corresponding resolution strategy. A conflict detection algorithm for result patterns from the same domain is then given to obtain conflict records, which lays a sound foundation for result output pattern generation and query result standardization.
Fourth, most existing data extraction methods cannot acquire data semantics or handle nested attributes. Considering that the structural feature of a result pattern supports obtaining attribute values from similar result pages and that its semantic feature supports annotating those values, this dissertation proposes an approach to Deep Web data extraction based on result patterns and gives the corresponding extraction algorithm.
Experimental results show that Deep Web data extraction based on result patterns improves efficiency and precision.
Fifth, to address the repeated pattern matching caused by querying the same Deep Web data sources for different user queries, this dissertation gives a method for generating the result output pattern according to the user query, based on the result patterns of Deep Web data sources and the conflict records between them. Conflict records and conflict resolution rules are detected between pairs of data sources during the conflict detection stage. Because user queries usually involve multiple sources, the dissertation gives a method for integrating conflict rules across multiple sources. Finally, the workflow and algorithm of result output pattern generation are given. The objective of efficiently and accurately constructing a result output pattern that satisfies the user query is thus achieved.
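The abstract does not give the dissertation's data structures or algorithms, so the following is only a minimal illustrative sketch of the ideas it names: a result pattern that pairs a structural feature with a semantic feature, conflict records detected between two patterns of the same domain, and a result output pattern built for a user query. Every name here (Attribute, ResultPattern, detect_conflicts, output_pattern, the path notation, the synonym table, and the sample sources) is a hypothetical assumption for illustration, not the author's method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribute:
    path: str       # structural feature: where the value sits in a result page (hypothetical notation)
    label: str      # semantic feature: what the value means
    unit: str = ""  # optional unit or format, compared during conflict detection


@dataclass
class ResultPattern:
    source: str        # name of the Deep Web data source
    attributes: list   # list of Attribute


def detect_conflicts(a, b, synonyms):
    """Return conflict records between two result patterns of the same domain.

    `synonyms` maps a source-specific label to a canonical label (assumed given).
    """
    records = []
    for x in a.attributes:
        for y in b.attributes:
            # Only attributes with the same canonical meaning can conflict.
            if synonyms.get(x.label, x.label) != synonyms.get(y.label, y.label):
                continue
            if x.label != y.label:
                records.append(("naming", a.source, x.label, b.source, y.label))
            if x.unit != y.unit:
                records.append(("unit", a.source, x.unit, b.source, y.unit))
    return records


def output_pattern(patterns, synonyms, wanted):
    """Map each requested canonical label to the path that supplies it in each source."""
    out = {}
    for p in patterns:
        for attr in p.attributes:
            canon = synonyms.get(attr.label, attr.label)
            if canon in wanted:
                out.setdefault(canon, {})[p.source] = attr.path
    return out


# Two hypothetical sources from the same domain.
shop_a = ResultPattern("shop_a", [Attribute("tr/td[1]", "title"),
                                  Attribute("tr/td[2]", "price", "USD")])
shop_b = ResultPattern("shop_b", [Attribute("div/h3", "name"),
                                  Attribute("div/span", "price", "CNY")])
synonyms = {"name": "title"}

conflicts = detect_conflicts(shop_a, shop_b, synonyms)
merged = output_pattern([shop_a, shop_b], synonyms, {"title", "price"})
```

In this sketch the patterns and conflict records are computed once and then reused for any query, which mirrors the reuse argument made in the abstract; only output_pattern depends on the attributes a particular query asks for.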
Keywords/Search Tags: Result pattern, Deep Web data integration based on result pattern, result pattern semantic annotation, result pattern conflict detecting, result output pattern