
Research On Technology Of Deep Web Query Interface Matching

Posted on: 2010-06-12
Degree: Master
Type: Thesis
Country: China
Candidate: Q H Cao
Full Text: PDF
GTID: 2178360302966562
Subject: Computer application technology
Abstract/Summary:
With the rapid development of Internet technology, web databases have come into wide use. These databases are hidden behind local query interfaces, and users must submit requests through those interfaces to retrieve information. The Deep Web is the information in such databases that search engines cannot index. Deep Web data integration systems have recently attracted growing attention because of the large volume of information, high data quality, and well-structured format of Deep Web content. Such a system groups web databases by domain and builds a unified query interface for each domain. A user submits a request through the unified query interface, and the system forwards it to every local query interface at the same time. Mapping requests between the unified query interface and the local query interfaces raises the query interface matching problem, and query interface matching is a prerequisite for data integration.

This thesis first studies query interface matching and proposes a new matching method that builds on existing methods: association mining finds positively correlated attributes, which form candidate group attributes, and clustering finds synonym attributes. It then implements a Deep Web data integration system for the book domain. The main work is summarized as follows:

(1) Design a new correlation measure based on mutual information and implement it with matrices. The measure reflects the characteristic behavior of group attributes, which often occur together and rarely appear alone, and it addresses the problems of sparse and high-frequency attributes. In addition, an attribute matrix containing only 0 and 1 is proposed to improve efficiency.
(2) Add semantic and domain components to the computation of attribute similarity. A semantic net is used to compute semantic similarity more precisely, and domain similarity is calculated to improve the precision of attribute similarity.
(3) Design and implement a data integration system for the book domain. The guiding principle is to keep the system independent of any particular domain: everything domain-specific is stored in a configuration file that can be modified when the application domain changes, which helps to establish a new system quickly. A data integration system for the book domain is completed at the end of the thesis.
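
The abstract does not give the formula for the new correlation measure, so the sketch below only illustrates the general idea behind item (1): a 0/1 attribute matrix with one row per query interface and one column per attribute, scored with the standard textbook mutual information between two columns. It is not the measure proposed in the thesis. The function names, the threshold, and the toy book-domain interfaces are all assumptions made for illustration.

```python
import math
from itertools import combinations


def build_attribute_matrix(interfaces, attributes):
    """Return a 0/1 matrix: rows are interfaces, columns are attributes."""
    return [[1 if a in iface else 0 for a in attributes] for iface in interfaces]


def mutual_information(matrix, i, j):
    """Standard mutual information (in bits) between binary columns i and j."""
    n = len(matrix)
    counts = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
    for row in matrix:
        counts[(row[i], row[j])] += 1
    mi = 0.0
    for (x, y), c in counts.items():
        if c == 0:
            continue  # empty cells contribute nothing to the sum
        p_xy = c / n
        p_x = (counts[(x, 0)] + counts[(x, 1)]) / n
        p_y = (counts[(0, y)] + counts[(1, y)]) / n
        mi += p_xy * math.log2(p_xy / (p_x * p_y))
    return mi


def positively_correlated_pairs(matrix, attributes, threshold):
    """Attribute pairs with MI >= threshold that co-occur more often than chance.

    Mutual information is also high for attributes that never co-occur,
    so the sign of the correlation is checked separately.
    """
    n = len(matrix)
    pairs = []
    for i, j in combinations(range(len(attributes)), 2):
        p_i = sum(row[i] for row in matrix) / n
        p_j = sum(row[j] for row in matrix) / n
        p_ij = sum(row[i] & row[j] for row in matrix) / n
        mi = mutual_information(matrix, i, j)
        if mi >= threshold and p_ij > p_i * p_j:
            pairs.append((attributes[i], attributes[j], mi))
    return pairs


# Hypothetical book-domain interfaces, each given as its set of attribute labels.
ATTRIBUTES = ["title", "first name", "last name", "author"]
INTERFACES = [
    {"title", "first name", "last name"},
    {"title", "first name", "last name"},
    {"title", "author"},
    {"title", "author"},
    {"first name", "last name"},
    {"author"},
]

if __name__ == "__main__":
    matrix = build_attribute_matrix(INTERFACES, ATTRIBUTES)
    for a, b, mi in positively_correlated_pairs(matrix, ATTRIBUTES, threshold=0.5):
        print(f"{a} + {b}: MI = {mi:.3f}")
```

On this toy data only "first name" and "last name" pass both checks, which matches the intuition of a group attribute whose parts occur together and rarely alone. "first name" and "author" have equally high mutual information but never co-occur, so the sign check excludes them from the positively correlated pairs.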
Keywords/Search Tags: complex matching, Deep Web, association mining, clustering, semantic net, mutual information