
Research On Data Source Clustering And Query Interface Conversion Of Deep Web

Posted on: 2012-03-09 | Degree: Master | Type: Thesis
Country: China | Candidate: P F Zhang | Full Text: PDF
GTID: 2218330368958671 | Subject: Computer application technology
Abstract/Summary:
With the rapid growth of Internet resources, the Web has become an important means of obtaining information. The Web can be divided into two parts: the Surface Web and the Deep Web. The Deep Web holds more resources than the Surface Web, and its resources are of higher quality. Because Web databases are distributed across every domain and can only be accessed through query interfaces, an integration system is needed to make better use of them.

Organizing databases by domain is an important part of Deep Web data integration and makes the information they hold easier to use. The Web pages that host database query interfaces often share common words in their titles and keywords, and these words reflect the domain to which the database belongs. Based on this observation, we propose a frequent-itemset-based clustering method for organizing the databases. The number of clusters equals the number of frequent itemsets, and each frequent itemset serves as a label describing its cluster. Experiments show that the F-measure can exceed 0.92.

Query conversion is another important part of data integration: it is responsible for translating queries between the integrated query interface and the individual Web query interfaces. Because Web query interfaces are highly heterogeneous, existing solutions generally focus on approximate conversion. In this thesis, we study and model the query conversion problem and propose a query translator design that addresses source heterogeneity and domain portability and improves the accuracy and efficiency of query conversion.
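
To make the clustering idea above concrete, here is a minimal Python sketch of grouping query-interface pages by the frequent word sets found in their titles and keywords. It is an illustration under stated assumptions, not the thesis's actual algorithm: the stop-word list, the `min_support` threshold, the cap of two words per itemset, the tie-breaking rule, and the sample page titles are all hypothetical. In this simplified version a cluster is created only for itemsets that some page selects as its best label.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical stop words; the abstract does not say which words are filtered.
STOP_WORDS = {"a", "an", "and", "for", "of", "the", "to", "in", "on"}


def tokenize(text):
    """Lower-case a title/keyword string and return its set of non-stop words."""
    words = (w.strip(".,:;!?|-()") for w in text.lower().split())
    return {w for w in words if w and w not in STOP_WORDS}


def frequent_itemsets(transactions, min_support, max_size=2):
    """Return {itemset: support count} for word sets of up to max_size words."""
    counts = Counter()
    for words in transactions:
        for size in range(1, max_size + 1):
            for combo in combinations(sorted(words), size):
                counts[frozenset(combo)] += 1
    return {s: c for s, c in counts.items() if c >= min_support}


def cluster_by_itemset(pages, min_support=2):
    """Assign each page to one frequent itemset it contains; that itemset is the label."""
    transactions = {name: tokenize(text) for name, text in pages.items()}
    frequent = frequent_itemsets(transactions.values(), min_support)
    clusters = defaultdict(list)
    for name, words in transactions.items():
        covering = [s for s in frequent if s <= words]
        if not covering:
            clusters[frozenset()].append(name)  # page shares no frequent word set
            continue
        # Assumed tie-break: prefer larger itemsets, then higher support, then name order.
        best = max(covering, key=lambda s: (len(s), frequent[s], sorted(s)))
        clusters[best].append(name)
    return dict(clusters)


# Hypothetical page titles, invented purely for illustration.
pages = {
    "site1": "Cheap Flights and Airline Tickets",
    "site2": "Airline Tickets: Book Flights",
    "site3": "Used Books for Sale",
    "site4": "Rare Books Sale",
}
clusters = cluster_by_itemset(pages)
for label, members in clusters.items():
    print(sorted(label), members)
```

Each printed line pairs a cluster label (the shared word set) with the pages assigned to it, which mirrors the abstract's point that a frequent itemset can double as a description of its cluster.
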
Keywords/Search Tags: Deep Web, Web database, frequent itemset, query conversion