
Research On Issues For Uncertainty Of Query In Deep Web

Posted on: 2011-03-18
Degree: Doctor
Type: Dissertation
Country: China
Candidate: P Pan
Full Text: PDF
GTID: 1118330332481349
Subject: Computer software and theory
Abstract/Summary:
Because the deep web holds large volumes of high-quality data and the number of web databases is growing rapidly, more and more people want to obtain information from it. However, web databases are dynamic, highly heterogeneous, and hidden behind web pages, which makes retrieving information from them a significant challenge.
Since web databases are hidden behind their query interfaces, a query must be submitted through an interface before results are returned. For large-scale access to the deep web within a given domain, integration is generally adopted: mappings are constructed between an integrated interface and the interfaces of the web databases, and a query submitted to the integrated interface is translated into the appropriate form for each web database interface.
The deep web is large-scale, highly heterogeneous, and highly dynamic, and it draws on a wide variety of sources, which makes manual integration impractical. However, automatic integration involves interface extraction, interface integration, querying, labeling, result extraction, duplicate detection, and other activities, and two kinds of uncertainty arise in this process: randomness, in which a step has several possible outcomes, each with some probability, and fuzziness, in which items are difficult to classify because the intension and extension of the data collection are uncertain. Some research works treat this randomness and fuzziness together as uncertainty. Applying the traditional "uncertainty removal" approach to these issues lowers data quality and therefore reduces the usefulness of the data. How to handle these uncertainty issues efficiently has thus become a major challenge in improving user query satisfaction in the deep web.
In deep web data integration, randomness and fuzziness occur frequently, both in the data itself and in the integration process, which poses new challenges for improving user satisfaction with query results.
Existing work cannot address these new challenges efficiently. The issues to be solved are as follows:
1. As the basis of querying and integration, the randomness in interface integration and interface mapping must be adequately considered in order to build a high-quality integrated interface and accurate, complete interface mappings.
2. Interfaces differ in expressive power, so it is necessary to find, among the local interfaces, the form that most closely expresses the query capability of the integrated interface, in order to meet users' query requirements.
3. The fuzziness in duplicate record detection and data fusion greatly affects data quality. It is necessary to build an appropriate process, data representation, and fuzzy treatment for duplicate records.
4. Finding an efficient approach that provides users with data satisfying their preferences is the benchmark for evaluating query quality.
This dissertation aims to provide satisfactory queries for users of the deep web. It studies the query process in the deep web and addresses the randomness and ambiguity issues in that process.
The research focuses on four aspects: generation of the integrated interface and probabilistic mappings; query transformation under probabilistic mappings; processing of duplicate records with membership ambiguity; and user-preference-based queries over duplicate records with probabilities.
The contributions of this dissertation are:
1. For the randomness that arises in automatic mapping between the integrated interface and the query interfaces, it proposes an approach that generates the integrated interface from several candidate integrated interfaces by dusting ensembling, and a probabilistic mapping generation approach based on weighted best matching in a bipartite graph. Together these effectively improve the quality of interface integration and mapping.
2. For the inefficiency of query transformation, it extends the applicability of materialized results and designs an efficient algorithm for finding minimal predicates and a rewriting algorithm for finding the optimal predicate set, which effectively improve the efficiency of query transformation.
3. For the fuzziness in processing duplicate web records, it combines duplicate record detection with data fusion, provides a representation for duplicate record sets, and proposes an approach suited to large-scale deep web data that improves the quality of duplicate record detection and data fusion.
4. For preference-based queries over probabilistic data, it improves the algorithm based on global top-k query semantics and proposes a top-k skyline approach over probabilistic data, giving users an efficient algorithm that meets the requirements of preference queries.
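The idea of deriving probabilistic mappings from weighted matching in a bipartite graph can be illustrated with a minimal sketch. It is not the dissertation's algorithm. The similarity scores, the attribute names, the function name `probabilistic_mappings`, and the choice to normalize matching weights into probabilities are all assumptions made for illustration, and brute-force enumeration stands in for a proper assignment algorithm, so it is only practical for very small interfaces.

```python
from itertools import permutations


def probabilistic_mappings(sim, top_n=3):
    """Return the top_n one-to-one attribute mappings with probabilities.

    sim[i][j] is a hypothetical similarity between attribute i of the
    integrated interface and attribute j of a local query interface.
    Probabilities are the total matching weights of the retained
    candidates, normalized to sum to 1 (an illustrative assumption).
    """
    n_rows, n_cols = len(sim), len(sim[0])
    if n_rows > n_cols:
        raise ValueError("need at least as many local attributes as integrated ones")

    # Enumerate every one-to-one assignment of integrated attributes
    # to distinct local attributes and score it by total edge weight.
    scored = []
    for perm in permutations(range(n_cols), n_rows):
        weight = sum(sim[i][j] for i, j in enumerate(perm))
        scored.append((weight, perm))

    # Keep the heaviest candidates; the first one is the best match.
    scored.sort(key=lambda item: item[0], reverse=True)
    top = scored[:top_n]

    total = sum(weight for weight, _ in top)
    if total == 0:
        # All similarities are zero: fall back to a uniform distribution.
        return [(perm, 1.0 / len(top)) for _, perm in top]
    return [(perm, weight / total) for weight, perm in top]


# Hypothetical attributes and similarity scores, for illustration only.
integrated_attrs = ["title", "author"]
local_attrs = ["book name", "writer", "publisher"]
sim = [
    [0.9, 0.1, 0.2],  # title  vs. each local attribute
    [0.1, 0.8, 0.3],  # author vs. each local attribute
]

mappings = probabilistic_mappings(sim, top_n=2)
for assignment, prob in mappings:
    pairs = {integrated_attrs[i]: local_attrs[j] for i, j in enumerate(assignment)}
    print(pairs, round(prob, 3))
```

Keeping several weighted candidates, rather than only the single best match, is what lets later query translation reason over alternative mappings instead of discarding the uncertainty.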
Keywords/Search Tags: Deep Web, Uncertainty, Query in Deep Web, Data Integration in Deep Web, Interface Integration, Interface Mapping, Query Translation, User Preference, Duplicated Record Detection