Approaches To Explaining Why-Not Questions For Provenance Analyses

Posted on:2018-09-14

Degree:Doctor

Type:Dissertation

Country:China

Candidate:C Y Zong

GTID:1368330572459079

Subject:Computer software and theory

Abstract/Summary:

Data quality and usability issues are widespread in database query applications, including relational, stream, spatial and other kinds of databases. Providing provenance analyses for query results helps users better understand both the data and the query. A why-not question, an important kind of question about query results, asks why an expected data item is missing from the query result and what should be done to make it appear. Providing provenance-based explanations for why-not questions has become a hot research topic in the database community. Explaining why-not questions can effectively improve the quality and usability of databases and complements existing database query systems. Many explanation models have been proposed to explain why-not questions on different queries. However, as users demand higher-quality explanations and the range of query types grows, the existing explanation models still fall far short of users' requirements, and existing explanation approaches face enormous challenges. Research on efficient explanation approaches for why-not questions is therefore important in both theory and practice. This dissertation studies several typical problems in explaining why-not questions, proposes several efficient index structures and explanation approaches for why-not questions on different databases, and designs and implements a provenance analysis system for ocean data acquisition and integration. The contributions of this dissertation can be summarized as follows:

(1) Existing explanation approaches for why-not questions on relational data have some disadvantages. To address this, starting from the rationale that people try to make as few mistakes as possible, i.e., that the original database should be modified as little as possible, we propose an effective approach, based on data modification, that minimally explains why-not questions on simple SPJ (Select-Project-Join) queries. The approach first constructs a query template for each query relation from the query statement and the missing data, then constructs the minimal join tuples that contain the missing data, and finally obtains the minimal explanations of the missing data. The approach improves the quality of explanations, because it returns correct and reasonable minimal explanations, and it improves efficiency, because it returns only the most likely explanations (those that minimally modify the original database). Moreover, existing approaches cannot explain why-not questions on some complex queries that contain query cycles and relation copies. To address this issue, we propose a minimal explanation approach based on a mapping structure for candidate tuples. Experimental results show that our explanation approaches efficiently return high-quality, minimal explanations for missing data.

(2) Existing explanation approaches do not consider the dynamic features of stream data and cannot be directly applied to explain why-not questions on stream data. To address this issue, we propose an explanation approach for why-not questions on stream data, which fills the gap in why-not questions on acquisitional stream data. First, we propose an explanation framework for why-not questions on stream data. Second, we extract the change rules of stream data from its dynamic features. To repair violated change rules while explaining why-not questions on stream data, the tuples that violate change rules (called violation tuples) need to be chased. We present the concepts of single chasing and cascaded chasing, and then propose a basic explanation approach for why-not questions on stream data. However, chasing violation tuples by scanning all the tuples in the query relations takes too much time. To improve the efficiency of chasing violation tuples, we introduce the chasing class and propose a new index structure, ChaseClass, based on it. On this basis, we propose an improved, more efficient explanation approach. Experimental results show that the improved approach is 2-3 orders of magnitude more efficient than the basic explanation approach, and that it efficiently returns high-quality, minimal explanations for why-not questions on stream data.

(3) To further improve the quality of spatial data clustering analyses, we propose an explanation approach for why-not questions on DBSCAN. First, we define why-not questions on DBSCAN and their explanations. Second, to ensure the quality of explanations, we define constraints and metric functions for explaining why-not questions on DBSCAN, based on the rationale that the original query results should be preserved as much as possible in the explained query results. Third, in view of the clustering mechanism of DBSCAN, we analyze the factors that affect its clustering result, including the processing order of the clustering objects and the clustering parameters. On this basis, we present two explanation approaches for why-not questions on DBSCAN. One is based on data modification: it makes the why-not object appear in the desired cluster by adjusting the processing order of the objects in the database. The other is based on query refinement: it makes the why-not object appear in the desired cluster by modifying the parameters of DBSCAN (the distance constraint ε and the density constraint m). Experimental results show that the proposed approaches return high-quality explanations for why-not questions on DBSCAN while preserving the original clustering results to the maximum extent.

(4) In line with the requirements of intelligent marine applications, we design and implement a provenance analysis system for ocean data acquisition and integration. The system first obtains original marine data using sensors (sensor networks), GPS (Global Positioning System), RFID (radio frequency identification), cameras and mobile acquisition devices, and then, after cleaning and integrating the collected data, stores provenance data according to the 7W model. It also provides a hydrological marine information query function driven by the query conditions that clients enter, and dynamically displays the lineage-changing process of large-scale ocean hydrological data based on the system's models. Moreover, the system provides explanation functions for why-not questions on ocean hydrological data queries and on the clustering of ocean spatial objects.

To meet the query requirements of different datasets, this dissertation studies explanation approaches for different kinds of why-not questions, including why-not questions on relational data, stream data and spatial data, and designs and implements a provenance analysis system for ocean data acquisition and integration. It lays a foundation for providing reasonable, high-quality provenance analysis explanations for why-not questions on queries.
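
The query-refinement idea from contribution (3) can be illustrated with a small sketch. It is not the dissertation's algorithm: it simply tries a grid of candidate (ε, m) settings, keeps those under which the why-not object lands in the same cluster as a chosen member of the desired cluster, and prefers the setting that changes the fewest co-clustering relations among the other objects. The names `dbscan`, `changed_pairs` and `refine`, the pair-counting metric, the tie-breaking rule and the sample points are all assumptions made for illustration.

```python
from itertools import combinations
from math import dist

NOISE = -1

def dbscan(points, eps, min_pts):
    """Plain O(n^2) DBSCAN; returns one cluster label per point (NOISE = -1)."""
    n = len(points)
    # A point counts as its own neighbor, as in the usual DBSCAN definition.
    neighbors = [[j for j in range(n) if dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    labels = [None] * n
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = NOISE          # may later be claimed as a border point
            continue
        labels[i] = cluster
        frontier = list(neighbors[i])
        while frontier:
            j = frontier.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point: joins, does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:
                frontier.extend(neighbors[j])
        cluster += 1
    return labels

def changed_pairs(old, new, skip):
    """Hypothetical preservation metric: number of object pairs (ignoring
    `skip`) whose 'same cluster' status differs between two clusterings."""
    def together(labels, i, j):
        return labels[i] != NOISE and labels[i] == labels[j]
    idx = [i for i in range(len(old)) if i != skip]
    return sum(together(old, i, j) != together(new, i, j)
               for i, j in combinations(idx, 2))

def refine(points, eps, min_pts, why_not, target, eps_grid, min_pts_grid):
    """Return the (eps, min_pts) candidate that puts `why_not` in the cluster
    of `target` with the least disturbance, or None if no candidate works."""
    original = dbscan(points, eps, min_pts)
    best = None
    for e in eps_grid:
        for m in min_pts_grid:
            labels = dbscan(points, e, m)
            if labels[why_not] == NOISE or labels[why_not] != labels[target]:
                continue
            # Primary cost: disturbed pairs; tie-break: parameter change.
            cost = (changed_pairs(original, labels, why_not),
                    abs(e - eps) + abs(m - min_pts))
            if best is None or cost < best[0]:
                best = (cost, e, m)
    return None if best is None else (best[1], best[2])

# Two tight clusters plus one object (index 8) that is noise at eps=1.2, m=3.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (2.5, 1)]
refined = refine(points, 1.2, 3, why_not=8, target=0,
                 eps_grid=[1.2, 1.6, 2.0, 15.0], min_pts_grid=[2, 3, 4])
print(refined)
```

In this toy data the largest candidate ε would also bring the why-not object into the desired cluster, but only by merging both original clusters, so the pair-counting cost rejects it in favour of a smaller adjustment.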

Keywords/Search Tags:

why-not questions, relational data, stream data, spatial data, data modification, query refinement

Related items

1	Research On An Application Of Data Stream Query And Data Stream Mining In Oil Field
2	Research On Uncertain Data Stream Database System
3	Research On Algorithms To Process Skyline Query Over Data Stream
4	Research On The Technology Of Continuous Query Processing Over Data Stream
5	Research And Design On The Data Stream System
6	Study On Data Stream Techniques And Its Application In Electric Power Information Processing
7	Research And Implementation Of Efficient Distributed Spatial Range Query Technology
8	Universally Applicable Data Stream Management Prototype System TTSTREAM’s Design And The Key Algorithm’s Search
9	Spatial Data Flow Area Of Research And Implementation Of Query Optimization
10	Complex Rank Query Over Data Streams: Research And Implementation