Rule-Based Database Completion via Transformations, Efficient Execution and Incremental Visualization

Posted on: 2022-08-12
Degree: Doctor
Type: Dissertation
Country: China
Candidate: D Rui
GTID: 1488306557494884
Subject: Software engineering
Abstract:
Database systems play a vital role in the development of intelligent machines. Progress in database systems technology depends on innovative solutions to problems in database curation and usability. The research literature reports that a significant share of the effort in data management tasks is devoted to merging data and assessing its quality. Missing values in databases also consume a significant fraction of the resources needed to build an intelligent system. Usability strongly affects how quickly an intelligent system can be developed, because it shortens the learning curve developers face when adopting a new technology. The importance of these topics motivates the research in this dissertation.

This work proposes to perform database completion using rules. The first problem we observe is that database completion rules often contain syntactic errors, which can prevent some of them from executing. As a result, many missing values in the database remain unfilled. Moreover, the process of correcting these rules can itself introduce spurious rule conditions. Existing approaches to correcting database preparation rules suffer from several drawbacks:
- Some require external datasets.
- Some require significant user involvement in the form of many training examples.
- Some do not involve the user at all, which can be unreliable.

Straightforward use of existing techniques such as Q-gram-based data transformations can be inefficient, because they must process the entire dataset to find data transformations.

Next, we observe that the modified database completion rules produced by the previous step contain spurious conditions. Identifying such conditions at an early stage and pruning them makes the system more efficient. We also give the user the flexibility to change these database completion rules and verify the results. We further find that database completion rules can have missing conditions. At present, no approach in the existing literature imputes missing conditions of database completion rules.

Finally, the research literature proposes various visualization solutions to help users with database completion and matching. However, these visualizations are neither generated incrementally nor very robust, and no prior work combines completion and matching visualizations into a hybrid visualization.

These shortcomings motivate the following contributions:

1. Rule correction and ranking of the steps in the database completion system. The thesis proposes using data preparation system components, namely PBE data transformations and record matching rules, to correct database completion rules. It also proposes ranking the steps of the augmented database completion system that leverages PBE data transformations. The ranking uses intermediate results and makes a novel use of query logs.

2. Quick detection of spurious conditions in the modified database completion rules generated earlier. This involves constructing sketching data structures, such as Bloom filters, for incomplete records. The benefit of our approach is that it leverages record matching rules to create the summaries relevant to missing values. We also propose an adaptive edit distance threshold to select relevant attribute values for missing entries. In addition, we propose a score function to find candidate missing conditions in data completion rules from entity resolution rules. We show that maximizing the proposed score function is NP-complete. We therefore introduce a greedy approximate solution and a simulated annealing-based approximate solution.

3. Incremental, robust visualization for database completion. We modify existing visual data quality assessment in data preparation systems to generate incremental, robust visualizations for database completion. For this we use a spatially oriented visualization combined with a data matching visualization, leveraging the record matching rules available in data preparation system components. The approach first generates a heatmap with user involvement. It then automatically and continuously refines this heatmap based on data movements in the data matching visualization. This visualization lets the user find the missing entry values in the database quickly and incrementally. The corresponding chapter also proposes feedback-driven provenance to improve robustness, reduce execution time, and improve relevance.
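Contribution 2 builds Bloom filters over incomplete records, using record matching rules to decide what to summarize, so that spurious rule conditions can be detected early. The abstract does not give the exact keys or test. The sketch below is therefore a minimal illustration under stated assumptions. It assumes conditions are equality tests `(attribute, value)` and that the summary covers only the attributes named by record matching rules. All names (`BloomFilter`, `build_summary`, `prune_spurious`, `matching_attributes`) are hypothetical. The key property is that a Bloom filter has no false negatives. A condition the filter reports as absent therefore cannot be satisfied by any incomplete record and can be pruned safely.

```python
import hashlib
import math


class BloomFilter:
    """Plain Bloom filter: no false negatives, tunable false-positive rate."""

    def __init__(self, expected_items: int, fp_rate: float = 0.01):
        # Standard sizing: m = -n ln p / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        n = max(1, expected_items)
        self.m = max(8, math.ceil(-n * math.log(fp_rate) / (math.log(2) ** 2)))
        self.k = max(1, round(self.m / n * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: derive k bit positions from two halves of one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, item: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


def build_summary(incomplete_records, matching_attributes):
    """Hypothetical: summarize the (attribute, value) pairs of incomplete records,
    restricted to the attributes that record matching rules compare."""
    pairs = [(a, r[a]) for r in incomplete_records for a in matching_attributes
             if r.get(a) is not None]
    summary = BloomFilter(expected_items=len(pairs))
    for attr, value in pairs:
        summary.add(f"{attr}={value}")
    return summary


def prune_spurious(conditions, summary):
    """Keep an equality condition only if some incomplete record might satisfy it.
    A Bloom-filter miss is definitive, so dropped conditions are truly unsatisfiable."""
    return [(a, v) for a, v in conditions if summary.might_contain(f"{a}={v}")]
```

For example, summarize two incomplete records on `city` and then prune the conditions `("city", "Beijing")` and `("city", "Bejing")`. The first condition is always kept. The misspelled one is dropped unless it collides as a false positive, which happens at roughly the configured `fp_rate`.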
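The abstract also mentions an adaptive edit distance threshold for selecting attribute values relevant to a missing entry, but it does not say what the threshold adapts to. The sketch below illustrates one possible choice, which is an assumption rather than the thesis's rule. The threshold grows with the length of the query string, so longer values tolerate more typos. The functions `adaptive_threshold` and `relevant_values` and the `ratio` parameter are hypothetical. `levenshtein` is the standard dynamic-programming edit distance.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def adaptive_threshold(value: str, ratio: float = 0.2, floor: int = 1) -> int:
    """Hypothetical rule: allow about `ratio` edits per character, at least `floor`."""
    return max(floor, int(len(value) * ratio))


def relevant_values(query: str, domain, ratio: float = 0.2):
    """Return the domain values within the query's adaptive edit-distance threshold."""
    t = adaptive_threshold(query, ratio)
    return [v for v in domain if levenshtein(query, v) <= t]
```

With the defaults, `relevant_values("Bejing", ["Beijing", "Nanjing", "Tianjin"])` uses a threshold of 1 and returns `["Beijing"]`. "Nanjing" is three edits away, so it is excluded. A fixed threshold would force one setting for both short codes and long names, which is the motivation for making it adaptive.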
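The abstract states that maximizing the proposed score function for candidate missing conditions is NP-complete. It also states that the thesis uses greedy and simulated-annealing approximations, but it does not define the score. The sketch below is a minimal illustration of both search strategies over a stand-in score, which is an assumption. Each candidate condition (for instance, one derived from an entity resolution rule) covers a set of incomplete records. The score is the number of records covered minus a per-condition `penalty` that discourages adding spurious conditions. `coverage`, `penalty`, `greedy_select` and `anneal_select` are hypothetical names.

```python
import math
import random


def score(selected, coverage, penalty=0.5):
    """Stand-in score: records covered by the chosen conditions minus a
    per-condition penalty (not the thesis's actual score function)."""
    covered = set()
    for c in selected:
        covered |= coverage[c]
    return len(covered) - penalty * len(selected)


def greedy_select(coverage, penalty=0.5):
    """Repeatedly add the condition with the largest positive marginal gain."""
    selected, remaining = [], set(coverage)
    current = score(selected, coverage, penalty)
    while remaining:
        best, best_gain = None, 0.0
        for c in sorted(remaining):  # sorted for deterministic tie-breaking
            gain = score(selected + [c], coverage, penalty) - current
            if gain > best_gain:
                best, best_gain = c, gain
        if best is None:  # no condition improves the score any further
            break
        selected.append(best)
        remaining.remove(best)
        current += best_gain
    return selected


def anneal_select(coverage, penalty=0.5, steps=2000, t0=1.0, cooling=0.995, seed=0):
    """Simulated annealing over subsets: each step flips one condition in or out."""
    names = sorted(coverage)
    if not names:
        return set()
    rng = random.Random(seed)
    state = set()
    cur = best_score = score(state, coverage, penalty)
    best, t = set(state), t0
    for _ in range(steps):
        cand = state ^ {rng.choice(names)}
        s = score(cand, coverage, penalty)
        # Accept improvements always; accept worse moves with Boltzmann probability.
        if s >= cur or rng.random() < math.exp((s - cur) / t):
            state, cur = cand, s
            if cur > best_score:
                best, best_score = set(state), cur
        t = max(t * cooling, 1e-6)
    return best
```

Consider `coverage = {"c1": {1, 2, 3}, "c2": {3, 4}, "c3": {3}}`. Greedy picks `c1` (gain 2.5), then `c2` (gain 0.5), and stops, because `c3` adds no new record and only costs the penalty. The result is `["c1", "c2"]` with score 3.0. Greedy is fast but can stall in a local optimum on harder instances. Annealing trades extra evaluations for the chance to escape such optima by occasionally accepting worse subsets.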
Keywords: Database Completion, Data Preparation, Transformations, Efficient Rule Execution, Visualization