Functional Dependencies Discovery Based On Sampling

Posted on: 2020-11-02

Degree: Master

Type: Thesis

Country: China

Candidate: C X Gu

Full Text: PDF

GTID: 2428330578983459

Subject: Software engineering

Abstract/Summary:

In relational databases, functional dependency discovery is an important database analysis technique with wide application in knowledge discovery, database semantic analysis, data quality assessment, and database design. For traditional centralized datasets, functional dependency discovery has been studied thoroughly. With the arrival of the big data era, however, the total volume of data has grown geometrically and databases have grown rapidly in scale. Centralized datasets are constrained by factors such as physical equipment and, in some scenarios, can no longer meet practical needs. Distributed database systems emerged in this context; they are more maintainable, more scalable, and more fault-tolerant than centralized databases. At the same time, distributed databases make data processing and management more complicated, and knowledge discovery methods designed for centralized databases do not apply to them. Existing functional dependency discovery algorithms for distributed datasets produce correct results, but they still verify candidates mainly in a centralized manner after the data has been migrated, which is inefficient. The main research content of this thesis is therefore parallel functional dependency discovery on distributed datasets.

This thesis pursues efficient functional dependency discovery in three ways:

(1) Sampling-based verification. Each candidate functional dependency is first verified against a sampled dataset on the master node. According to the theorem, if a candidate does not hold on the sample, it does not hold on the complete dataset, because any pair of tuples that violates it in the sample also violates it in the complete dataset. Such a candidate can be rejected without global verification, which saves the communication, task assignment, and other overhead that global verification requires and thereby improves efficiency.

(2) Candidate generation. Candidate functional dependencies are generated with the F_{k-1} × F_{k-1} algorithm originally used for frequent pattern mining. This method takes up less storage space than generation based on prefix-tree records, which saves the time spent allocating and releasing storage and also avoids running short of storage space.

(3) Distributed execution on Spark. A distributed functional dependency discovery algorithm is designed for the efficient distributed computing framework Spark, so that discovery runs on each node of the distributed dataset and makes efficient use of each node's computing resources.

The experimental results show that the proposed framework is feasible and effective and that it can efficiently perform functional dependency discovery in distributed settings.
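
The two ideas in the abstract, rejecting a candidate on a sample before any global check and generating level-k candidates by joining level-(k-1) attribute sets, can be illustrated with a minimal single-process Python sketch. This is not the thesis's Spark implementation: the function names (`fd_holds`, `verify_with_sample`, `join_candidates`), the list-of-dicts row format, and the example data are all assumptions made for illustration, and the "global" check here is just a scan of an in-memory list.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the dependency lhs -> rhs holds on rows (a list of dicts)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = row[rhs]
        # setdefault stores the first rhs value seen for this lhs key;
        # any later, different value is a violation.
        if seen.setdefault(key, value) != value:
            return False
    return True


def verify_with_sample(sample, full_rows, lhs, rhs):
    """Check the sample first; scan the full data only if the sample cannot refute."""
    if not fd_holds(sample, lhs, rhs):
        # A violating pair in a subset is also a violating pair in the whole set.
        return False
    return fd_holds(full_rows, lhs, rhs)


def join_candidates(prev_level):
    """F_{k-1} x F_{k-1} join: merge two sorted (k-1)-tuples that share
    their first k-2 attributes into one sorted k-tuple."""
    prev = sorted(prev_level)
    merged = []
    for i, a in enumerate(prev):
        for b in prev[i + 1:]:
            if a[:-1] == b[:-1]:
                merged.append(a + (b[-1],))
    return merged


# Hypothetical example data.
rows = [
    {"zip": "10001", "city": "New York", "name": "Ann"},
    {"zip": "10001", "city": "New York", "name": "Bob"},
    {"zip": "94105", "city": "San Francisco", "name": "Ann"},
]
sample = rows[::2]  # a stand-in for whatever sampling strategy is used

print(verify_with_sample(sample, rows, ("zip",), "city"))
print(verify_with_sample(sample, rows, ("name",), "city"))
print(join_candidates([("A", "B"), ("A", "C"), ("B", "C")]))
```

A sample can only refute a candidate, never confirm it, so a candidate that survives the sample check still needs verification on the complete data; in the setting the abstract describes, that step would be the distributed one.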

Keywords/Search Tags:

functional dependency, knowledge discovery, parallel computing

Related items

1	Research And Implementation Of Distributed And Parallel Algorithm For Large-Scale Functional Dependency Discovery
2	Spark-based Distributed Functional Dependency Discovery Algorithm
3	Research On Micro Inconsistencies Of Data
4	Automated knowledge discovery from functional magnetic resonance images using spatial coherence
5	Research And Implementation Of Process Object Knowledge Discovery System Based On Parallel Computing
6	Research Of Energy-Efficient Scheduling Algorithm Based Task Dependency On Homogeneous Clusters
7	Granular Computing Based Knowledge Discovery And Its Applications
8	Contributions to parallel and distributed computing in knowledge discovery and data mining
9	Mining Entity Columns Of Web Tables Based On Functional Dependency
10	Graph Model Of Knowledge Discovery Method