Research On Functional Dependencies Mining Algorithm Based On Attribute Partition Information Gain

Posted on:2020-11-11

Degree:Master

Type:Thesis

Country:China

Candidate:X Y Jiang

Full Text:PDF

GTID:2428330590971748

Subject:Computer technology

Abstract/Summary:

As the Internet era develops, data has become an emerging means of production. The information systems of many industries now hold large amounts of data, especially relational data. Such data often contains errors, which makes it difficult to use effectively, so effective strategies for correcting it are needed. Functional dependencies play an important role in repairing relational data: a functional dependency is a core concept of the relational model and can be used for schema normalization, data cleansing, data repair, data integration, and more.

Functional dependency discovery on relational data has been studied for decades and many mining methods have been proposed, but some problems remain. For example, when mining functional dependencies in a database instance with a large number of attributes, algorithms are still not fast enough: the time complexity of traditional discovery algorithms such as DFD, which traverses the search space depth-first, grows exponentially.

To address this problem, this thesis proposes the concept of attribute partition information gain and combines the original DFD functional dependency discovery algorithm with the focused sampling method of the HyFD algorithm. First, the list of information gains between attribute partitions is used to improve the random-walk strategy by which the original DUCC algorithm selects the next node, in order to find the unique attribute combinations (MUCs). Next, the dataset is sampled with the focused sampling method to obtain non-functional dependencies. Finally, the paths through single-attribute primary key nodes, non-single-attribute primary key nodes, and non-functional-dependency nodes are pruned, and the starting path of the original DFD algorithm is chosen with reference to the information gain list, so that the improved algorithm is theoretically superior to the original.

The thesis validates the algorithm on the public datasets provided with Metanome and develops an Excel plugin that automatically detects and repairs data. The experimental results show that the functional dependency mining algorithm based on attribute partition information gain is faster than the original DFD. When the dataset has many records and many attributes, the improved algorithm is more robust than the original. In addition, because of focused sampling, the improved algorithm consumes less memory than the original DFD algorithm on larger datasets.
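The abstract does not give the thesis's exact definition of attribute partition information gain, so the following is only a minimal sketch under stated assumptions. It assumes an attribute partition groups rows by their values on a set of attributes, and it uses the standard Shannon information gain H(B) - H(B | A) as a stand-in for the thesis's measure. The function names (`partition`, `holds`, `entropy`, `information_gain`) and the sample rows are hypothetical and not taken from the thesis, DFD, DUCC, or HyFD.

```python
from collections import defaultdict
from math import log2


def partition(rows, attrs):
    """Group row indices by their values on attrs (an attribute partition)."""
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        groups[tuple(row[a] for a in attrs)].append(i)
    return list(groups.values())


def holds(rows, lhs, rhs):
    """lhs -> rhs holds iff adding rhs splits no equivalence class of lhs."""
    return len(partition(rows, lhs)) == len(partition(rows, lhs + [rhs]))


def entropy(rows, attrs):
    """Shannon entropy (in bits) of the partition induced by attrs."""
    n = len(rows)
    return -sum(len(g) / n * log2(len(g) / n) for g in partition(rows, attrs))


def information_gain(rows, a, b):
    """Assumed measure: H(b) - H(b | a), with H(b | a) = H(a, b) - H(a)."""
    return entropy(rows, [b]) - (entropy(rows, [a, b]) - entropy(rows, [a]))


# Hypothetical sample relation, for illustration only.
rows = [
    {"zip": "10001", "city": "New York", "name": "Ann"},
    {"zip": "10001", "city": "New York", "name": "Bob"},
    {"zip": "94105", "city": "San Francisco", "name": "Ann"},
    {"zip": "94105", "city": "San Francisco", "name": "Cid"},
]

# Rank candidate left-hand sides for "city" by gain, highest first.
ranked = sorted(["zip", "name"],
                key=lambda a: information_gain(rows, a, "city"),
                reverse=True)
```

Under this assumed measure, when A -> B holds the conditional entropy H(B | A) is zero, so the gain reaches its maximum H(B). Visiting attributes in descending order of gain is therefore one plausible way to bias a lattice traversal toward likely dependencies first; whether the thesis orders its random walk in exactly this way is not stated in the abstract.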

Keywords/Search Tags:

functional dependency, attribute partition, information gain, relational database

Related items

1	Bayesian Network Learning Based On Multivalued Dependency Of Relational Database
2	Research On The Storage And Conversion Technology Of XML And Relational DataBase
3	Data Transformation Between XML And RDB With Constraints
4	The Theory's Research On Normalization Of XML Database
5	Research On Update And Query Optimization In Probabilistic Relational Databases With Integrity Constraints
6	The Study Of Data Exchange Between XML And RDB Based On The Method Of Normalization
7	The Research Of Rough Relational Database Attribute Value Decomposition With Application
8	Study On Information System Structure And Homomorphism Based On Attribute Set Information Granules
9	The Research And Improvement Of Mapping XML Schema To Relational Schema
10	Research On AODE Classification Algorithm Based On Attribute Reduction Of Levels