
Research On Foreign Key Detection Algorithm For Web Tables

Posted on: 2020-08-25
Degree: Master
Type: Thesis
Country: China
Candidate: J M Wang
GTID: 2428330578952484
Subject: Computer Science and Technology
Abstract/Summary:
With the development of information technology, more and more tabular data has emerged on the Internet. These structured web tables offer wide coverage and a large amount of information, and they have attracted much attention. As one of the most important constraints in databases, foreign key relationships between tables are crucial for data integration and analysis. For the large number of web tables drawn from heterogeneous data sources, foreign keys are not specified in most cases. Detecting foreign keys is therefore a significant step in understanding and utilizing web tables.

Existing foreign key detection algorithms have certain limitations. On the one hand, most current work targets traditional relational tables and relies on structural information in the table. Web tables, however, usually lack schema information such as column names and table names, so these methods do not apply to them. On the other hand, existing algorithms can only guarantee semantic correlation between the attribute values of two columns; they consider neither the large number of conflicting foreign keys caused by the heterogeneity of web tables nor the attribute reference rules that a foreign key relationship needs to satisfy.

To address these problems, this thesis presents an in-depth study of foreign key relationship detection for web tables. The specific work is as follows:

(1) A foreign key detection algorithm based on distribution fitting is proposed to solve the problem of foreign key detection in web tables. First, we relax the properties that a foreign key should satisfy and evaluate whether a candidate is a true foreign key by distribution fitting of column values. In addition, a multi-pass division method is proposed for constructing partitioned distribution diagrams, so that our method can detect foreign key relationships more effectively and scale to large web tables.

(2) A foreign key detection algorithm based on conflict dependency elimination is proposed to improve the accuracy of the algorithm. Considering the conflict dependencies among foreign key relationships, a layered structure of the inclusion dependency graph is established, and true foreign key relationships are obtained by eliminating conflict dependencies layer by layer.

(3) Extensive experiments have been conducted on real web table datasets to verify the effectiveness of our algorithm. The results show that the proposed algorithm is more suitable for web tables than traditional methods and is superior to current methods in terms of efficiency and scalability.
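
The abstract does not give the algorithms in detail, so the following is only a minimal illustrative sketch of the general idea, not the thesis's method. It assumes a relaxed inclusion check (the share of a column's distinct values found in a unique column) as a simple stand-in for distribution fitting, and it resolves conflicting candidates by keeping the best-scoring reference per column rather than by layer-by-layer elimination on an inclusion dependency graph. All function names, the threshold, and the sample data are hypothetical.

```python
def inclusion_ratio(dependent, referenced):
    """Share of the dependent column's distinct values found in the referenced column."""
    dep = set(dependent)
    if not dep:
        return 0.0
    return len(dep & set(referenced)) / len(dep)


def is_unique(column):
    """A referenced (key-side) column must not contain duplicate values."""
    return len(set(column)) == len(column)


def candidate_foreign_keys(columns, min_ratio=0.9):
    """Return (dependent, referenced, ratio) triples that pass the relaxed check.

    `columns` maps a hypothetical column identifier to its list of values.
    No column names or table names are used, only the values themselves.
    """
    candidates = []
    for dep_id, dep_vals in columns.items():
        for ref_id, ref_vals in columns.items():
            if dep_id == ref_id or not is_unique(ref_vals):
                continue
            ratio = inclusion_ratio(dep_vals, ref_vals)
            if ratio >= min_ratio:
                candidates.append((dep_id, ref_id, ratio))
    return candidates


def resolve_conflicts(candidates):
    """Assumed simplification: keep one referenced column per dependent column."""
    best = {}
    for dep_id, ref_id, ratio in candidates:
        if dep_id not in best or ratio > best[dep_id][1]:
            best[dep_id] = (ref_id, ratio)
    return {dep_id: ref_id for dep_id, (ref_id, _) in best.items()}


# Hypothetical sample data: three columns taken from three web tables.
example = {
    "orders.city": ["Paris", "Rome", "Paris", "Oslo"],
    "cities.name": ["Paris", "Rome", "Oslo", "Lima"],
    "capitals.name": ["Paris", "Rome"],
}
result = resolve_conflicts(candidate_foreign_keys(example, min_ratio=0.6))
```

With the threshold lowered to 0.6, `orders.city` has two competing candidates (`cities.name` and `capitals.name`), and the conflict step keeps only the one with the higher inclusion ratio. A real system would need a stronger scoring function, since value overlap alone produces many accidental matches on web data.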
Keywords/Search Tags:Foreign key relationships, Web tables, Distribution fitting, Conflict dependency, Data integration