
Research on Detecting Entity Columns of Web Tables

Posted on: 2018-08-02
Degree: Master
Type: Thesis
Country: China
Candidate: L F Zhang
GTID: 2348330512975561
Subject: Computer Science and Technology
Abstract/Summary:
The Web offers many high-value tables, but their meaning is rarely explicit to machines. Tabular data can be used only after the semantics of the tables have been recovered. To a degree, the entity column expresses the semantic meaning of a table, so we propose discovering the entity columns of Web tables. Accurately detecting entity columns can greatly improve machine understanding of tables. Traditional entity column detection techniques tend to rely on header information and can detect only a single entity column. However, the precision and latency of these methods cannot satisfy our requirements. In this paper, we propose a new entity column detection approach based on approximate functional dependencies and normalization. We summarize our contributions as follows.

(1) We develop an entity column detection method based on approximate functional dependencies, which improves the efficiency of entity column detection and the applicability of the algorithm. Unlike traditional techniques, our algorithm does not depend on a knowledge base or on headers.
(2) We present a method for discovering approximate functional dependencies that is suited to Web tables. By taking the noise in Web tables into account, our method expresses the functional dependencies between the attributes of a Web table more accurately.
(3) We propose the concept of entity attribute dependency intensity and define the semantic strength of an entity column. We estimate an entity column's semantic strength from the entity attribute dependency intensity, which is shown to greatly increase the accuracy of detecting the strongest entity columns.
(4) We introduce the concept of entity attribute dependency intensity into the algorithm described above. In this way, we can not only discover entity columns through their semantic strength, but also label specific relationships according to the entity attribute dependency intensity.

We conduct experiments on a real-world dataset. The results show that the approximate functional dependency detection method has an obvious noise reduction effect. The entity column discovery method based on the dependency relationships between attributes performs well in terms of validity and time efficiency, and it is more widely applicable than previous methods.
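
To make the notion of an approximate functional dependency concrete, the sketch below computes the g3 error, a commonly used measure: the smallest fraction of rows that would have to be removed for a dependency lhs -> rhs to hold exactly. The abstract does not state which measure the thesis uses, so the choice of g3, the function name `g3_error`, and the toy table are all assumptions made for illustration only.

```python
from collections import Counter, defaultdict


def g3_error(rows, lhs, rhs):
    """Return the g3 error of the dependency lhs -> rhs over rows.

    rows is a list of dicts (one per table row). The result is the
    minimum fraction of rows to delete so that every lhs value maps
    to exactly one rhs value. 0.0 means the dependency holds exactly.
    """
    if not rows:
        return 0.0
    # For each lhs value, count how often each rhs value occurs.
    groups = defaultdict(Counter)
    for row in rows:
        groups[row[lhs]][row[rhs]] += 1
    # In each group we can keep only the rows with the majority rhs value.
    kept = sum(max(counter.values()) for counter in groups.values())
    return 1.0 - kept / len(rows)


# Hypothetical toy table; the last row is deliberate noise.
table = [
    {"country": "France", "capital": "Paris", "continent": "Europe"},
    {"country": "Germany", "capital": "Berlin", "continent": "Europe"},
    {"country": "Japan", "capital": "Tokyo", "continent": "Asia"},
    {"country": "Japan", "capital": "Kyoto", "continent": "Asia"},
]

print(g3_error(table, "country", "capital"))    # 0.25
print(g3_error(table, "country", "continent"))  # 0.0
```

With a tolerance threshold of, say, 0.3 (again an assumed value), country -> capital would still be accepted as an approximate dependency despite the noisy row, whereas an exact dependency check would reject it.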
Keywords/Search Tags: Web tables, entity column, approximate functional dependencies, semantic recovery, normalization