
Detecting And Annotating Entity Columns Of Web Tables

Posted on: 2016-07-10
Degree: Master
Type: Thesis
Country: China
Candidate: X R Ren
Full Text: PDF
GTID: 2308330470955578
Subject: Computer technology
Abstract/Summary:
The Internet contains a large number of web data tables, which are generally used to describe an entity or the relations among entities. However, these tables tend to lack critical information such as table names and column names, which makes it difficult for computers to recognize their semantics automatically. This creates many difficulties for table search and prevents the value of web tables from being fully exploited.

To enable a computer to form an accurate understanding of a data table, the main approach works through column labels and entities: first recover a label for each column, then analyze the relationships between the labels, and finally find the table topic by identifying the entity column of the table. Traditional entity column discovery methods can find only a single entity and do not consider tables that contain multi-entity columns. In addition, because they do not take the non-standard structure of web tables into account, their results are not ideal. We improve on the traditional entity discovery methods so that they are effective for finding both single-entity columns and multi-entity columns. The main research work is as follows:

Firstly, for column label recovery, this thesis improves the traditional algorithm. We consider both the likelihood of a concept match and the number of matches to improve the quality of the recovered column labels.

Secondly, we propose a strongest semantic entity column detection algorithm, which applies to both single-entity and multi-entity web tables. We first construct a column topology graph. Then, borrowing the idea of PageRank, we evaluate each node by considering both the quality and the number of the links pointing to it. Finally, we select the node with the strongest semantics as the strongest semantic entity column. This method applies not only to full attribute dependence but also to transitive attribute dependence and to incomplete attribute dependence.

Thirdly, based on the idea of division, we propose an entity column detection algorithm. It overcomes the drawback of traditional algorithms, which can find only a single entity column, and it performs well in detecting both single-entity columns and multi-entity columns.

Finally, experiments show that our method, which considers both the likelihood of concept matching and the number of matches, significantly improves the quality of the recovered column labels. In addition, our entity detection algorithm, based on the idea of division and the PageRank algorithm, performs well and overcomes the drawback of the traditional algorithms.
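
The PageRank-style node evaluation described in the second contribution can be illustrated with a minimal sketch. This is not the thesis's algorithm, only a generic weighted PageRank under stated assumptions: each edge points from a dependent attribute column to the column it depends on, the edge weight stands in for "link quality", and the function name rank_columns, the damping factor, and the example table are all hypothetical.

```python
def rank_columns(columns, edges, damping=0.85, iterations=50):
    """Weighted PageRank over a column graph (illustrative sketch only).

    columns: list of column names (graph nodes).
    edges:   dict mapping (src, dst) -> positive weight, where src is
             assumed to depend on dst and therefore "votes" for it.
    """
    n = len(columns)
    # Total outgoing weight per node, used to normalise each vote.
    out_weight = {c: 0.0 for c in columns}
    for (src, _dst), w in edges.items():
        out_weight[src] += w

    score = {c: 1.0 / n for c in columns}
    for _ in range(iterations):
        # Nodes with no outgoing edges spread their score uniformly.
        dangling = sum(score[c] for c in columns if out_weight[c] == 0.0)
        new = {c: (1.0 - damping) / n + damping * dangling / n for c in columns}
        for (src, dst), w in edges.items():
            new[dst] += damping * score[src] * w / out_weight[src]
        score = new
    return score


# Hypothetical table: Mayor depends on Capital, which depends on Country
# (a transitive dependence); Population depends on Country directly.
columns = ["Country", "Capital", "Population", "Mayor"]
edges = {
    ("Capital", "Country"): 0.9,
    ("Population", "Country"): 0.8,
    ("Mayor", "Capital"): 0.7,
}
scores = rank_columns(columns, edges)
strongest = max(scores, key=scores.get)
```

In this toy graph the column that collects the most and the best-supported votes ends up with the highest score, which mirrors the idea of weighing both the number and the quality of links when choosing the strongest semantic entity column.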
Keywords/Search Tags:Web table, Entity column, PageRank, Attribute dependence