
Research on Web Table Augmentation

Posted on: 2018-10-18
Degree: Master
Type: Thesis
Country: China
Candidate: F Qi
Full Text: PDF
GTID: 2348330512993196
Subject: Computer Science and Technology
Abstract/Summary:
In the era of big data, large amounts of data appear on the Internet. Almost every web page contains information-rich HTML tables, called web tables. Unlike text, web tables are structured, which helps people find the information they are interested in. To let users integrate structured information conveniently, web table augmentation extends the content of a table based on its main column or other known information. Some researchers have designed query systems for table augmentation, but these systems have several limitations. First, they use the main column as the only basis for expansion and produce a binary entity-attribute table consisting of the main column and one extended column. When this technique is applied to a table with more than one column to be extended, the table assembled from the combined results suffers from entity inconsistency. Second, these systems return a single result table. When users want to check whether a data source is reliable, identify possible errors in the results, or manually correct erroneous values in the extended results, a single result cannot meet their needs.

To address these problems, this thesis studies web table augmentation in depth. First, to solve the entity inconsistency problem, we design a column mapping algorithm based on column overlapping degree and implement a consistent web table augmentation method, CCA. Taking into account both the relationships between columns and the correlation between tuples, we propose the concept of consistency support degree and apply it in the filling algorithm after pre-processing the query table. CCA not only ensures a high support degree for the candidate tables it uses but also fills the result table from as few data sources as possible. Experiments show that, compared with existing approaches, CCA achieves higher accuracy, coverage and consistency at a lower query time cost.

Second, to let users screen the results, we improve the filling algorithm of CCA and implement a Top-k web table augmentation method, TAT. We propose the concept of Top-k support degree and design two algorithms, an exclusive algorithm and an iterative algorithm, that return Top-k results according to different user requirements. Experiments show that TAT returns Top-k consistent results without loss of precision or coverage.
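The abstract names a column mapping algorithm based on column overlapping degree but does not give its formula. The sketch below assumes overlap degree is the fraction of a query column's distinct (normalized) values that also occur in a candidate web table column, and maps query columns to candidate columns greedily, one-to-one, in descending order of overlap. The names `overlap_degree`, `map_columns`, the lowercase normalization and the 0.5 threshold are all hypothetical, not taken from the thesis.

```python
from typing import Dict, List, Set, Tuple


def normalize(value: str) -> str:
    """Hypothetical normalization: trim whitespace and lowercase."""
    return value.strip().lower()


def overlap_degree(query_col: List[str], cand_col: List[str]) -> float:
    """Assumed definition: share of the query column's distinct values
    that also appear in the candidate column."""
    q = {normalize(v) for v in query_col if v.strip()}
    c = {normalize(v) for v in cand_col if v.strip()}
    if not q:
        return 0.0  # an empty column to be extended gives no evidence
    return len(q & c) / len(q)


def map_columns(query: Dict[str, List[str]],
                candidate: Dict[str, List[str]],
                threshold: float = 0.5) -> Dict[str, str]:
    """Greedy one-to-one mapping of query columns to candidate columns."""
    scored: List[Tuple[float, str, str]] = [
        (overlap_degree(qv, cv), qn, cn)
        for qn, qv in query.items()
        for cn, cv in candidate.items()
    ]
    scored.sort(key=lambda t: t[0], reverse=True)
    mapping: Dict[str, str] = {}
    used: Set[str] = set()
    for score, qn, cn in scored:
        if score < threshold:
            break  # all remaining pairs overlap too little
        if qn in mapping or cn in used:
            continue  # each column may be mapped at most once
        mapping[qn] = cn
        used.add(cn)
    return mapping


query = {"Country": ["China", "France", "Japan"], "Capital": ["Beijing", "", ""]}
candidate = {"Nation": ["china", "France", "Germany"],
             "City": ["Beijing", "Paris", "Berlin"]}
mapping = map_columns(query, candidate)
print(mapping)  # {'Capital': 'City', 'Country': 'Nation'}
```

Here "Country" overlaps "Nation" on two of three values (about 0.67), and the single known "Capital" value matches "City" exactly, so both columns are mapped even though the header names differ. Header-independent matching of this kind is one plausible reason to base mapping on value overlap.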
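CCA is described as filling the result table from high-support candidate tables while using as few data sources as possible. One standard way to approximate "as few sources as possible" is a greedy set-cover heuristic, sketched below under stated assumptions: each source's consistency support degree is already computed (the abstract does not give its formula), a source may fill an entity's row only if it supplies every requested column for that entity (so no row mixes values from different tables), and ties are broken by support. `Source`, `fill_table` and `min_support` are hypothetical names; this is a swapped-in heuristic, not the thesis's filling algorithm.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Source:
    """A hypothetical candidate web table, already mapped onto the query columns."""
    name: str
    support: float  # assumed precomputed consistency support degree in [0, 1]
    rows: Dict[str, Dict[str, str]] = field(default_factory=dict)  # entity -> {column: value}


def fill_table(entities: List[str], columns: List[str], sources: List[Source],
               min_support: float = 0.5) -> Dict[str, Tuple[str, Dict[str, str]]]:
    """Greedy set-cover fill: every entity row is copied from exactly one source."""
    usable = [s for s in sources if s.support >= min_support]
    # A source covers an entity only if it supplies all requested columns for it.
    covers: List[Set[str]] = [
        {e for e in entities if e in s.rows and all(c in s.rows[e] for c in columns)}
        for s in usable
    ]
    result: Dict[str, Tuple[str, Dict[str, str]]] = {}
    remaining = set(entities)
    while remaining:
        best: Optional[int] = None
        best_gain = 0
        for i, s in enumerate(usable):
            gain = len(covers[i] & remaining)
            # Prefer the source that fills the most open rows; break ties by support.
            if gain > best_gain or (gain > 0 and gain == best_gain
                                    and best is not None
                                    and s.support > usable[best].support):
                best, best_gain = i, gain
        if best is None:
            break  # no usable source can fill any remaining row
        src = usable[best]
        for e in covers[best] & remaining:
            result[e] = (src.name, {c: src.rows[e][c] for c in columns})
        remaining -= covers[best]
    return result


sources = [
    Source("A", 0.9, {"China": {"Capital": "Beijing", "Currency": "Yuan"},
                      "France": {"Capital": "Paris", "Currency": "Euro"}}),
    Source("B", 0.8, {"France": {"Capital": "Paris", "Currency": "Euro"},
                      "Japan": {"Capital": "Tokyo", "Currency": "Yen"}}),
    Source("C", 0.3, {"Japan": {"Capital": "Tokyo", "Currency": "Yen"}}),  # below min_support
    Source("D", 0.95, {"Japan": {"Capital": "Tokyo"}}),  # lacks Currency, covers nothing
]
filled = fill_table(["China", "France", "Japan"], ["Capital", "Currency"], sources)
print({e: src for e, (src, _) in filled.items()})
```

With these inputs, A and B each cover two rows; A wins the tie on support and fills China and France, then B fills Japan, so two sources suffice. Keeping the source name next to each row also preserves provenance, which matters for the users in the abstract who want to check data sources.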
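TAT returns Top-k consistent result tables rather than a single one. The abstract does not define Top-k support degree or describe the exclusive and iterative algorithms, so the sketch below only illustrates the general idea with a brute-force enumeration: each entity has several sources that could fill its whole row, each choice of one source per entity is a candidate result table, and tables are ranked by the sum of the chosen sources' support degrees (an assumed scoring rule). `Candidates`, `top_k_tables` and the summed score are hypothetical; it does not reproduce either of the thesis's algorithms.

```python
import heapq
import itertools
from typing import Dict, List, Tuple

# Hypothetical input: entity -> (source name, support degree) options, each of
# which can fill that entity's whole row consistently.
Candidates = Dict[str, List[Tuple[str, float]]]


def top_k_tables(candidates: Candidates, k: int) -> List[Tuple[float, Dict[str, str]]]:
    """Return the k source assignments with the highest total support.

    Brute force over every combination, so only suitable for small tables.
    """
    entities = sorted(candidates)
    options = [candidates[e] for e in entities]
    scored = (
        (sum(sup for _, sup in combo),
         {e: src for e, (src, _) in zip(entities, combo)})
        for combo in itertools.product(*options)
    )
    # key= compares scores only, so the dicts themselves are never compared.
    return heapq.nlargest(k, scored, key=lambda t: t[0])


candidates: Candidates = {
    "China": [("A", 0.9)],
    "France": [("A", 0.9), ("B", 0.8)],
    "Japan": [("B", 0.8), ("E", 0.6)],
}
best = top_k_tables(candidates, k=2)
for score, assignment in best:
    print(round(score, 2), assignment)
```

The two best tables take China and France from A with Japan from B (total 2.6), then switch France to B (total 2.5). Because every row still comes from a single source, each of the k results stays entity-consistent, while the user gets alternatives to compare. A practical version would replace the exponential enumeration with a lazy best-first search.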
Keywords/Search Tags: Web tables, Column overlapping, Column mapping, Consistency support degree, Top-k augmentation