The Research On Consistent Entity Augmentation

Posted on: 2019-04-12
Degree: Master
Type: Thesis
Country: China
Candidate: W J Sun
Full Text: PDF
GTID: 2348330542987660
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, research on web tables has attracted increasing attention. Compared with free text, web tables make it easier for people to find and understand the information they are interested in, but that information is often scattered across multiple web tables. Given a set of entities, entity augmentation returns the values of their missing attributes from web tables. This technology is widely used in data integration and search engines. Existing techniques assume that web tables are entity-attribute binary tables. A table with multiple columns to be augmented is split into several entity-attribute binary relations, which leads to semantic fragmentation, and the result table consolidated from these binary relations suffers from entity inconsistency. In addition, existing entity augmentation techniques return only a single result, which may not meet users' needs.

The objective of our research is to return the top-k consistent result tables for entity augmentation, given a set of entities and attribute names. To ensure high consistency and precision of the result table, we propose the concept of a consistent matching relationship and solve the consistent entity augmentation problem by building a consistent γ-coverage clique. Because the answer tables used to build a result table should have consistent matching relationships with one another, we regard web tables as nodes and consistent matching relationships as edges, form a consistent clique, and expand it until its coverage of the augmentation query reaches a given threshold γ. We prove that a consistent result table can be built by taking the tables in the consistent clique as answer tables. We tested our method on four real-life datasets and compared it with different answer table selection methods, which verifies the effectiveness of our approach. The results of a comprehensive set of experiments indicate that our entity augmentation framework is more effective than existing methods in obtaining consistent entity augmentation results with high accuracy and reliability.

For top-k entity augmentation, we propose two algorithms: one based on consistent matching degree and one based on branch-and-bound. The main idea of both algorithms is to find the top-k answer table sets with a high consistent supporting degree, from which the attribute values of the entities are returned. To address entity inconsistency when augmenting entities with multiple attributes, we require that the answer tables have a high consistent matching degree with one another. Experimental results show that both algorithms achieve top-k entity augmentation with high result consistency and high accuracy. The algorithm based on consistent matching degree is more effective at producing a rich diversity of results, while the algorithm based on branch-and-bound is more effective at producing reliable entity augmentation results.
Keywords/Search Tags:Web table, Data integration, Entity augmentation, Consistency, Branch-and-bound, Consistent matching degree
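
The clique-expansion idea described in the abstract can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the thesis: the table representation (a dict mapping entity to attribute values), the `consistent` test (two tables share at least one entity and never disagree on a shared cell), the `coverage` measure (fraction of queried entity-attribute cells filled), the greedy selection order, and all function names. The thesis's actual definitions of consistent matching relationship and coverage may differ.

```python
from itertools import product

def cells(table, entities, attributes):
    """Query cells (entity, attribute) for which the table supplies a value."""
    return {(e, a) for e, a in product(entities, attributes)
            if a in table.get(e, {})}

def consistent(t1, t2):
    """Assumed stand-in for a consistent matching relationship: the tables
    share at least one entity and never disagree on a shared cell."""
    shared = set(t1) & set(t2)
    if not shared:
        return False
    return all(t1[e][a] == t2[e][a]
               for e in shared
               for a in set(t1[e]) & set(t2[e]))

def coverage(clique, tables, entities, attributes):
    """Fraction of query cells filled by at least one table in the clique."""
    total = len(entities) * len(attributes)
    covered = set()
    for name in clique:
        covered |= cells(tables[name], entities, attributes)
    return len(covered) / total if total else 0.0

def grow_clique(tables, entities, attributes, gamma):
    """Greedily expand a clique of pairwise-consistent tables until its
    coverage reaches gamma or no candidate improves coverage."""
    names = sorted(tables)  # fixed order so ties are broken deterministically
    if not names:
        return [], 0.0
    seed = max(names, key=lambda n: len(cells(tables[n], entities, attributes)))
    clique = [seed]
    current = coverage(clique, tables, entities, attributes)
    while current < gamma:
        # A candidate must be consistent with every table already in the clique.
        candidates = [n for n in names if n not in clique and
                      all(consistent(tables[n], tables[m]) for m in clique)]
        best = max(candidates, default=None,
                   key=lambda n: coverage(clique + [n], tables, entities, attributes))
        if best is None:
            break
        gained = coverage(clique + [best], tables, entities, attributes)
        if gained <= current:
            break
        clique.append(best)
        current = gained
    return clique, current

# Toy data (made up): t3 contradicts t2 on Paris, so it cannot join t2's clique.
tables = {
    "t1": {"Paris": {"country": "France"}, "Berlin": {"country": "Germany"}},
    "t2": {"Paris": {"country": "France", "population": "2.1M"},
           "Rome": {"population": "2.8M"}},
    "t3": {"Paris": {"country": "Italy"}, "Berlin": {"population": "3.6M"}},
}
entities = ["Paris", "Berlin", "Rome"]
attributes = ["country", "population"]
clique, cov = grow_clique(tables, entities, attributes, gamma=0.6)
print(clique, round(cov, 3))
```

In this toy run the seed is `t2`, `t1` is added because it agrees with `t2` on every shared cell, and `t3` is excluded because it conflicts with `t2`. A greedy expansion like this returns a single clique; the top-k variants mentioned in the abstract would need to enumerate or bound several candidate table sets, which this sketch does not attempt.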