Semantic Recovery Of Web Tables Based On Crowdsourcing

Posted on:2017-03-25

Degree:Master

Type:Thesis

Country:China

Candidate:H X Liu

Full Text:PDF

GTID:2308330482487120

Subject:Computer Science and Technology

Abstract/Summary:

The Web contains a large number of structured tables, most of which lack header rows, primary keys, and foreign keys. This structural information is the basis for searching and integrating web table data. Algorithmic approaches have been proposed to recover it, but state-of-the-art techniques do not yet achieve satisfactory accuracy and recall. In recent years, crowdsourcing has been applied to natural language processing, image recognition, and other fields. We propose to improve web table annotation through crowdsourcing, which leverages human intelligence to complete annotation tasks.

For header and entity column recovery, we propose an improved K-means algorithm, based on a novel integrative distance, that performs task reduction to minimize the number of tuples posed to the crowd. To recommend the most relevant tasks to human workers and to decide final answers more accurately, we also implement an evaluation mechanism based on Answer Credibility, which measures the probability that a worker's intuitive answer becomes the final answer for a task. Extensive experiments on real-world datasets show that our framework clearly improves annotation accuracy and time efficiency for web tables, and that our task reduction and answer evaluation mechanisms are effective and efficient at improving answer quality.

For foreign key recovery, we introduce the concept of a similar foreign key, together with a corresponding scoring mechanism, to obtain candidate answers for crowdsourcing according to the characteristics of web tables. We also apply a mixed model that combines task reduction based on attribute dependency with dynamic question scheduling based on collision detection to reduce the number of tasks. Repeated experiments demonstrate that our hybrid framework performs well in precision and recall of foreign key annotation and clearly reduces the number of crowd tasks.
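To make the Answer Credibility idea from the header and entity column part of the abstract more concrete, here is a minimal sketch. The abstract gives no formula, so everything in it is an assumption made for illustration: the function names `worker_credibility` and `decide_answer`, the smoothed agreement rate used as the credibility estimate, and the credibility-weighted vote used to pick the final answer. It is not the thesis's actual mechanism.

```python
from collections import defaultdict


def worker_credibility(history):
    """Estimate, per worker, the probability that their answer becomes final.

    `history` is a list of (worker, answer, final_answer) tuples from
    already-resolved tasks. Hypothetical helper: the estimate is the
    worker's agreement rate with Laplace smoothing, an assumed choice.
    """
    agreed = defaultdict(int)
    total = defaultdict(int)
    for worker, answer, final_answer in history:
        total[worker] += 1
        if answer == final_answer:
            agreed[worker] += 1
    # Smoothing keeps workers with few resolved tasks away from 0 or 1.
    return {w: (agreed[w] + 1) / (total[w] + 2) for w in total}


def decide_answer(votes, credibility, default=0.5):
    """Pick the final answer for one task by credibility-weighted voting.

    `votes` is a list of (worker, answer) pairs. Workers without history
    receive the assumed neutral weight `default`.
    """
    scores = defaultdict(float)
    for worker, answer in votes:
        scores[answer] += credibility.get(worker, default)
    return max(scores, key=scores.get)


# Worker "a" matched the final answer on all three past tasks;
# "b" and "c" never did.
history = [
    ("a", "x", "x"), ("a", "y", "y"), ("a", "z", "z"),
    ("b", "x", "y"), ("b", "y", "x"),
    ("c", "x", "y"),
]
credibility = worker_credibility(history)

# Two low-credibility workers disagree with one high-credibility worker
# about the header of a column.
votes = [("a", "City"), ("b", "Name"), ("c", "Name")]
final = decide_answer(votes, credibility)
```

In this toy run the single high-credibility worker outweighs the two low-credibility ones, which an unweighted majority vote would not allow. The same credibility scores could in principle also be used to route tasks to the workers most likely to answer them correctly, but the abstract does not say how the thesis does this.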

Keywords/Search Tags:

Crowdsourcing, Web tables, Semantic recovery, Data integration

Related items

1	Research On Detecting Entity Columns Of WEB Tables
2	Crowdsourcing For Synonyms Proofreading And Acquisition In Chinese Large-scale Semantic Knowledge Base
3	Research On Foreign Key Detection Algorithm For Web Tables
4	Research Of Integration Based On Semantic And The Development Of Support Tools
5	The Evaluation For Priority Of Developers And Bug Reports In Crowdsourcing Test Platform
6	Research On Semantic Integration And Application Of Distributed Data Based On Linked Data
7	A Study On Effective Spatio-textual Data Integration And Delivery
8	Research On Multi-platform WeChat Data Recovery And Analysis Technology
9	Research On Semantic Compression Algorithms Of Massive Data Tables
10	Research On Critical Techniques In Unified Translation For Multi-source Binary Code