Discovering Relations Between Web Tables

Posted on: 2016-12-11

Degree: Master

Type: Thesis

Country: China

Candidate: H W Ren

GTID: 2298330467472501

Subject: Computer Science and Technology

Abstract/Summary:

In recent years, large amounts of structured tabular data have been appearing on the Internet. However, the value of web tables depends not only on the data themselves but also on the relatedness among them: these structured data can be fully exploited only once the latent relationships between tables have been detected. Discovering related tables is challenging because web tables are heterogeneous and uncertain. We propose two new types of relatedness between web tables, called the snapshot relationship and the reference relationship. Both benefit query optimization, help return partial results quickly when querying big data, and are useful for answering open-world queries in data fusion systems.

We propose an algorithm for discovering snapshot relationships. The relatedness between an original web table and its snapshot is computed from entity consistency and schema consistency. To assign higher weights to tables that provide fresher entities, the concept of entity freshness is introduced into our scoring method. In addition, the content consistency of web tables is enhanced by applying Bayesian analysis within our relatedness-capturing framework, which improves the accuracy of finding snapshots. Repeated experiments show that the algorithms capture high-quality snapshots, which perform well in terms of query precision and recall.

We also propose a probabilistic model for capturing the reference relationship between tables. To give more attention to entities that appear repeatedly in the reference column, the weight of an entity for the table is introduced into our scoring method. Web tables also contain a large amount of noisy data. To reduce the influence of such noisy entities, our algorithm introduces a novel way of identifying noise with a probability, so the weight of an entity for the concept is considered as well.
Extensive experiments on real datasets demonstrate that the algorithm for detecting the reference relationship finds referenced tables with high quality and also performs well in query precision and recall on open-world queries.
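The abstract does not give the scoring formulas, so the following is only a minimal Python sketch of how the two scores it describes might be computed. Everything in it is an assumption for illustration: the hypothetical `WebTable` record, the per-entity `freshness` values, the mixing weight `lam`, Jaccard similarity as a stand-in for schema consistency, Beta-Binomial smoothing as a stand-in for the thesis's Bayesian analysis, and the hypothetical `concept_support` map playing the role of the weight of an entity for the concept (low values flag likely noise). In `reference_score`, an entity's frequency in the reference column plays the role of its weight for the table.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field


@dataclass
class WebTable:
    """Hypothetical table model: a key (subject) column plus column headers."""
    name: str
    columns: set[str]
    entities: list[str]  # values of the key column, duplicates allowed
    # Hypothetical per-entity freshness in [0, 1]; missing entities count as 0.
    freshness: dict[str, float] = field(default_factory=dict)


def jaccard(a: set[str], b: set[str]) -> float:
    """Schema consistency stand-in: Jaccard similarity of column-name sets."""
    return len(a & b) / len(a | b) if a | b else 0.0


def beta_smoothed(hits: float, total: float,
                  alpha: float = 1.0, beta: float = 1.0) -> float:
    """Posterior mean under a Beta(alpha, beta) prior with a binomial likelihood.

    Shrinks estimates based on few entities toward the prior mean.
    """
    return (hits + alpha) / (total + alpha + beta)


def snapshot_score(original: WebTable, candidate: WebTable,
                   lam: float = 0.5) -> float:
    """Sketch: how plausibly `candidate` is a snapshot of `original`."""
    cand = set(candidate.entities)
    orig = set(original.entities)
    # Freshness-weighted entity consistency: each candidate entity weighs
    # 1 + freshness, so fresh shared entities count more.
    weights = {e: 1.0 + candidate.freshness.get(e, 0.0) for e in cand}
    hits = sum(w for e, w in weights.items() if e in orig)
    entity_consistency = beta_smoothed(hits, sum(weights.values()))
    schema_consistency = jaccard(original.columns, candidate.columns)
    # Linear mix of the two consistency signals (hypothetical weighting).
    return lam * entity_consistency + (1 - lam) * schema_consistency


def reference_score(referencing: list[str], referenced: WebTable,
                    concept_support: dict[str, float]) -> float:
    """Sketch: probability-style score that a reference column points into
    the key column of `referenced`.

    referencing: values of the candidate reference column (duplicates allowed).
    concept_support: hypothetical P(entity belongs to the concept); low values
    mark entities that are likely noise.
    """
    counts = Counter(referencing)
    total = sum(counts.values())
    keys = set(referenced.entities)
    num = den = 0.0
    for entity, count in counts.items():
        w_table = count / total                         # repeated entities weigh more
        w_concept = concept_support.get(entity, 0.0)    # likely noise weighs less
        w = w_table * w_concept
        den += w
        if entity in keys:
            num += w
    return num / den if den else 0.0
```

For example, a candidate that shares all three of its entities (one of them fully fresh) and both of its columns with a four-entity, three-column original scores about 0.75 under the defaults, while a table with disjoint entities and columns scores 0.125. The Beta prior is what keeps a candidate with only one or two shared entities from reaching a perfect consistency score.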

Related items

1	Research On Open Shape-Based Directional Relationship Query Technology
2	Research On Location Privacy Preserving Query Techniques In Road Network
3	Research On Database System Performance Optimization Methods
4	Research On Statistical Word-level Semantic Relatedness Computation
5	Research On Key Technologies Of Distributed Rank-aware Query Processing
6	Research On Semantic Processing Technology Based Information Retrieval Model
7	Application Research Of LVM Snapshot Under The Environment Of Linux
8	Pre-query Processing For Semantic-oriented Web Search
9	The Extended Belief Network Retrieval Model Based On Reference Relationship Of Scientific Literatures
10	Research And Implementation Of Chinese Explanatory Opinion Extraction Method Based On Relationship Recognition