Research On Semantically Consistent Entity Augmentation

Posted on:2021-11-10

Degree:Master

Type:Thesis

Country:China

Candidate:W C He

GTID:2518306560986329

Subject:Computer technology

Abstract/Summary:

With the development of the Internet, the volume of data is growing explosively. Data on the Internet may be structured, semi-structured, or unstructured, and structured data usually takes the form of web tables. Compared with other types of data, web tables let people find the information they are interested in intuitively. Entity augmentation takes a query table composed of entity columns and attribute names and fills in its attribute values, using a large number of structured tables as data sources. The technique is widely used in data integration, information retrieval, and other fields.

Existing entity augmentation methods compute the similarity between tables mainly by pattern matching and therefore neglect the semantic information of the tables. As a result, tables with a low matching degree that express the same semantics are overlooked, while some tables with a high matching degree do not express the same semantics, which lowers the coverage and accuracy of the query results. Furthermore, because entity augmentation requires a large amount of computation, existing methods are difficult to put into practice. To solve these problems, we propose semantically consistent entity augmentation based on distributed computing. The main research work is as follows:

(1) A method is proposed for extending the concepts of table entities based on a knowledge base. By extending the entities in a table, we obtain a set of concepts, from which we derive the topic and semantic information of the table.

(2) An entity augmentation method based on semantic similarity is proposed. First, we represent each concept of a table entity as a word vector. Second, we propose an improved TF-IDF algorithm to obtain weighted word vectors, from which the semantic vector of the table is computed. Finally, according to the semantic similarity between tables, we obtain the candidate tables matching the query table, and the attribute values in the query table are then filled in by the pattern matching algorithm. Experimental results show that entity augmentation based on semantic similarity achieves higher coverage and accuracy than the original methods.

(3) A parallel entity augmentation algorithm based on Spark is implemented, in which all phases of entity augmentation based on semantic similarity are parallelized with the Spark computing framework. Experimental results show that the efficiency of this method is greatly improved compared with the original single-machine method.
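The candidate-selection idea in item (2) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not specify the improved TF-IDF algorithm, so standard smoothed TF-IDF is swapped in here, and the embedding table `EMBEDDINGS`, the function names, and the toy tables are all hypothetical. Each table is represented by the TF-IDF-weighted sum of its concept vectors, and candidate tables are ranked by cosine similarity to the query table.

```python
import math
from collections import Counter

# Hypothetical toy concept embeddings; a real system would load trained word vectors.
EMBEDDINGS = {
    "country": [1.0, 0.0, 0.0],
    "city": [0.8, 0.2, 0.0],
    "company": [0.0, 1.0, 0.0],
    "film": [0.0, 0.0, 1.0],
}


def tfidf_weights(concepts, corpus):
    """Standard smoothed TF-IDF per concept (an assumption, not the thesis's improved variant)."""
    counts = Counter(concepts)
    total = sum(counts.values())
    n_tables = len(corpus)
    weights = {}
    for concept, count in counts.items():
        doc_freq = sum(1 for table in corpus if concept in table)
        idf = math.log((1 + n_tables) / (1 + doc_freq)) + 1.0
        weights[concept] = (count / total) * idf
    return weights


def table_vector(concepts, corpus):
    """Semantic vector of a table: TF-IDF-weighted sum of its concept embeddings."""
    dim = len(next(iter(EMBEDDINGS.values())))
    vec = [0.0] * dim
    for concept, weight in tfidf_weights(concepts, corpus).items():
        emb = EMBEDDINGS.get(concept)
        if emb is None:  # skip concepts without an embedding
            continue
        for i, x in enumerate(emb):
            vec[i] += weight * x
    return vec


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_candidates(query_concepts, tables):
    """Rank candidate tables by semantic similarity to the query table, best first."""
    corpus = list(tables.values()) + [query_concepts]
    query_vec = table_vector(query_concepts, corpus)
    scored = [
        (name, cosine(query_vec, table_vector(concepts, corpus)))
        for name, concepts in tables.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    tables = {
        "geo": ["country", "city", "country"],
        "movies": ["film", "film"],
        "biz": ["company"],
    }
    for name, score in rank_candidates(["country", "city"], tables):
        print(name, round(score, 3))
```

In this toy setup the table whose concepts overlap with the query's concepts ranks first, while a table about an unrelated topic scores zero. Scoring each candidate table is independent of the others, which is the kind of step that lends itself to the Spark parallelization described in item (3).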

Keywords/Search Tags:

Web table, Entity augmentation, Data integration, Semantic similarity, Parallel computation, Spark

Related items

1	The Research On Consistent Entity Augmentation
2	Research On Lexical Semantic Similarity Measurement Based On Knowledge Integration
3	Research On Semantic Similarity Calculation Method And Data Augmentation In Chinese Short Text
4	Design And Implementation Of Parallel Data Mining System Based On Spark
5	Research And Implementation Of Semantic Similarity Parallel Calculation Based On Association Of Tourism Data
6	Research And Application Of Multisource Data Integration Based On Ontology
7	Research On Entity Resolution Framework And Key Techniques For Big Data Integration
8	SimRank Computation On Large Graphs Based On Spark
9	The Research And Implementation Of Spark And NoSQL Databases Integration
10	Research On Semantic Similarity Computation Method Of Linked Data Based On Multi-granularity