Research On Semantically Consistent Entity Augmentation

Posted on: 2021-11-10
Degree: Master
Type: Thesis
Country: China
Candidate: W C He
Full Text: PDF
GTID: 2518306560986329
Subject: Computer technology
Abstract/Summary:
With the development of the Internet, the volume of data is growing explosively. Data on the Internet may be structured, semi-structured, or unstructured, and structured data usually takes the form of web tables. Compared with other types of data, web tables let people find the information they are interested in intuitively. Entity augmentation is the task of filling in the attribute values of a query table, which consists of entity columns and attribute names, using a large number of structured tables as data sources. The technique is widely used in data integration, information retrieval, and other fields. Existing entity augmentation methods compute the similarity between tables mainly by pattern matching and therefore neglect the semantic information of the tables. As a result, tables with a low matching degree that express the same semantics are overlooked, while some tables with a high matching degree do not express the same semantics, which leads to low coverage and accuracy of the query results. Furthermore, because entity augmentation requires a large amount of computation, existing methods are difficult to put into practical use. To solve these problems, we propose semantically consistent entity augmentation based on distributed computing. The main research work is as follows:

(1) A knowledge-base-driven method is proposed for extending the entities in a table to concepts. Extending the entities in a table yields a set of concepts, from which the topic and semantic information of the table are obtained.

(2) An entity augmentation method based on semantic similarity is proposed. First, each concept of a table entity is represented as a word vector. Second, an improved TF-IDF algorithm is used to weight the word vectors, and the weighted vectors are combined into the semantic vector of the table. Finally, candidate tables matching the query table are selected according to the semantic similarity between tables, and the attribute values in the query table are then filled in by the pattern matching algorithm. Experimental results show that entity augmentation based on semantic similarity achieves higher coverage and accuracy than the original methods.

(3) A parallel entity augmentation algorithm is implemented on Spark, and all phases of entity augmentation based on semantic similarity are parallelized with the Spark computing framework. Experimental results show that the efficiency of this method is greatly improved compared with the original single-machine method.
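The candidate-selection idea in step (2) can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not specify the improved TF-IDF algorithm, so plain smoothed TF-IDF is used in its place, and the concept vectors, table names, and function names below are all hypothetical values chosen for illustration.

```python
import math
from collections import Counter

# Toy 3-dimensional concept embeddings (hypothetical values, illustration only).
CONCEPT_VECTORS = {
    "country": [1.0, 0.0, 0.0],
    "city": [0.8, 0.2, 0.0],
    "company": [0.0, 1.0, 0.0],
    "film": [0.0, 0.0, 1.0],
}


def tf_idf_weights(table_concepts, all_tables):
    """Plain smoothed TF-IDF per concept (stand-in for the improved variant)."""
    counts = Counter(table_concepts)
    total = len(table_concepts)
    n_tables = len(all_tables)
    weights = {}
    for concept, count in counts.items():
        # Document frequency: number of tables that mention this concept.
        df = sum(1 for table in all_tables if concept in table)
        idf = math.log((1 + n_tables) / (1 + df)) + 1.0
        weights[concept] = (count / total) * idf
    return weights


def table_vector(table_concepts, all_tables):
    """Semantic vector of a table: TF-IDF-weighted sum of its concept vectors."""
    weights = tf_idf_weights(table_concepts, all_tables)
    dim = len(next(iter(CONCEPT_VECTORS.values())))
    vec = [0.0] * dim
    for concept, weight in weights.items():
        for i, x in enumerate(CONCEPT_VECTORS[concept]):
            vec[i] += weight * x
    return vec


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_candidates(query_concepts, source_tables):
    """Rank source tables by semantic similarity to the query table."""
    corpus = [query_concepts] + list(source_tables.values())
    query_vec = table_vector(query_concepts, corpus)
    scored = [
        (name, cosine(query_vec, table_vector(concepts, corpus)))
        for name, concepts in source_tables.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Each table is reduced to the concepts of its entities (hypothetical data).
source_tables = {
    "geo": ["country", "city", "city"],
    "business": ["company", "company", "city"],
    "movies": ["film", "film"],
}
ranking = rank_candidates(["country", "city"], source_tables)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In this toy setup the table sharing the query's concepts ranks first, a table with only partial concept overlap ranks lower, and an unrelated table scores zero. In the pipeline the abstract describes, the top-ranked tables would then be passed to the pattern matching step that fills in attribute values.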
Keywords/Search Tags: Web table, Entity augmentation, Data integration, Semantic similarity, Parallel computation, Spark