Research on Knowledge Graph Construction and Representation for Unstructured Data

Posted on: 2019-05-17 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: Z Tan
GTID: 1368330611493117 | Subject: Management Science and Engineering
Abstract/Summary:
With the development of the World Wide Web and Semantic Web technologies, more and more people acquire information and knowledge from the Internet. To meet this demand, a large number of search engines, intelligence analysis systems, and automated question answering systems have been developed. However, as data volumes have grown rapidly in recent years, traditional search engines, which return lists of URLs, find it increasingly difficult to return accurate answers together with the related knowledge behind them. In response, the knowledge graph, popularized by Google, has emerged. Its main purpose is to provide users with structured knowledge rather than isolated pieces of information. A knowledge graph can greatly improve the accuracy of knowledge queries and intelligence analysis and extend the boundary and scope of knowledge acquisition, and it has therefore attracted wide attention from both industry and academia.

At present, however, knowledge graphs have two clear deficiencies: (1) their coverage of knowledge is broad but shallow, data sparsity is severe, and a large amount of knowledge remains hidden in unstructured data; (2) their knowledge representation models are relatively simple, and traditional symbolic representations have difficulty describing the semantics of the entities and relations in a knowledge graph. To overcome these two problems and improve both the robustness of knowledge graphs and their capacity for knowledge representation, this paper focuses on four aspects: web page information extraction, joint entity and relation extraction, entity linking, and knowledge representation.

To extract knowledge from unstructured data, we first need a source of such data, and the most important source is the vast amount of data on the Internet. The key problem is therefore how to parse web pages and obtain their plain text. Traditional information extraction analyzes text by means of configuration templates and similar approaches, which suffer from low efficiency and poor scalability. We therefore propose TWCEM, a title-based web page information extraction model that extracts and collates the content of each web page using title features. The model effectively filters noise and locates the main content more accurately, thereby improving extraction performance and reducing time costs.

Once the text has been obtained, the entities and relations it contains must be extracted. Traditional pipeline methods, which first recognize entities and then classify the relations between them, suffer from information fragmentation and error propagation. To solve these problems, we propose TME, a joint entity and relation extraction model that takes the correlation between entities and relations into account and finds the multiple triples contained in each sentence, improving the feasibility and effectiveness of knowledge extraction from unstructured data. Experiments show that TME performs significantly better on joint extraction than the other extraction models compared.

After joint entity and relation extraction, each extracted entity mention must be linked to the corresponding entity in the existing knowledge graph, a task known as entity linking. Traditional entity linking methods rely on hand-crafted features and classification to link each mention, which leads to low linking accuracy. We propose Elesa, an entity linking method based on structured features, which jointly represents each entity as a feature vector combining the entity's context features, structure features, and entity ID features. In addition, it applies a Bi-LSTM with an attention mechanism to extract the contextual features of both entities and mentions. The advantage of this approach is that it captures both semantic and positional features. Experiments on a large number of datasets verify the accuracy and precision of Elesa on practical linking problems, and Elesa achieves state-of-the-art performance among comparable entity linking algorithms.

After the knowledge graph has been preliminarily constructed, reasoning and completion over it are needed to expand its coverage and improve its precision. Traditional representation methods have high complexity and low precision. This paper therefore proposes CombinE, a large-scale knowledge graph representation model based on entity feature combination, which explores entity features from two complementary perspectives: the plus combination and the minus combination. For the plus combination, we consider that a relation tends to denote the features shared by a set of entity pairs, each entity pair being a projection of the abstract relation. The basic idea of the plus combination is hence that the features of each relation are a concentrated expression of the set of entity pairs it contains. Relying solely on the plus combination, however, may wrongly accept negative triplets, that is, corrupted facts. For the minus combination, we therefore incorporate entity-specific features to complement the representation model; these characterize the salient features of the head and tail entities and offset the divergence between them. Extensive experiments on real-world datasets of different scales demonstrate that CombinE outperforms state-of-the-art models on entity and relation prediction tasks, especially on large-scale knowledge graphs. Besides this strong performance, CombinE has clear advantages in time and memory complexity.

Through the overall framework design and the knowledge graph construction and representation techniques above, this paper provides a feasible method for constructing and representing knowledge graphs from unstructured data, as well as a practical technical solution for real-world knowledge graph construction.
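The abstract describes TWCEM only at a high level: it uses title features to locate and collate a page's main content while filtering noise. The Python sketch below illustrates that general idea with the standard library alone; it is not TWCEM itself. It assumes, hypothetically, that the heading most similar to the page's `<title>` marks the start of the main content, that the content runs until the next heading, and that blocks shorter than `min_len` characters are noise such as navigation links or share buttons. The names `BlockCollector`, `extract_by_title`, `BLOCK_TAGS`, `HEADINGS`, and `min_len` are illustrative.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser

# Hypothetical choices for this sketch, not TWCEM's actual configuration.
BLOCK_TAGS = {"title", "h1", "h2", "h3", "p", "div", "li"}
HEADINGS = {"h1", "h2", "h3"}


class BlockCollector(HTMLParser):
    """Flattens a page into an ordered list of (tag, text) blocks."""

    def __init__(self):
        super().__init__()
        self.blocks = []  # ordered (tag, text) pairs
        self._open = []   # stack of currently open block tags
        self._buf = []    # text pieces of the current block
        self._skip = 0    # > 0 while inside <script> or <style>

    def _flush(self):
        text = " ".join("".join(self._buf).split())  # collapse whitespace
        if text and self._open:
            self.blocks.append((self._open[-1], text))
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag in BLOCK_TAGS:
            self._flush()
            self._open.append(tag)

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip = max(0, self._skip - 1)
        elif tag in BLOCK_TAGS and tag in self._open:
            self._flush()
            while self._open.pop() != tag:  # also closes unclosed inner blocks
                pass

    def handle_data(self, data):
        if not self._skip:
            self._buf.append(data)


def extract_by_title(html, min_len=20):
    """Return (anchor heading, main-content paragraphs) located via the <title>."""
    parser = BlockCollector()
    parser.feed(html)
    parser.close()
    blocks = parser.blocks
    title = next((text for tag, text in blocks if tag == "title"), "")
    heads = [i for i, (tag, _) in enumerate(blocks) if tag in HEADINGS]
    if not title or not heads:
        return "", []
    # The heading most similar to the <title> text anchors the main content.
    anchor = max(heads, key=lambda i: SequenceMatcher(None, title, blocks[i][1]).ratio())
    body = []
    for tag, text in blocks[anchor + 1:]:
        if tag in HEADINGS:
            break  # the next section begins
        if len(text) >= min_len:  # short blocks are treated as noise
            body.append(text)
    return blocks[anchor][1], body


page = """
<html><head><title>Knowledge Graphs Explained - Example News</title>
<style>p {color: red}</style></head>
<body>
<div><li>Home</li><li>About</li></div>
<h2>Latest stories</h2>
<h1>Knowledge Graphs Explained</h1>
<p>A knowledge graph stores facts as entity-relation-entity triples.</p>
<p>Share</p>
<p>Search engines use them to answer questions directly.</p>
<h2>Related articles</h2>
<p>Some unrelated teaser text that is long enough.</p>
</body></html>
"""
heading, paragraphs = extract_by_title(page)
print(heading)
print(paragraphs)
```

Stopping at the next heading is a deliberately crude boundary rule; an extractor for real pages would also need to handle subheadings inside an article.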
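The abstract does not describe TME's internals. As a hedged illustration of how a joint extractor can recover several triples from one sentence, the sketch below decodes entity spans from BIO tags and then scores every ordered entity pair against every relation with a translation-style constraint h + r ≈ t, keeping the pairs whose best distance falls under a threshold. The translation score, the toy 2-dimensional embeddings, the `threshold` value, and all function names are assumptions; in a genuinely joint model the tagger and the embeddings would be trained together under a shared objective, which this decoding-only sketch omits.

```python
import math
from itertools import permutations


def decode_bio(tokens, tags):
    """Collect entity strings from BIO tags such as B-PER, I-PER, O."""
    spans, current = [], []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                spans.append(" ".join(current))
            current = [token]
        elif tag.startswith("I-") and current:
            current.append(token)
        else:
            if current:
                spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans


def translation_distance(h, r, t):
    """L2 distance of h + r from t; small values mean (h, r, t) is plausible."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def extract_triples(tokens, tags, ent_vec, rel_vec, threshold=0.5):
    """Score every ordered entity pair against every relation; keep close matches."""
    triples = []
    for head, tail in permutations(decode_bio(tokens, tags), 2):
        rel, dist = min(
            ((name, translation_distance(ent_vec[head], vec, ent_vec[tail]))
             for name, vec in rel_vec.items()),
            key=lambda pair: pair[1],
        )
        if dist <= threshold:
            triples.append((head, rel, tail))
    return triples


# Toy sentence, tags, and 2-d embeddings (all hypothetical).
tokens = ["Alice", "works", "for", "Acme", "in", "Paris"]
tags = ["B-PER", "O", "O", "B-ORG", "O", "B-LOC"]
ent_vec = {"Alice": (0.0, 0.0), "Acme": (1.0, 0.0), "Paris": (1.0, 1.0)}
rel_vec = {"works_for": (1.0, 0.0), "located_in": (0.0, 1.0)}
triples = extract_triples(tokens, tags, ent_vec, rel_vec)
print(triples)  # [('Alice', 'works_for', 'Acme'), ('Acme', 'located_in', 'Paris')]
```

The single sentence yields two triples, one per matching entity pair, because every ordered pair is scored rather than assigning one relation per sentence.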
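Elesa, as summarized in the abstract, attends over Bi-LSTM outputs to obtain context features and combines them with structure features and entity ID features into one entity vector. The sketch below shows only those two mechanical steps in plain Python: dot-product attention pooling over hidden states (assumed to be precomputed by a Bi-LSTM, each state being its forward and backward halves concatenated) and concatenation of the three feature groups. The attention form, the toy vectors, and the names `attention_pool` and `joint_entity_vector` are assumptions; how Elesa scores the joint vector against a mention is not specified in the abstract and is not shown.

```python
import math


def softmax(scores):
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(states, query):
    """Dot-product attention: weight each Bi-LSTM state by softmax(state . query)."""
    weights = softmax([sum(s * q for s, q in zip(state, query)) for state in states])
    dim = len(states[0])
    pooled = [sum(w * state[k] for w, state in zip(weights, states)) for k in range(dim)]
    return pooled, weights


def joint_entity_vector(context_states, query, structure_feats, id_embedding):
    """Concatenate [attended context ; structure features ; entity-ID embedding]."""
    context_vec, _ = attention_pool(context_states, query)
    return context_vec + list(structure_feats) + list(id_embedding)


# Toy Bi-LSTM outputs for a 3-token context, dimension 4 (forward half ; backward half).
states = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
query = [0.0, 0.0, 5.0, 0.0]  # hypothetical learned attention query
pooled, weights = attention_pool(states, query)
vec = joint_entity_vector(states, query, structure_feats=[0.3, 0.7], id_embedding=[0.1, -0.2, 0.05])
print(weights)
print(vec)
```

Because the query aligns with the third state, almost all attention weight falls on it, so the pooled context vector is close to that state; the structure and ID features are appended unchanged.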
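The plus and minus combinations in CombinE can be read as two translation-style constraints. This is a minimal sketch, assuming that each entity and relation has a plus part and a minus part and that the score is the sum of two squared L2 distances, ||h_p + t_p - r_p||^2 + ||h_m - t_m - r_m||^2 (lower is better). The abstract does not give the exact formula, norm, or training details, so treat this as an illustration rather than the published model; the toy vectors and function names are hypothetical, and the margin loss is a common training choice assumed here.

```python
def squared_norm(vec):
    return sum(x * x for x in vec)


def combine_score(h, r, t):
    """Lower is better. Each embedding is a (plus_part, minus_part) pair of lists."""
    (h_p, h_m), (r_p, r_m), (t_p, t_m) = h, r, t
    plus = squared_norm([a + b - c for a, b, c in zip(h_p, t_p, r_p)])   # h_p + t_p ≈ r_p
    minus = squared_norm([a - b - c for a, b, c in zip(h_m, t_m, r_m)])  # h_m - t_m ≈ r_m
    return plus + minus


def margin_loss(pos, neg, margin=1.0):
    """Hinge loss pushing a true triple's score below a corrupted one's by `margin`."""
    return max(0.0, margin + combine_score(*pos) - combine_score(*neg))


# Toy embeddings: (plus part, minus part) for head, relation, tail.
h = ([1.0, 0.0], [1.0, 1.0])
r = ([1.0, 1.0], [1.0, 0.0])
t = ([0.0, 1.0], [0.0, 1.0])
true_score = combine_score(h, r, t)       # 0.0
swapped_score = combine_score(t, r, h)    # 4.0, contributed entirely by the minus part
loss = margin_loss((h, r, t), (t, r, h))  # 0.0
print(true_score, swapped_score, loss)
```

Because addition is commutative, the plus part alone gives the true triple and its head/tail-swapped corruption the same score (0 here); only the minus part separates them, which is one concrete way that relying solely on the plus combination admits corrupted triplets.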
Keywords/Search Tags: unstructured data, knowledge graph, information extraction, entity and relation extraction, entity linking, knowledge representation, deep learning