Research On The Construction Method Of Chinese Tourism Knowledge Graph Based On Multi-source Heterogeneous Data

Posted on: 2020-08-21
Degree: Master
Type: Thesis
Country: China
Candidate: Y F Li
Full Text: PDF
GTID: 2438330602952739
Subject: Computer software and theory
Abstract/Summary:
In the era of big data, the amount of information on the Internet is growing explosively. How to mine useful information from this massive data efficiently and accurately has become a research hotspot in the field of intelligent information. Google first proposed the concept of the knowledge graph in 2012; it describes real-world entities and the relationships between them as a graph structure. A knowledge graph is a knowledge base extracted from big data that supports intelligent governance and integration of knowledge, and it is the technical basis that lets search engines provide accurate answers. Research on knowledge graph construction has shown important application value for knowledge extraction and for organizing and managing the mass of knowledge on the Internet. Most existing knowledge graphs are general-purpose graphs covering all domains. They emphasize breadth of knowledge but lack deep mining of knowledge about individual entities. By contrast, research on targeted domain knowledge graphs, especially in the tourism domain, is scarce and started late, and there is currently no well-established way to construct and represent them. Existing tourism knowledge graphs are mostly built from encyclopedia knowledge alone, so they are relatively simple and of limited applicability. To address these problems, this thesis studies the construction and application of a Chinese tourism knowledge graph from multi-source heterogeneous data. The work comprises analyzing the different data sources for a tourism knowledge graph, extracting knowledge from those sources, aligning entities across heterogeneous data, and studying applications of the tourism knowledge graph. The details are as follows:

1. To build a high-quality tourism knowledge graph, this thesis first analyzes the various kinds of multi-source heterogeneous data on the Internet and adopts a knowledge extraction method based on encyclopedia sites and tourism vertical sites. We design a crawler framework for the semi-structured and unstructured knowledge of Hudong Encyclopedia and Baidu Encyclopedia, extract the structured data, organize it into knowledge triples, and store them in the tourism knowledge graph. In addition, to handle incomplete or missing attributes and attribute values in knowledge extraction, we propose an attribute value expansion method based on CRF and candidate sentences. First, DAAttribute is constructed from infoboxes, and candidate sentences are extracted from the encyclopedia text corpus based on DAAttribute. Then, CRF is used to extract entity attributes and attribute values from the candidate sentences to expand the attributes. The results were evaluated by accuracy, recall, and F1 value, and the method performed well on all of these metrics. This method is of great help in revising the knowledge graph.

2. To resolve homonyms and synonyms in the knowledge base, we propose an entity alignment method based on the BERT neural network model. First, word segmentation is applied to the corpus, and the segmented text is fed into the BERT model to train word vectors. Entities are then aligned by computing the cosine similarity between word vectors and comparing it with a set threshold. The BERT, skip-gram, CBOW, and DSG models were analyzed and compared experimentally. In the evaluation, the BERT model performed best, with an average accuracy of more than 95%. This work offers a new reference method for entity alignment.

3. In the application study, we design a knowledge graph visualization system that displays the information in the KB. The system also allows users to create and correct knowledge triples in the KB, enabling crowdsourced manual correction of the knowledge graph, and it provides users with multi-source tourism knowledge services. Finally, a Q&A model based on the tourism KB and Rul is designed, laying a foundation for applying the tourism knowledge graph in Q&A systems.
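
The threshold-based alignment step described in point 2 can be illustrated with a minimal sketch. This is not the thesis's implementation: the vectors below are hand-written toy values standing in for BERT word vectors, and the entity names, the function names `cosine_similarity` and `align_entities`, and the threshold of 0.95 are all assumptions made for illustration.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_u * norm_v)


def align_entities(vectors, threshold):
    """Return (name_a, name_b, score) for every pair scoring >= threshold."""
    aligned = []
    for (name_a, vec_a), (name_b, vec_b) in combinations(sorted(vectors.items()), 2):
        score = cosine_similarity(vec_a, vec_b)
        if score >= threshold:
            aligned.append((name_a, name_b, score))
    return aligned


# Hypothetical toy vectors; real input would come from a trained embedding model.
toy_vectors = {
    "Forbidden City": [0.90, 0.10, 0.30],
    "Palace Museum": [0.88, 0.12, 0.31],
    "West Lake": [0.10, 0.95, 0.20],
}

# The threshold is an assumed value, not one reported in the thesis.
pairs = align_entities(toy_vectors, threshold=0.95)
for name_a, name_b, score in pairs:
    print(f"{name_a} <-> {name_b}: {score:.4f}")
```

With these toy values only the first two names fall above the threshold, so they would be treated as one entity. In practice the threshold would need tuning on labeled pairs, since a value that is too low merges distinct entities and one that is too high misses synonyms.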
Keywords/Search Tags: knowledge graph, tourism, multi-source heterogeneous data, knowledge extraction, entity alignment