
Construction And Analytics Of A Chinese Enterprise Knowledge Graph

Posted on: 2017-05-08
Degree: Master
Type: Thesis
Country: China
Candidate: W L Cheng
GTID: 2308330485963438
Subject: Software engineering
Abstract/Summary:
The Web contains a huge amount of knowledge in the form of structured and unstructured data. Extracting, reorganizing, and integrating this fragmented knowledge into large-scale knowledge graphs is a development trend of the Internet. Based on their content, knowledge graphs can be further categorized into open knowledge graphs and domain knowledge graphs. In recent years the construction of open knowledge graphs has achieved great success; however, these open knowledge graphs are not fully used in many domain applications. In contrast, domain knowledge graphs are in great demand in many applications, so we present a domain knowledge graph: a Chinese enterprise knowledge graph. We extract the relations between enterprises from listed companies' announcements, and extract the related events and summaries of the enterprises from financial news. An empirical study of the statistical characteristics of knowledge graphs and social networks is also conducted.

The main research contributions of this paper are as follows:

· A framework for constructing the enterprise knowledge graph is presented. We treat the relation extraction task in building a domain knowledge graph as a classification problem. By training a maximum entropy model for every relation, relation extraction achieves 85% accuracy on average, and for some relations the accuracy is over 95%. Our method shows a 12.16% improvement in F1 measure over the open-domain relation extraction approach. Based on 1.09 million announcements from listed companies and 2.5 million news articles, we construct an enterprise knowledge graph containing more than 50 thousand entities and 140 thousand relation instances.

· We first employ a clustering algorithm to extract the related events and the events' developing processes from news articles. A word set coverage algorithm is then presented to produce the events' summaries. The experimental results and case studies show that our approach outperforms four baselines, indicating its efficacy on real-life news data. In total, we extract 8,205 events and their respective summaries, related to 3,073 listed companies, from news articles.

· We empirically compare the graph structures between different parts of a knowledge graph, between different knowledge graphs, and between knowledge graphs and social networks. A deep analysis of four knowledge graphs and two social networks is conducted based on thirteen statistical metrics and four distributions. It shows that the statistical characteristics of these graphs differ in many respects, such as the distribution of connected components, the number of open or closed triangles, and the clustering coefficient. The analysis can provide suggestions for the data management of knowledge graphs. We also conduct an association rule experiment on the semantic labels of knowledge graphs, which illustrates the topic relatedness of semantically related relations.

During the construction of the enterprise knowledge graph, we make extensive comparisons of the techniques for building open knowledge graphs and domain knowledge graphs, which is of practical significance for domain knowledge graph construction. We also conduct an in-depth analysis of the statistical characteristics of large-scale knowledge graphs and social networks, which can serve as a reference for knowledge graph management tasks such as data storage, indexing, and query optimization.
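
As an illustration of the kind of structural metrics named in the third contribution (connected components, open and closed triangles, clustering coefficient), the following is a minimal sketch in plain Python. It is not the thesis's implementation: the function names, the treatment of the graph as undirected and unlabeled, and the use of the global clustering coefficient (three times the closed triangles divided by the connected triples) are all assumptions made for this example.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map from (u, v) pairs, ignoring self-loops."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return dict(adj)


def component_sizes(adj):
    """Return connected-component sizes, largest first (iterative DFS)."""
    seen = set()
    sizes = []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        size = 0
        while stack:
            node = stack.pop()
            size += 1
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        sizes.append(size)
    return sorted(sizes, reverse=True)


def triangle_stats(adj):
    """Return (closed triangles, open triangles, global clustering coefficient).

    A connected triple is a path of length two centred on a node. A triple is
    closed when its two end nodes are also linked, so every closed triangle
    accounts for three closed triples (assumed definition for this sketch).
    """
    triples = 0
    closed_triples = 0
    for node, nbs in adj.items():
        k = len(nbs)
        triples += k * (k - 1) // 2
        # Edges among the neighbours of `node`; each is seen from both ends.
        closed_triples += sum(1 for a in nbs for b in adj[a] if b in nbs) // 2
    closed = closed_triples // 3
    open_triangles = triples - closed_triples
    clustering = closed_triples / triples if triples else 0.0
    return closed, open_triangles, clustering
```

For a hypothetical edge list such as [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("E", "F")], build_adjacency followed by component_sizes and triangle_stats gives the component-size distribution and the triangle counts for that toy graph. At the scale of the graphs analysed in the thesis, a dedicated graph library or a distributed implementation would likely be needed.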
Keywords/Search Tags: Knowledge Graph, Information Extraction, Event Detection, Summary Extraction, Graph Pattern Analysis