
Entity Summarization Based On Web Text And Knowledge Graph

Posted on: 2017-01-13
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J H Yan
GTID: 1108330485969047
Subject: Computer application technology
Abstract/Summary:
With the integration of the Internet of Things, the Internet, and Cloud Computing, the amount of semi-structured and unstructured data is growing explosively. When users retrieve information, they tend to get lost in massive and fragmented data. Helping users locate information and knowledge about an entity on the Internet quickly and accurately has become an urgent issue. On one hand, traditional information retrieval systems aim to retrieve massive amounts of Web text related to a query, but they are incapable of summarizing the semantics of that text. On the other hand, knowledge graphs exploit text semantics by integrating billions of semi-structured entity attributes and relations. However, helping users navigate knowledge in massive and heterogeneous knowledge graphs remains a challenge.

This thesis aims to solve the problems of information overload and knowledge confusion, taking techniques for entity summarization on Web text and knowledge graphs as its research subjects. To handle the dynamic nature of massive Web text, the thesis proposes an algorithm to summarize entities from the text. To meet users' intelligent requirements, it designs an approach to context-aware entity summarization on a knowledge graph. To address the heterogeneity and incompleteness of fragmented information, it proposes an approach for entity summarization across knowledge graphs. The major contributions of this thesis include:

· To handle the dynamic nature of massive Web text, an algorithm is proposed to summarize entities from the text. In the era of Web 2.0, descriptions of an event can be found across different data sources, which also aggravates information fragmentation. Using topical clustering, the thesis models these events and treats event summarization as a set coverage optimization problem. Because this problem is NP-hard, the thesis designs a greedy algorithm to generate the event summaries.

· To meet users' intelligent requirements, an algorithm is designed for context-aware entity summarization on a knowledge graph. To respond to these requirements and to the problems of knowledge overload and confusion, the thesis uses a topic model to derive user preferences from the users' query logs, and designs an algorithm for context-aware entity summarization based on a Markov model.

· To address the heterogeneity and incompleteness of knowledge graphs, an algorithm is proposed for entity summarization across knowledge graphs. Different knowledge graphs can complement and confirm one another, helping users obtain more accurate search results. Using word embeddings, the thesis designs techniques for entity matching and fusion across knowledge graphs that also meet users' intelligent requirements. The algorithm not only integrates data from different knowledge graphs, but also improves the coverage and quality of entity summarization.

· A prototype is designed and implemented for entity summarization on fragmented data. Based on the algorithms proposed in this thesis and on existing text mining and NLP tools, the thesis designs and implements a distributed, four-layer entity summarization prototype system called EntitySummarizer. It can analyze a user query, recognize the entities the user is interested in, and generate the multiple types of entity summaries proposed in this thesis. It also supports textual analysis, including keyword extraction and the generation of event timelines.

The techniques proposed in this thesis not only address information overload and ease knowledge confusion, but also play an important role in data preparation, demonstration, and application for the study of diverse entity summarizations.
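As an illustration of the first contribution, the sketch below shows a generic greedy heuristic for a coverage problem of this kind. It is not the thesis's algorithm: the function name `greedy_summary`, the `budget` parameter, and the representation of each candidate sentence as a set of topic labels are all assumptions made for the example.

```python
from typing import Dict, List, Set


def greedy_summary(candidates: Dict[str, Set[str]], budget: int) -> List[str]:
    """Greedily pick up to `budget` candidates that cover the most topics.

    `candidates` maps a hypothetical sentence id to the set of topic labels
    it mentions (for example, labels produced by a topical clustering step).
    """
    covered: Set[str] = set()
    chosen: List[str] = []
    remaining = dict(candidates)

    while remaining and len(chosen) < budget:
        # Pick the candidate that adds the most not-yet-covered topics.
        # Iterating over sorted ids makes tie-breaking deterministic.
        best = max(sorted(remaining), key=lambda s: len(remaining[s] - covered))
        gain = remaining[best] - covered
        if not gain:
            break  # nothing left adds new coverage
        chosen.append(best)
        covered |= gain
        del remaining[best]

    return chosen


if __name__ == "__main__":
    sentences = {
        "s1": {"a", "b", "c"},
        "s2": {"c", "d"},
        "s3": {"a"},
        "s4": {"d", "e"},
    }
    print(greedy_summary(sentences, budget=2))
```

Each round selects the candidate with the largest marginal gain in covered topics, which is the standard greedy strategy for coverage problems where finding an exact optimum is NP-hard.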
Keywords: Entity Summarization, Word Embedding, Knowledge Graph, Entity Matching, Context-aware