Research On Entity Summarization Methods For The Semantic Web

Posted on: 2021-08-30; Degree: Doctor; Type: Dissertation
Country: China; Candidate: Q X Liu
GTID: 1488306500467374; Subject: Computer Science and Technology
Abstract/Summary:
The Semantic Web is an important trend in the next generation of the World Wide Web. With its development, large amounts of RDF data have been created and published. RDF data describe facts about entities as triples. Given an entity to be summarized, the entity summarization problem is to select, from the many triples that describe this entity, a subset of important triples that satisfies users' information needs. This subset is called an entity summary; it should contain the crucial information about the entity and satisfy a given capacity constraint. In this thesis, we focus on the design and evaluation of entity summarization methods.

The design of entity summarization methods faces three challenges. First, a method should be able to measure the quality of a triple reasonably. Existing works generally measure the structural features of a triple with statistical methods, while little attention has been paid to its textual features, and the influence of this textual content on users' reading experience has not been considered. Second, a method should be able to understand relationships between triples. Many existing methods consider redundancy between triples and typically avoid selecting redundant triples by adding constraints. These constraint-based methods are inflexible and cannot trade off the degree of redundancy against other features. Moreover, other complex relationships between triples may affect the quality of the summary and should also be identified and measured. Finally, a method should be able to fully comprehend users' preferences. Existing entity summarization methods are mainly unsupervised and rely on assumptions made by experts. However, experts' assumptions can hardly cover all the factors behind users' preferences. It is therefore necessary to explore other mechanisms so that a summarization method can learn users' preferences directly from data.

For the evaluation of entity summarization methods, benchmarks are crucial. The quality of a benchmark determines whether the performance of entity summarizers can be compared fairly and comprehensively. Existing entity summarization benchmarks have many limitations, including reliance on a single RDF dataset, domain specificity, small size, and incomplete triple coverage. Besides, most of the existing benchmarks are no longer publicly available. The lack of a publicly available, high-quality benchmark is therefore an urgent problem for entity summarization evaluation.

In this thesis, we address these challenges in four works:

1. We propose ESSTER, an entity summarization method based on content optimization. This method gives a flexible measure of the structural importance of a triple by combining its global and local features in the RDF graph. To improve the reading experience of users, we introduce the concept of readability and compute the readability of a triple based on a textual corpus. To reduce redundancy, ESSTER measures logical, numerical and textual redundancy, so that the preference for low redundancy becomes a term in the objective function and can be optimized jointly with other features. ESSTER jointly optimizes structural importance, textual readability and low semantic redundancy. It models and solves this entity summarization problem as a 0-1 quadratic knapsack problem. Experimental results show that ESSTER achieves the best performance among unsupervised entity summarization methods.

2. We propose DeepLENS, an entity summarization method based on content encoding. Existing entity summarization methods are mainly unsupervised. To learn user preferences from labeled data, DeepLENS explores the potential of supervised learning for solving the entity summarization problem. It uses a deep neural network model based on multilayer perceptrons and an attention mechanism. This model encodes the entity description according to its relation with the candidate triple, and this encoding is permutation invariant. The model then outputs a score for the candidate triple, using the encoding of the entity description as context, and this score is used to generate the summary. Experimental results show that DeepLENS significantly outperforms all the compared methods.

3. We study entity summarization with user feedback and accordingly propose an entity summarization method called DRESSED. To capture the preferences of individual users directly, we introduce a user feedback mechanism for the entity summarization problem, so that an entity summarization method can integrate the feedback information to generate improved summaries. Specifically, we define the cross-replace scenario for user feedback. In this scenario, DRESSED represents the decision function with a deep neural network and solves the problem with reinforcement learning. Experimental results show that DRESSED outperforms baseline methods in settings with both real users and simulated users.

4. We create an entity summarization benchmark, ESBM. This benchmark overcomes the limitations of existing benchmarks and meets the requirements for a high-quality benchmark. This thesis introduces the design and construction of ESBM in detail and presents analysis results on the collected data. At present, ESBM is the largest available benchmark for entity summarization and has been used in several existing works. The experiments in this thesis that are conducted on this benchmark also verify its effectiveness.
Keywords/Search Tags: Semantic Web, Entity Summarization, User Feedback, Benchmark
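
The abstract states that ESSTER models entity summarization as a 0-1 quadratic knapsack problem in which low redundancy is a term of the objective. The following is only a minimal illustrative sketch of that kind of formulation, not the thesis's implementation: the function name `summarize`, the per-triple `scores` (standing in for combined importance and readability), the pairwise `redundancy` penalties, and the toy numbers are all assumptions, and the exhaustive search is a stand-in for whatever solver the thesis actually uses.

```python
from itertools import combinations


def summarize(scores, redundancy, sizes, capacity):
    """Exhaustively solve a tiny 0-1 quadratic knapsack instance.

    scores[i]          -- hypothetical individual profit of triple i
    redundancy[(i, j)] -- hypothetical pairwise penalty for i < j (default 0)
    sizes[i]           -- cost of triple i against the capacity constraint
    capacity           -- maximum total size of the summary
    """
    n = len(scores)
    best_value, best_subset = float("-inf"), ()
    for k in range(n + 1):
        # combinations() yields index tuples in ascending order,
        # so every pair drawn from a subset satisfies i < j.
        for subset in combinations(range(n), k):
            if sum(sizes[i] for i in subset) > capacity:
                continue  # violates the capacity constraint
            value = sum(scores[i] for i in subset)
            value -= sum(redundancy.get(pair, 0.0)
                         for pair in combinations(subset, 2))
            if value > best_value:
                best_value, best_subset = value, subset
    return best_subset, best_value


# Toy instance: triples 0 and 1 score highest but are largely redundant.
scores = [0.9, 0.8, 0.5, 0.4]
redundancy = {(0, 1): 0.7}
sizes = [1, 1, 1, 1]
subset, value = summarize(scores, redundancy, sizes, capacity=2)
print(subset, value)
```

In this toy instance the pairwise penalty makes the pair of top-scoring triples less attractive than combining triple 0 with the non-redundant triple 2, which is the kind of trade-off a hard redundancy constraint cannot express. Exhaustive search is exponential in the number of triples, so it is only suitable for illustrating the objective.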