Event And Topic Oriented Entity Ranking In Documents

Posted on: 2018-02-23 | Degree: Master | Type: Thesis
Country: China | Candidate: Y H Wang | Full Text: PDF
GTID: 2348330512981310 | Subject: Software engineering
Abstract/Summary:
In the Internet era, the flourishing of new online media makes it easy for people to share large amounts of information. These media accumulate vast quantities of textual data that record important public events and popular topics in society. By monitoring public opinion on the Web, the government, the relevant departments and the general public can keep track of the state of society and identify social problems in a timely manner. Such monitoring can also help government departments manage and make decisions on a sound basis. Detecting hot events and topics from massive web data is therefore an important and practical research task. In addition, the key entities of a document help to summarize the subject of the event or topic that the document describes. Based on large-scale web news data, we detect hot events and topics and extract the key entities of related documents to summarize the main elements of those events and topics. The main contributions of this thesis are as follows:

· We introduce a novel similarity measure for news articles by means of metric learning. For massive, unordered and redundant Web news data, we propose a topic-oriented method (ToED) for event detection. ToED uses a topic model to learn the topic distributions of documents in a news corpus. To detect hot events, a density-based clustering method (ESCAN) is proposed to cluster the collection of documents.
· For key entity selection within a document, we make full use of the features of the entities extracted from the document and draw on Wikipedia and neural language models to generate external features. We then propose a novel ranking model named LA-FSAM, based on forward stagewise additive modeling. In LA-FSAM, we use the AUC metric to construct the loss function and the logistic function to integrate the features of entities. Finally, stochastic gradient descent is used to optimize the parameters of the LA-FSAM model. Our experimental evaluation shows the efficiency of the proposed model.
· We design and implement a social hotspot analysis system (KSPOS), which provides an event-based or topic-based search application. Given a user query, it generates a comprehensive overview of the relevant documents, including keywords, a semantic network of entities and a timeline summary.
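
The abstract gives no formulas for LA-FSAM, so the following is only a minimal sketch of the general idea it names: learning entity-ranking weights by stochastic gradient descent on a smooth, logistic surrogate of the AUC. It is not the thesis's model. The forward stagewise additive part is omitted, and the feature layout (`in_title`, `relative_frequency`), the toy data and the function names are all hypothetical.

```python
import math
import random

# Hypothetical toy features per entity: [in_title, relative_frequency].
# POSITIVES are key entities, NEGATIVES are non-key entities (made-up data).
POSITIVES = [[1.0, 0.9], [1.0, 0.6], [0.0, 0.8]]
NEGATIVES = [[0.0, 0.1], [0.0, 0.3], [0.0, 0.2]]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(weights, features):
    """Linear score of one entity."""
    return sum(w * v for w, v in zip(weights, features))


def auc(weights, positives, negatives):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    total = 0.0
    for p in positives:
        for n in negatives:
            sp, sn = score(weights, p), score(weights, n)
            total += 1.0 if sp > sn else 0.5 if sp == sn else 0.0
    return total / (len(positives) * len(negatives))


def train_pairwise(positives, negatives, steps=2000, lr=0.1, seed=0):
    """SGD on the pairwise logistic loss log(1 + exp(-(s_pos - s_neg))).

    This loss is a common smooth stand-in for 1 - AUC; it is an assumption
    here, not the loss defined in the thesis.
    """
    rng = random.Random(seed)
    weights = [0.0] * len(positives[0])
    for _ in range(steps):
        p = rng.choice(positives)
        n = rng.choice(negatives)
        diff = [a - b for a, b in zip(p, n)]
        margin = score(weights, diff)
        # Gradient of log(1 + exp(-margin)) w.r.t. weights is -sigmoid(-margin) * diff.
        g = -sigmoid(-margin)
        weights = [w - lr * g * d for w, d in zip(weights, diff)]
    return weights


if __name__ == "__main__":
    learned = train_pairwise(POSITIVES, NEGATIVES)
    print("weights:", learned)
    print("AUC:", auc(learned, POSITIVES, NEGATIVES))
```

Sampling one positive and one negative entity per step keeps each update cheap, which is the usual reason to pair an AUC-style objective with stochastic gradient descent rather than summing over all pairs.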
Keywords/Search Tags: Event Detection, Entity Ranking, Public Opinion Analysis