Temporal Event Summarization Based On Sparse Topic Mining With Semantic Regularization

Posted on: 2018-03-28  Degree: Master  Type: Thesis
Country: China  Candidate: Y Z Yao  Full Text: PDF
GTID: 2348330563452595  Subject: Computer Science and Technology
Abstract/Summary:
In the age of big data, more data does not automatically mean more knowledge. When an emergency occurs, for example, the number of related news reports increases exponentially. Dynamically tracking how a specific emergency develops across a massive news stream, and giving readers an event summary that reflects that development, has become an urgent problem. Traditional information processing methods, such as manually written abstracts and retrieval tools, cannot cope effectively with the volume of information involved in temporal event summarization.

Building on prior work in summarization and on earlier results from our laboratory, this thesis studies and implements temporal event summarization using sparse topic mining with semantic regularization. The system starts from the inherent characteristics of the news data set and chooses a summarization algorithm suited to massive streams of news reports. It then takes the feature structure of the text corpus itself into account through a regularization term that constrains the mapping from the high-dimensional space to the low-dimensional space. Finally, it presents the development of the event to potential readers promptly, so that they can quickly and intuitively follow the event's topics and its latest developments. The main work of this thesis is as follows:

· First, after studying the sparsity of candidate sentences in massive news corpora, we propose and implement a novel sparse candidate sentence extraction method. When an event breaks out suddenly, many media outlets report on its different aspects from different angles within a very short time, and users are buried under a mudslide of news reports. Only a small number of important sentences in these reports are useful for building a brief, comprehensive, and accurate summary, so filtering out irrelevant sentences and keeping only highly relevant candidates is like looking for a needle in a haystack. The user's query, that is, the event topic description the user supplies, is usually very short, whereas the candidate sentence set is very large, which causes a serious mismatch between the query keywords and the sentences. We therefore use several search engines, integrate their results to expand the user's query, and then apply a retrieval model to select a sparse set of highly relevant candidate sentences.

· Second, to address the difficulty of mining candidate topics, we design and implement a sparse topic mining framework based on non-negative matrix factorization (NMF) clustering. Once the candidate sentence set has been obtained, a clustering algorithm is needed to extract candidate topics. The NMF clustering method we adopt does not require the topic center vectors to be mutually orthogonal, and it gives the topic center vectors a more reasonable interpretation. In addition, the low-rank decomposition in NMF clustering allows it to cope with the explosion in feature dimensionality at large scale.

· Third, to address the problem of measuring semantics when clustering candidate topics for emergencies, we design and implement a novel Neighborhood Preserving Semantic (NPS) measure, which acts as a constraint term during dimensionality reduction, so that the reduced space retains the intrinsic relationships among the corpus's original vector representations.

· Experiments on the KBA corpus show that the proposed algorithm achieves significant improvements on the main metrics, including Expected Gain, Comprehensiveness, and F-measure, and effectively improves the performance of the temporal summarization system. A prototype system built on the algorithm also took part in the Temporal Summarization Track of the Text REtrieval Conference (TREC 2015), where it placed second in the Summarization Only task.
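The NMF clustering step described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation. It assumes a small sentence-by-term count matrix, uses the standard Lee-Seung multiplicative updates for the Frobenius objective, and assigns each sentence to its highest-weight topic. It omits the sparsity and NPS regularization terms, whose exact form the abstract does not give. The names `nmf`, `cluster_sentences`, and the toy matrix `X` are hypothetical.

```python
import numpy as np


def nmf(X, k, n_iter=500, seed=0, eps=1e-9):
    """Approximate a non-negative matrix X (sentences x terms) as W @ H.

    W (sentences x k) holds each sentence's weight on each topic and
    H (k x terms) holds the topic center vectors. Both stay non-negative
    because multiplicative updates only rescale positive entries.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(n_iter):
        # Lee-Seung updates for minimizing ||X - W H||_F^2.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def cluster_sentences(X, k, **kwargs):
    """Assign each sentence (row of X) to the topic with the largest weight."""
    W, H = nmf(X, k, **kwargs)
    return W.argmax(axis=1), H


# Toy data (assumed): rows are candidate sentences, columns are term counts.
# Sentences 0-2 use only the first three terms, sentences 3-4 the last three.
X = np.array([
    [2, 1, 1, 0, 0, 0],
    [1, 2, 1, 0, 0, 0],
    [1, 1, 2, 0, 0, 0],
    [0, 0, 0, 2, 1, 1],
    [0, 0, 0, 1, 2, 2],
], dtype=float)

labels, topics = cluster_sentences(X, k=2)
print(labels)            # one topic index per sentence
print(topics.round(2))   # non-negative topic center vectors
```

The rows of `topics` are not constrained to be orthogonal, which matches the property the abstract highlights, and the number of topics `k` sets the rank of the decomposition.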
Keywords/Search Tags: Event Temporal Summarization, Sparse Topics, Feature Selection, Matrix Factorization, Neighborhood Preserving Semantic Measure