
A Study Of Detecting News Events With External Knowledge

Posted on: 2021-01-07  Degree: Master  Type: Thesis
Country: China  Candidate: Y Hong  Full Text: PDF
GTID: 2518306503973889  Subject: Software engineering
Abstract/Summary:
With the rapid development of the Internet and “We Media”, the scale and scope of online news have expanded enormously. Detecting news events automatically and accurately remains an open research problem. News event detection is vital for finding news events accurately and with short delay, and it is also a basis for tasks such as public sentiment analysis and open-source intelligence. Many event detection techniques focus on corpus statistics while neglecting semantics such as entity information, synonymy and polysemy, which leads to risks of over-fitting and a lack of interpretability. To solve these problems, this thesis studies news event detection techniques enhanced by external knowledge, using a knowledge-enhanced topic model for text feature extraction. The technique follows a procedure of topic modeling, text clustering, similar event detection and separation, and cross-clustering event merging. Based on the service-oriented programming paradigm, we design and implement a news event detection and analysis system that supports stream processing. The system implements the knowledge-enhanced news event detection techniques together with event analysis functions such as news trend analysis, word cloud generation and comment extraction. Specifically, this thesis makes the following three contributions:

(1) We present and implement two knowledge-enhanced topic models. Latent Dirichlet Allocation (LDA) focuses merely on corpus statistics. To address this limitation, this thesis presents Token-level Semantic Enhanced LDA (TSE-LDA). This model uses knowledge bases and knowledge graphs as external knowledge, and it uses entity linking to bridge the gap between word mentions and knowledge base entities. A structure called a “semantic unit” is designed to provide structural support for using semantic knowledge of synonymy and polysemy. To further improve TSE-LDA, we incorporate pretrained word vectors into the model. The improved version is called WVTSE-LDA (Pretrained Word Vectors and Token-Level Semantic Enhanced LDA). Pretrained word vectors not only provide latent features for topic models but also incorporate word correlations into them. Evaluation results show that the two proposed topic models match or exceed the state-of-the-art baseline models included in the evaluation.

(2) We present and implement a news event detection method based on knowledge-enhanced topic models. News event detection is conducted by a procedure consisting of document feature extraction, batch event detection and global event detection. We use WVTSE-LDA for document feature extraction. To set the number of latent topics automatically, we use a method based on topic stability. Batch event detection recognizes potential events by text clustering, for which we use affinity propagation. To further separate similar events that fall into a single cluster, we propose a method based on time and participants. In global event detection, the clusters generated in the batch event detection stage are merged with global clusters using the cross-clustering event merging method proposed in this thesis. Evaluation results show that all the proposed techniques meet the functional requirements of news event detection.

(3) We design and implement a news event detection and analysis system that supports stream processing. The system implements the knowledge-enhanced topic model and the news event detection method based on it, and it supports stream processing through the Apache Flink framework. We also design and implement an event analysis service for human users, which provides functions such as news trend analysis, word cloud generation and comment extraction.
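
The global event detection step in contribution (2), where clusters from a new batch are merged into existing global clusters, can be illustrated with a minimal sketch. The abstract does not state the merging criterion, so the rule below (cosine similarity between cluster centroids of document topic vectors, with a fixed threshold) is purely an assumption for illustration and not the thesis's actual cross-clustering event merging method. The function names and the threshold value of 0.8 are hypothetical.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def merge_batch_into_global(global_events, batch_events, threshold=0.8):
    """Merge batch clusters into global clusters.

    Each event is a list of document topic vectors. A batch event joins the
    most similar global event if the centroid similarity reaches `threshold`
    (an assumed criterion); otherwise it becomes a new global event.
    Returns a new list and leaves the inputs unmodified.
    """
    merged = [list(event) for event in global_events]
    for event in batch_events:
        c = centroid(event)
        best_idx, best_sim = None, threshold
        for i, g in enumerate(merged):
            sim = cosine(c, centroid(g))
            if sim >= best_sim:
                best_idx, best_sim = i, sim
        if best_idx is None:
            merged.append(list(event))
        else:
            merged[best_idx].extend(event)
    return merged


if __name__ == "__main__":
    # Toy 3-topic document vectors (made-up values for illustration only).
    global_events = [[[0.9, 0.1, 0.0], [0.8, 0.2, 0.0]]]
    batch_events = [[[0.85, 0.15, 0.0]], [[0.0, 0.1, 0.9]]]
    result = merge_batch_into_global(global_events, batch_events)
    print([len(event) for event in result])
```

In this toy run, the first batch cluster is close to the existing global event and is absorbed into it, while the second has a very different topic distribution and starts a new event. In a streaming setting, a function like this would be called once per incoming batch.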
Keywords/Search Tags:news event detection, topic model, document feature extraction, semantic knowledge, pretrained word vectors