An Approach For Generating News Clues Based On Biterm Topic Model

Posted on: 2022-05-30
Degree: Master
Type: Thesis
Country: China
Candidate: T Z Zhao
GTID: 2518306335497614
Subject: Journalism and Media
Abstract/Summary:
With the rapid development and popularization of Internet news media platforms, news data are growing explosively. Analyzing and mining these data to extract news clues from news events helps readers understand the context and direction of evolution of those events quickly and accurately. Studying the evolution and tracking of news events can therefore lay a good foundation for the field of news communication, and it also has research significance for data compression, user interest discovery, public opinion tracking, and other fields. Extracting news clues from news data also has practical value in people's daily lives.

Topic models have been widely used to extract and analyze the implicit information in news texts and to track the evolution of news events through that information. Because news on the Web includes both long and short texts, traditional topic models struggle to extract topics effectively from both types at once, which in turn limits the quality of the news clues. In addition, news data arrive continuously, with news released on various media platforms every day. The main problems addressed in this thesis are therefore how to extract topics effectively from both long-text and short-text news, how to process news data incrementally, and how to generate easy-to-understand news clues.

This thesis studies the generation of news clues. The main research content covers the following four aspects:

1. We propose a News-IBTM model based on IBTM (Incremental Biterm Topic Model). It reduces the extraction scope of biterms (binary phrases) so that topics can be extracted from both long-text and short-text news.
2. We introduce an incremental Gibbs sampling algorithm that incrementally estimates the topic distribution and the topic-word distribution from news data under the News-IBTM model.
3. We propose a clue generation method for news events. Using the topic and topic-word distributions estimated by incremental Gibbs sampling, we infer the document-topic distribution of each news document and then use JS (Jensen-Shannon) divergence to compare document-topic distributions. Experimental results on People's Daily Online News and Weibo News show that News-IBTM outperforms state-of-the-art models in perplexity, accuracy, and efficiency for both long-text and short-text news.
4. We design and develop a Web-based prototype system for news clue generation, built on the method proposed in this thesis. The system supports file loading, Chinese word segmentation, topic modeling, theme display, topic distribution display, and news clue visualization. It works across terminals and updates instantly.
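To make the Jensen-Shannon step concrete, the following is a minimal sketch of the standard JS divergence between two document-topic distributions. It is not taken from the thesis: the function names and the example distributions are hypothetical, and the base-2 logarithm is an assumption chosen so that the result lies between 0 and 1.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits.

    Terms with p_i == 0 contribute nothing, by the usual convention.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions.

    p and q are equal-length sequences of probabilities, for example the
    topic proportions of two news documents (hypothetical inputs here).
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    # Mixture distribution; m_i > 0 wherever p_i > 0 or q_i > 0,
    # so the KL terms below never divide by zero.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical document-topic distributions over three topics.
doc_a = [0.7, 0.2, 0.1]
doc_b = [0.6, 0.3, 0.1]
doc_c = [0.1, 0.1, 0.8]

print(js_divergence(doc_a, doc_b))  # small value: similar topic mix
print(js_divergence(doc_a, doc_c))  # larger value: different topic mix
```

Unlike KL divergence, JS divergence is symmetric and always finite, which makes it convenient for comparing pairs of documents. How the resulting values are turned into links between news documents (for example, a similarity threshold) is not specified in the abstract.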
Keywords: News events, Topic model, Topic extraction, News clues, Jensen-Shannon divergence