News Threading Based On LDA Model

Posted on: 2013-01-16
Degree: Master
Type: Thesis
Country: China
Candidate: Z H Yan
Full Text: PDF
GTID: 2218330362459272
Subject: Computer application technology
Abstract/Summary:
Thanks to the rapid development of information technology, news spread through the Internet has become the main channel through which people obtain information. However, the rapid growth of the Internet has also brought the problems of information overload and a lack of structural information. It is difficult for people to efficiently identify the different aspects of a news event from a large number of news reports. To address this problem, we propose using news threads to illustrate the different aspects of a news event, with a single word or phrase serving as a thread label to help people understand news events more easily.

In this paper, we first present definitions of a news thread and a news label. Based on these definitions, two methods of news threading are proposed. The first method is based on the posterior probability of LDA: it first selects thread words by adjusting the LDA results, then extracts phrases based on those thread words. The second method is a topical N-gram model with a background distribution. In this model, each news report is represented as a combination of a background distribution over the corpus and a mixture distribution over hidden news threads.

To evaluate the proposed methods, we conduct experiments on a Chinese news corpus and an English news corpus. Human evaluation shows that the methods generate representative thread labels, which can help people understand news events. We further applied the methods to a news corpus system.
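
To make the representation used by the second method concrete, the following minimal Python sketch mixes a corpus-wide background distribution with a per-report mixture over threads. It is an illustration only, not the model from the thesis: the function names, the mixing weight `lam`, and all toy probabilities are assumptions, and the sketch works on single words rather than on the N-gram structure of the full model.

```python
from typing import Dict, List

# Toy distributions (hypothetical values, chosen only so each sums to 1).
BACKGROUND: Dict[str, float] = {"said": 0.5, "news": 0.3, "quake": 0.1, "rescue": 0.1}
THREADS: List[Dict[str, float]] = [
    {"quake": 0.7, "rescue": 0.1, "said": 0.1, "news": 0.1},    # thread 0
    {"rescue": 0.8, "quake": 0.1, "said": 0.05, "news": 0.05},  # thread 1
]
THETA: List[float] = [0.5, 0.5]  # assumed thread proportions for one report
LAM: float = 0.4                 # assumed weight of the background distribution


def word_probability(word: str,
                     background: Dict[str, float],
                     threads: List[Dict[str, float]],
                     theta: List[float],
                     lam: float) -> float:
    """p(w | d) = lam * p_B(w) + (1 - lam) * sum_k theta_k * phi_k(w)."""
    thread_part = sum(t_k * phi.get(word, 0.0) for t_k, phi in zip(theta, threads))
    return lam * background.get(word, 0.0) + (1.0 - lam) * thread_part


def thread_posterior(word: str,
                     threads: List[Dict[str, float]],
                     theta: List[float]) -> List[float]:
    """p(k | w, d), proportional to theta_k * phi_k(w).

    Falls back to a uniform distribution when no thread gives the word
    any probability mass.
    """
    scores = [t_k * phi.get(word, 0.0) for t_k, phi in zip(theta, threads)]
    total = sum(scores)
    if total == 0.0:
        return [1.0 / len(threads)] * len(threads)
    return [s / total for s in scores]


if __name__ == "__main__":
    for w in BACKGROUND:
        print(w, word_probability(w, BACKGROUND, THREADS, THETA, LAM),
              thread_posterior(w, THREADS, THETA))
```

In this toy setting, frequent but uninformative words such as "said" receive most of their probability from the background component, while words such as "rescue" are explained mainly by one thread, which is the kind of word a thread label would be drawn from.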
Keywords/Search Tags: topic model, news threading, news label