
Research On The Technology Of News Topic Detection Based On Hierarchical Clustering

Posted on: 2018-09-21
Degree: Master
Type: Thesis
Country: China
Candidate: W L Gao
Full Text: PDF
GTID: 2348330518987185
Subject: Computer technology
Abstract/Summary:
The rapid development of the Internet has brought us into an era of information overload, and following and browsing online news has become an essential part of people's lives. How to quickly retrieve news reports on events of interest, as well as on similar or related events, has become a hot topic in natural language processing, and it is pushing news topic and event mining and analysis technology toward maturity. Topic discovery reveals the interrelationships between topics, allows topics to be understood in finer detail, and contributes to tracking related event reports and detecting new topics. It is a significant part of the news topic mining and analysis process. In computing the correlation between topics, existing methods usually treat a topic as a feature set in order to filter and analyze topics, track follow-up reports, and identify new topics by reducing noise, capturing evolution characteristics, incorporating time factors, and other means. This paper considers the hierarchical relationships between topics and improves the BRT algorithm based on hierarchical clustering, yielding the normal-LBRT algorithm, which is better suited to hierarchical clustering of topics represented as topic-word probabilities. The topic hierarchy tree is built over continuous time slices. A hierarchical relation extraction algorithm is also proposed to obtain the hierarchical structure of the topic set. The algorithm uses the parent node as the boundary to measure the edge distance between its two child nodes, and incorporates the hierarchy ratio of the parent node to the root node in the tree structure, so as to correct the partial distance error of KL divergence and to analyze and detect news topics more effectively. The experimental data are news reports covering a continuous period of time, taken from the open data released by Sogou Laboratory.
The results show that the normal-LBRT algorithm is suitable for hierarchical clustering of topics expressed in the form of word probabilities. Moreover, the hierarchical extraction method in this paper correctly expresses the hierarchical relationships among the nodes of the hierarchy tree and is sensitive to the distance between hierarchical relationships. The experiments also verify the validity and feasibility of the correlation algorithm used in this paper.
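The abstract names KL divergence as the base distance between topics expressed as word probabilities. As a rough illustration only, the sketch below computes plain KL divergence and a symmetrized variant between small topic-word distributions. It does not implement the normal-LBRT algorithm or the parent-node and hierarchy-ratio correction, whose details are not given here. The function names, the three-word vocabulary, the smoothing constant `eps`, and the choice to symmetrize are all assumptions made for the example.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same vocabulary.

    eps is an assumed floor that avoids division by zero when q assigns
    zero probability to a word that p uses.
    """
    if len(p) != len(q):
        raise ValueError("distributions must share one vocabulary")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0.0:  # terms with p_i = 0 contribute nothing
            total += pi * math.log(pi / max(qi, eps))
    return total


def symmetric_kl(p, q):
    """Average of KL(p || q) and KL(q || p), usable as a rough topic distance."""
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))


# Hypothetical topic-word distributions over a three-word vocabulary.
topic_a = [0.5, 0.3, 0.2]
topic_b = [0.4, 0.4, 0.2]
topic_c = [0.1, 0.1, 0.8]

print(symmetric_kl(topic_a, topic_b))  # small: the two topics are similar
print(symmetric_kl(topic_a, topic_c))  # larger: the two topics differ
```

Plain KL divergence is asymmetric and ignores where two topics sit in a tree, which is consistent with the abstract's motivation for adding structural information from the hierarchy.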
Keywords/Search Tags: topic discovery, LDA, BRT algorithm