News Topic Detection Based On LDA Fusion Model And Multi-layer Clustering

Posted on: 2018-05-03
Degree: Master
Type: Thesis
Country: China
Candidate: X D Xie
GTID: 2348330542481364
Subject: Computer technology
Abstract/Summary:
With the rapid development of the Internet, access to information is no longer a bottleneck; the problem is now how to find the information one wants in an explosively growing volume of data. Reading news reports accounts for a growing share of people's online activity, so this thesis takes news reports as its entry point and proposes a news topic detection method based on an LDA fusion model and multi-layer clustering, improving both the text model and the clustering algorithm. The main work covers two aspects.

First, the vector space model based on TF-IDF is a powerful tool for text representation, but because it is purely statistical it ignores the semantic relatedness between texts. This thesis introduces the LDA topic model into the news text domain and obtains latent semantic knowledge from the topic-text model. It then combines the traditional TF-IDF vector space with the LDA topic model, merging the statistics-based and semantics-based methods to improve the quality of text clustering.

Second, hierarchical clustering produces good clusters on text, but its O(n^2) time complexity and very high memory consumption limit its use. The traditional single-pass algorithm is efficient but not accurate enough. This thesis first improves the single-pass clustering algorithm and uses it to group the news texts preliminarily, which prepares the ground for the subsequent hierarchical clustering: the initial pass yields highly cohesive, fine-grained topic clusters, so the hierarchical stage works on far fewer items and its inability to handle large numbers of texts is no longer a limitation. Hierarchical clustering is then applied to these topic clusters to improve the result on three measures: precision, recall and F-measure.

The experimental results show that the topic detection method is feasible and can improve the clustering effect to a certain extent.
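
The two-stage idea in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not say how the TF-IDF and LDA representations are fused or how single-pass clustering is improved, so the sketch assumes a weighted sum of two cosine similarities (the weight `lam` is hypothetical), plain single-pass clustering against cluster centroids, and a simple agglomerative merge of the resulting clusters. The topic vectors are assumed to come from an already trained LDA model; the toy vectors and both thresholds are made up for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def fused_similarity(doc_a, doc_b, lam=0.5):
    """Assumed fusion: weighted mix of TF-IDF and topic similarity.
    Each doc is a (tfidf_vector, topic_vector) pair; lam is hypothetical."""
    return (lam * cosine(doc_a[0], doc_b[0])
            + (1 - lam) * cosine(doc_a[1], doc_b[1]))

def centroid(docs):
    """Component-wise mean of the TF-IDF and topic vectors of a group."""
    n = len(docs)
    tfidf = [sum(col) / n for col in zip(*(d[0] for d in docs))]
    topics = [sum(col) / n for col in zip(*(d[1] for d in docs))]
    return (tfidf, topics)

def single_pass(docs, threshold=0.8):
    """Stage 1: one pass over the documents; each joins the most similar
    existing cluster if similarity >= threshold, else starts a new one."""
    clusters = []  # each cluster is a list of document indices
    for i, doc in enumerate(docs):
        best, best_sim = None, threshold
        for members in clusters:
            sim = fused_similarity(doc, centroid([docs[j] for j in members]))
            if sim >= best_sim:
                best, best_sim = members, sim
        if best is None:
            clusters.append([i])
        else:
            best.append(i)
    return clusters

def merge_clusters(docs, clusters, threshold=0.4):
    """Stage 2: repeatedly merge the two most similar clusters (by centroid)
    until no pair reaches the threshold. Quadratic, but only in the number
    of stage-1 clusters rather than the number of documents."""
    clusters = [list(c) for c in clusters]
    while len(clusters) > 1:
        best_pair, best_sim = None, threshold
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                sim = fused_similarity(
                    centroid([docs[j] for j in clusters[a]]),
                    centroid([docs[j] for j in clusters[b]]))
                if sim >= best_sim:
                    best_pair, best_sim = (a, b), sim
        if best_pair is None:
            break
        a, b = best_pair
        clusters[a].extend(clusters.pop(b))
    return clusters

# Toy data: (tfidf_vector, topic_vector) per document, invented for illustration.
docs = [
    ([1.0, 0.0, 0.0], [0.9, 0.1]),
    ([0.9, 0.1, 0.0], [0.8, 0.2]),
    ([0.0, 1.0, 0.0], [0.7, 0.3]),  # different words, similar topic mix
    ([0.0, 0.0, 1.0], [0.1, 0.9]),
]
stage1 = single_pass(docs, threshold=0.8)
stage2 = merge_clusters(docs, stage1, threshold=0.4)
print(stage1, stage2)
```

In this toy run the third document shares almost no vocabulary with the first two, so the strict first stage keeps it apart; the looser second stage merges it with them because the topic term of the fused similarity is high. That is the kind of case a purely TF-IDF-based comparison would miss.
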
Keywords/Search Tags:Topic Detection, LDA Model, Single-Pass Clustering, Hierarchical Clustering