News Topic Discovery Research Based On The LDA Model

Posted on: 2015-10-09
Degree: Master
Type: Thesis
Country: China
Candidate: X S Ma
GTID: 2298330431483611
Subject: Computer application technology
Abstract/Summary:
With the rapid development and spread of Internet technology, we are surrounded by an ocean of information. As online information expands ever faster, many hot topics are similar or correlated, and locating the news a reader is interested in within this chaotic mass of information is an important problem in news hot-topic research. Text clustering is the most basic technology for this research. Hot news topic discovery currently faces two basic questions: first, how to improve the clustering effect so that news on the same topic is, as far as possible, gathered into one class; second, how to express the clustering results. To address these two questions, the author proposes a hot news topic clustering algorithm for discovering hot news.

This thesis covers the following three aspects:

First, the key technologies in the field of topic discovery at home and abroad, such as text modeling, feature extraction, text clustering methods, statistical topic models, and cluster theme identification methods, are analyzed in detail, and the advantages and disadvantages of these technologies and the current research progress are summarized.

Second, the LDA topic model is introduced into text clustering. A statistical topic model is used to generate the latent topics of the texts, the latent topic feature knowledge is incorporated into the word space, and this is combined with fuzzy clustering to mine the deeper semantic knowledge inside the texts and improve the quality of text clustering.

Third, using the latent topics and characteristic word sets generated by the LDA model, combined with the probability distribution of the texts, a cluster theme identification method based on the LDA model is proposed to improve the visualization and comprehensibility of the clustering results.

Comparative experiments on a Chinese corpus show that the proposed method is superior to the traditional word-space clustering algorithm: clustering quality improves by 3% to 10%, and cluster theme identification is more accurate, which verifies that the LDA-based fuzzy text clustering method is reasonable and effective.
Keywords/Search Tags: Topic Detection, Latent Dirichlet Allocation Model, Feature Representation, Topic Identification