
Topic Clustering Technique And Its Application

Posted on: 2015-02-01  Degree: Master  Type: Thesis
Country: China  Candidate: H P Zhang  Full Text: PDF
GTID: 2298330467963933  Subject: Signal and Information Processing
Abstract/Summary:
In recent decades the Internet industry has flourished, and the volume of online information has grown enormously. Because it covers many events and reaches a wide audience, Internet news has become the main channel through which people receive news. Even for the same topic, however, different media outlets take different views. Building a topic-based model for organizing information, one that supports a comprehensive understanding of the causes and development of events, has therefore become a research hotspot.

Network news reports have distinctive characteristics: they run over a long duration, cover many different viewpoints, and touch on many subjects. This thesis mainly studies how to build a topic detection model for network news reports that can discover new topics efficiently and accurately and cluster the reports by topic. The main work and innovations of this thesis are as follows.

Firstly, traditional spectral clustering algorithms require the scale parameter to be supplied manually when computing the affinity matrix. I introduce an improved adaptive spectral clustering algorithm that determines the scale parameter automatically from the sample space. This removes the dependence on manual tuning, and the algorithm gave good results on point-set data.

Secondly, the improved spectral clustering algorithm is applied to topic clustering. Experiments analyze its effect on topic clustering and verify that it is also effective for text clustering.

Thirdly, I propose a two-level clustering strategy based on an incremental clustering algorithm, which introduces the concept of sub-topics. A time factor is added to the clustering process, and I propose pre-clustering based on time. This clustering strategy substantially reduces computational complexity and improves efficiency.

Through these studies, I improved the spectral clustering algorithm and introduced a two-level topic clustering strategy. The work has some reference value for improving the quality of topic clustering.
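The abstract does not say which rule the thesis uses to derive the scale parameter from the sample space. As a rough illustration only, the sketch below assumes one commonly used heuristic, local scaling, in which each point's scale is its distance to its k-th nearest neighbour. The function names (`local_scale_affinity`, `kmeans`, `spectral_clustering`), the default `k=7`, and the toy point set are all hypothetical and are not taken from the thesis.

```python
import numpy as np


def local_scale_affinity(X, k=7):
    """Affinity matrix with a per-point scale (assumed local-scaling heuristic)."""
    diff = X[:, None, :] - X[None, :, :]
    d2 = np.sum(diff ** 2, axis=-1)           # squared pairwise distances
    d = np.sqrt(d2)
    k = min(k, len(X) - 1)                    # guard for very small samples
    # Column 0 of each sorted row is the point itself (distance 0),
    # so column k is the distance to the k-th nearest neighbour.
    sigma = np.maximum(np.sort(d, axis=1)[:, k], 1e-12)
    A = np.exp(-d2 / (sigma[:, None] * sigma[None, :]))
    np.fill_diagonal(A, 0.0)                  # no self-affinity
    return A


def kmeans(U, n_clusters, n_iter=50):
    """Plain Lloyd's k-means with deterministic farthest-first initialisation."""
    centers = [U[0]]
    for _ in range(1, n_clusters):
        dist = np.min([np.sum((U - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        sq = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = np.argmin(sq, axis=1)
        new_centers = np.array([
            U[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(n_clusters)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def spectral_clustering(X, n_clusters, k=7):
    """Normalised spectral clustering on the locally scaled affinity matrix."""
    A = local_scale_affinity(X, k)
    deg = np.maximum(A.sum(axis=1), 1e-12)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    L = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]   # D^-1/2 A D^-1/2
    _, vecs = np.linalg.eigh(L)               # eigenvalues in ascending order
    U = vecs[:, -n_clusters:]                 # eigenvectors of the largest ones
    U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    return kmeans(U, n_clusters)


# Toy point set: two well-separated 2-D blobs (illustrative data only).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (20, 2)), rng.normal(5.0, 0.3, (20, 2))])
labels = spectral_clustering(X, n_clusters=2)
```

No global scale value is passed in: the only remaining knob is the neighbour count `k`, which is usually far less sensitive than a hand-tuned scale. For text clustering, the same routine could in principle be run on document vectors instead of 2-D points, though the thesis may construct its affinity matrix differently.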
Keywords/Search Tags: topic detection, spectral clustering, scale parameter, two-level clustering, network news report