
Density-based Clustering Algorithm And Its Application Research In News Topic Discovery

Posted on: 2017-12-14
Degree: Master
Type: Thesis
Country: China
Candidate: Y Y Liu
GTID: 2358330482991351
Subject: Computer application technology
Abstract/Summary:
The rise of Internet-based new media and advances in information transmission have removed earlier limits on the channels through which information is acquired and on the volume of content available, but they have also brought problems such as information overload and uncontrolled spreading. People encounter a massive number of news topics every day, a considerable share of which do not interest them. Finding topics accurately and effectively is therefore an urgent task. Text is the most important information carrier in news topics, so text clustering is a basic and key problem in information processing. Among machine learning methods, cluster analysis is considered an effective way to quickly and precisely find, locate, organize and analyze available information for a specific purpose. Using cluster analysis to simplify text data has an important application in news topic discovery.

Building on research into topic discovery and cluster analysis, we combine an improved particle swarm optimization (PSO) algorithm with the fast search and find of density peaks (FSDP) clustering algorithm, propose the PSO-FSDP clustering algorithm, and apply it to news topic discovery. The main research results are as follows:

(1) In view of the characteristics of the density peak clustering algorithm, a clustering algorithm based on particle swarm optimization is proposed. The FSDP algorithm has the disadvantage that it cannot determine cluster centers automatically, so particle swarm optimization is introduced and combined with FSDP to form the PSO-FSDP clustering algorithm. First a new fitness function is set up, and then the cluster centers are output by the particle swarm optimization algorithm.
The experimental results show that the algorithm effectively overcomes the limitation that the traditional fast search and find of density peaks algorithm cannot determine cluster centers automatically. It avoids the subjectivity of manual center selection, is stable, converges quickly, and obtains good clustering results.

(2) In view of the high dimensionality of text data, the PSO-FSDP clustering algorithm is applied to text clustering in order to discover news topics. Through an analysis of text feature vectors, and on the basis of the fast search and find of density peaks clustering algorithm, PSO-FSDP is applied to text clustering. Using the similarity between texts in place of the distance between text points addresses the problem that the original algorithm does not apply well to high-dimensional data. Texts are modeled with word2vec, and the cosine similarity formula is used to calculate the similarity between texts, from which we obtain the density of each text point and its distance to points of higher density. The PSO-FSDP algorithm then selects the cluster centers to perform text clustering and discover news topics. Compared with other text clustering algorithms, this algorithm achieves higher accuracy, recall and F-measure, and produces better text clustering results.

(3) Based on the PSO-FSDP algorithm, a prototype system for news topic discovery is designed and implemented. Following an analysis of the text clustering process, a functional module is designed for each step of the process, and a prototype news topic discovery system based on the PSO-FSDP algorithm is implemented. The prototype system can effectively capture, analyze and process news reports from the network, and finally presents the newly discovered topics to users in an intuitive form.
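The two quantities named in result (2), the density of each text point and its distance to points of higher density, can be illustrated with a minimal sketch. It assumes the cutoff-based definitions from the original density peaks algorithm, which the abstract does not spell out. The function names, the cutoff parameter `d_c`, and the toy vectors (standing in for word2vec-derived text vectors) are all hypothetical. The fitness function and the PSO-based center selection are not described in enough detail here to reproduce, so they are not shown.

```python
import numpy as np

def cosine_distance_matrix(X):
    """Pairwise cosine distance (1 - cosine similarity) between the rows of X."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    unit = X / np.clip(norms, 1e-12, None)    # guard against zero vectors
    sim = np.clip(unit @ unit.T, -1.0, 1.0)   # clip floating-point overshoot
    dist = 1.0 - sim
    np.fill_diagonal(dist, 0.0)
    return dist

def density_peak_quantities(dist, d_c):
    """Return (rho, delta) for each point, assuming d_c > 0.

    rho[i]   : number of other points closer than the cutoff d_c (assumed definition)
    delta[i] : distance to the nearest point with strictly higher rho;
               for points of maximal rho, the largest distance to any point
    """
    n = dist.shape[0]
    rho = (dist < d_c).sum(axis=1) - 1        # subtract 1 to exclude the point itself
    delta = np.zeros(n)
    for i in range(n):
        higher = np.where(rho > rho[i])[0]
        if higher.size == 0:
            delta[i] = dist[i].max()
        else:
            delta[i] = dist[i, higher].min()
    return rho, delta

# Toy "text vectors": two groups pointing in nearly orthogonal directions.
X = np.array([
    [1.00, 0.00], [0.90, 0.10], [0.95, 0.05],
    [0.00, 1.00], [0.10, 0.90], [0.05, 0.95],
])
dist = cosine_distance_matrix(X)
rho, delta = density_peak_quantities(dist, d_c=0.005)
print(rho, delta)
```

Points that combine a high `rho` with a large `delta` are the candidate cluster centers. In the plain density peaks algorithm they are picked by hand from a decision graph, which is the manual step the thesis replaces with particle swarm optimization.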
Keywords/Search Tags:Clustering analysis, News topic detection, Particle Swarm Optimization