
Improvement Of K-Means Algorithm And Its Application In Weibo Topic Discovery

Posted on: 2019-05-23
Degree: Master
Type: Thesis
Country: China
Candidate: Y N Liu
GTID: 2348330545979603
Subject: Communication and Information System
Abstract/Summary:
As the Internet continues to develop, extracting valuable content from data has become an important research direction. In China, Weibo is a relatively new social media and information exchange platform, and mining and analyzing its data has great practical significance. K-Means is one of the most popular clustering algorithms, and the many variants derived from it are core techniques in text mining. This thesis first analyzes the traditional K-Means algorithm: it clusters quickly, is easy to implement, and applies to text, image features, and other kinds of data. However, because the initial cluster centers are chosen at random, the clustering results of traditional K-Means and its variants fluctuate considerably. Weibo data, in turn, is massive, short, non-standard, and highly repetitive, so traditional information analysis methods struggle to meet the needs of microblog analysis. On this basis, the work of this thesis is as follows. First, to address the defects of traditional K-Means, an improved K-Means algorithm based on the concept of density is designed. The density value of each data point is calculated and the data are sorted by density; after screening, k initial cluster centers are selected according to the minimum-maximum principle, which eliminates the randomness of the original algorithm. Second, taking into account the characteristics of microblog information and processing efficiency, a microblog information processing flow for text clustering is designed. It comprises text denoising, word segmentation, stop-word filtering, text representation, feature extraction, and weight calculation, and converts microblog text into a format that the algorithm can take as input. Finally, microblog data are obtained for the experiments. After processing, the improved K-Means algorithm, the traditional K-Means algorithm, and the CAMDP algorithm are applied to topic clustering and evaluated by accuracy, recall, and F1 value. The experimental results show that the improved K-Means algorithm effectively improves the accuracy of the clustering results and clusters Weibo topics well, providing a basis for in-depth analysis of Weibo information and for the development of subsequent application systems. The research in this thesis is an important reference for developing microblog public opinion monitoring applications.
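The density-based initialization described above can be sketched as follows. This is only an illustrative sketch, not the thesis's actual implementation: the abstract does not define the density measure or the screening rule, so the neighbour-count density, the mean-distance radius, the `keep_ratio` parameter, and the function name `density_maxmin_init` are all assumptions made here.

```python
import numpy as np

def density_maxmin_init(X, k, keep_ratio=0.5):
    """Pick k initial centers: screen points by density, then apply max-min distance."""
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances (O(n^2) memory; acceptable for a small sketch).
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Assumed density: number of neighbours closer than the mean pairwise distance.
    radius = dist.mean()
    density = (dist < radius).sum(axis=1) - 1  # do not count the point itself
    # Sort by density (highest first) and keep only the densest candidates.
    order = np.argsort(-density, kind="stable")
    n_keep = max(k, int(len(X) * keep_ratio))
    candidates = order[:n_keep]
    # Max-min selection: start from the densest point, then repeatedly add the
    # candidate whose distance to its nearest chosen center is largest.
    chosen = [candidates[0]]
    while len(chosen) < k:
        min_dist = dist[np.ix_(candidates, chosen)].min(axis=1)
        chosen.append(candidates[int(np.argmax(min_dist))])
    return X[chosen]
```

Because no random draw is involved, repeated calls on the same data return the same centers, which is the property the abstract attributes to the improved algorithm. The returned array could then be passed as the starting centers of a standard K-Means iteration.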
Keywords/Search Tags: K-Means algorithm, density center, minimum-maximum principle, Weibo text clustering