
Improvement And Application Of Correlation Weighted K-means Algorithm

Posted on: 2019-03-26
Degree: Master
Type: Thesis
Country: China
Candidate: B Wu
Full Text: PDF
GTID: 2348330548957927
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of science and technology, society has entered an era of information explosion. A large amount of data is generated and updated every day, so extracting useful information and knowledge from it is particularly important. However, the scale of the data far exceeds what traditional methods can analyze and process, resulting in the phenomenon of "data explosion but lack of knowledge". Fast and accurate extraction of the information hidden in massive data has long been a hot research topic. Cluster analysis plays an indispensable role in mining unlabeled data and has been applied effectively in many fields, such as virus intrusion detection, statistical analysis, and image processing.

The K-means algorithm, one of the ten classic data mining algorithms, is an iterative algorithm that uses alternating minimization to solve a non-convex optimization problem. It is simple, easy to understand, and computationally efficient. As an unsupervised clustering algorithm, it has been studied and improved by many researchers, such as Forgy and MacQueen, and it has been widely used in many fields. Many problems nevertheless remain unsolved, for example the selection of the initial center points, the determination of the number of clusters in a data set, and the measurement of similarity between sample objects. To improve the stability of the clustering process and to better capture the correlation between objects, this thesis studies the initialization of the algorithm, the determination of the number of clusters, and the distance function, and proposes a corresponding improvement for each. The improved algorithm is then applied to the analysis of stock prices. The specific work is as follows.

First, the K-means algorithm is sensitive to the randomly selected initial center points. This thesis combines the max-min initialization method with density to propose a new initialization method: the density of the initial points and the inter-class distance are used to determine a threshold, and the sample objects are then partitioned accordingly. This yields stable, unique initial cluster centers that approximate the true distribution of the data. Experiments show that the initial cluster centers obtained in this way are better.

Second, having to specify the number of clusters in advance is a major difficulty of the K-means algorithm. To address it, this thesis improves the Rt-kmeans algorithm to obtain a K-means algorithm that determines the number of clusters automatically. Compared with the Rt-kmeans algorithm, the proposed method determines the number of clusters in a data set more accurately.

Third, the traditional K-means algorithm has difficulty reflecting the correlation between objects. This thesis uses the Pearson correlation coefficient to weight the Euclidean distance, thereby strengthening the correlation between sample objects within a cluster. Experimental comparisons with the traditional K-means algorithm and several recent K-means variants show that the improved algorithm achieves more accurate clustering results.

Fourth, the analysis of stock data is divided into two parts: a combined analysis of multiple stocks, and a study of the possible price volatility of a single stock. The results show that the algorithm groups strongly correlated stocks into the same class.
Keywords/Search Tags: cluster analysis, K-means algorithm, automatic division, Pearson correlation coefficient
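
The abstract does not state the exact form of the Pearson-weighted distance or of the density-based max-min initialization, so the following Python sketch is only an illustration under stated assumptions, not the method of the thesis. It assumes the weighting (1 - r) * Euclidean distance, where r is the Pearson correlation coefficient, and it stands in for the density criterion by seeding from the point with the smallest total distance to all others. The function names `pearson`, `weighted_distance`, `max_min_init`, and `assign` are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # undefined for a constant vector; treated as uncorrelated
    # Clamp to [-1, 1] to absorb floating-point rounding.
    return max(-1.0, min(1.0, cov / (sx * sy)))


def euclidean(x, y):
    """Plain Euclidean distance."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def weighted_distance(x, y):
    """ASSUMED weighting: Euclidean distance scaled by (1 - r), r in [-1, 1].

    Strongly correlated vectors get a distance near 0; anti-correlated
    vectors get up to twice their Euclidean distance.
    """
    return (1.0 - pearson(x, y)) * euclidean(x, y)


def max_min_init(points, k, dist=euclidean):
    """Deterministic max-min seeding.

    ASSUMPTION: the first seed is the point with the smallest total distance
    to all others, a simple stand-in for a density criterion. Each further
    seed is the point farthest from its nearest already-chosen seed.
    """
    centers = [min(points, key=lambda p: sum(dist(p, q) for q in points))]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(dist(p, c) for c in centers))
        )
    return centers


def assign(points, centers, dist=euclidean):
    """Index of the nearest center for every point (one K-means assignment step)."""
    return [
        min(range(len(centers)), key=lambda j: dist(p, centers[j]))
        for p in points
    ]


if __name__ == "__main__":
    # Toy price series: three rising, two falling (made-up illustrative data).
    series = [[1, 2, 3, 4], [2, 4, 6, 8], [1, 3, 5, 7], [4, 3, 2, 1], [8, 6, 4, 2]]
    seeds = max_min_init(series, 2, weighted_distance)
    print(assign(series, seeds, weighted_distance))
```

In the toy data, the plain Euclidean distance places the falling series [4, 3, 2, 1] closer to [1, 2, 3, 4] than the rising series [2, 4, 6, 8] is, whereas the assumed weighted distance places the two rising series together. This mirrors the stated aim of grouping strongly correlated stocks into the same class.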