
Three-way K-means Clustering Algorithm And Its Application

Posted on: 2024-04-22
Degree: Master
Type: Thesis
Country: China
Candidate: X Tang
Full Text: PDF
GTID: 2568307073976589
Subject: Applied statistics
Abstract/Summary:
Data mining techniques have long been used to identify and filter data, and cluster analysis is an important and efficient data-processing method within the field. It helps reveal correlations in data and uncover potential value, so that useful information can be extracted more efficiently. In common cluster analysis methods, the division between objects and clusters is crisp: each object must either belong to a given cluster or not belong to it. In practice, however, cognitive limitations mean that people usually cannot obtain all the information about the data accurately, which introduces clustering errors and degrades the final result. Three-way clustering addresses this problem. By representing each cluster with a core region, a boundary region and a trivial region, it can effectively handle objects whose information is uncertain. K-means selects its cluster centers randomly and easily falls into a local optimum; to overcome these problems, this thesis proposes a three-way K-means clustering algorithm that combines neighborhood density, the farthest Euclidean distance and the idea of three-way decision clustering. The main work of the thesis is summarized as follows:

(1) A three-way K-means clustering algorithm is proposed. First, because K-means selects cluster centers randomly and is easily affected by extreme values, the algorithm considers the relationships between data objects and uses neighborhood density and the farthest Euclidean distance to select the k value, which yields a two-way clustering result. Second, following the idea of three-way clustering, nearest neighbors are introduced to calculate the distances between the data objects in the two-way clustering, which yields the core region and boundary region of the three-way clustering. Finally, using five UCI datasets (Iris, New-thyroid, Mammographic, Seeds, Bupa) and five artificially simulated datasets (400-4K2, Aggregation, D9, Pathbased, R15), the thesis compares the three-way K-means clustering algorithm with three other algorithms on two clustering evaluation metrics, the Davies-Bouldin Index (DBI) and Accuracy (ACC). In terms of DBI, the proposed algorithm obtains smaller, and therefore better, values on the UCI datasets Iris and Bupa and on the artificial datasets Aggregation, Pathbased and R15. In terms of ACC, the proposed algorithm shows high accuracy on the datasets in both average and best values.

(2) The three-way K-means clustering algorithm is applied to customer segmentation. The thesis divides the “Global Superstore” data with the help of the RFM model and then uses the proposed algorithm to classify customers into three groups: high-value, potential-value and low-value customers. It then proposes different marketing strategies for the three customer types, so that high-value customers are retained while potential customers are also won. The results indicate that using three-way clustering when segmenting customers with the RFM model makes it possible to divide customer groups according to the actual situation, mine the index characteristics of each segment, and formulate appropriate marketing plans. In other words, adding a boundary region is useful in some practical applications. The thesis concludes that the approach is a good way to reduce customer churn and improve enterprise management, which gives it strong practical significance.
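
The abstract does not give the exact formulas, so the following Python sketch is only one plausible reading of the pipeline it describes: density-aware, farthest-distance seeding, ordinary K-means to get the two-way result, then a nearest-neighbor check to separate core and boundary regions. The function names, the `radius` and `q` parameters, the mean-density candidate filter and the boundary rule are all assumptions made for illustration and are not taken from the thesis.

```python
import numpy as np

def pairwise(X, Y):
    """Euclidean distance between every row of X and every row of Y."""
    return np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=2)

def init_centers(X, k, radius):
    """Assumed seeding rule: start from the densest point, then repeatedly
    take the reasonably dense point farthest from the seeds chosen so far."""
    d = pairwise(X, X)
    density = (d <= radius).sum(axis=1)       # neighbors within radius (self included)
    candidates = density >= density.mean()    # assumption: skip sparse points (likely outliers)
    chosen = [int(np.argmax(density))]
    while len(chosen) < k:
        nearest = d[:, chosen].min(axis=1)    # distance to the closest chosen seed
        nearest[~candidates] = -np.inf        # sparse points can never win
        chosen.append(int(np.argmax(nearest)))
    return X[chosen].astype(float)

def assign(X, centers):
    """Hard (two-way) assignment: index of the closest center."""
    return pairwise(X, centers).argmin(axis=1)

def kmeans(X, centers, n_iter=100):
    """Plain Lloyd iterations starting from the given centers."""
    centers = centers.copy()
    for _ in range(n_iter):
        labels = assign(X, centers)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(len(centers))])
        if np.allclose(new, centers):
            break
        centers = new
    return assign(X, centers), centers

def three_way_split(X, labels, k, q=5):
    """Assumed rule: a point whose q nearest neighbors all share its label is
    core; otherwise it joins the boundary of every cluster seen among them."""
    d = pairwise(X, X)
    np.fill_diagonal(d, np.inf)               # a point is not its own neighbor
    nn = np.argsort(d, axis=1)[:, :q]
    core = [set() for _ in range(k)]
    boundary = [set() for _ in range(k)]
    for i in range(len(X)):
        own = int(labels[i])
        seen = set(labels[nn[i]].tolist())
        if seen == {own}:
            core[own].add(i)
        else:
            for j in seen | {own}:
                boundary[j].add(i)
    return core, boundary
```

A typical call would be `labels, centers = kmeans(X, init_centers(X, k, radius))` followed by `core, boundary = three_way_split(X, labels, k, q)`. Objects that sit between two clusters end up in the boundary of both, which is the behavior the abstract attributes to the boundary region.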
Keywords/Search Tags: K-means clustering, three-way clustering, customer segmentation, RFM model