Research On Clustering Algorithm For Mixed Attributes And Application

Posted on: 2018-10-09  Degree: Master  Type: Thesis
Country: China  Candidate: C Zhao  Full Text: PDF
GTID: 2348330533463693  Subject: Computer Science and Technology
Abstract/Summary:
Clustering algorithms divide a dataset into clusters, following the idea that "birds of a feather flock together": the similarity between samples in the same cluster should be as large as possible, and the similarity between samples from different clusters as small as possible. Clustering has been widely used in engineering, economics, the social sciences, and other areas. Most state-of-the-art clustering algorithms can deal only with numerical attributes or only with categorical attributes, and so cannot meet the needs of practical data mining. Clustering algorithms for mixed attributes have been studied, but the state-of-the-art algorithms have two problems: the similarity measure between mixed attributes is unreasonable, and redundant information between mixed attributes has not been studied. Based on an analysis of existing clustering algorithms, this thesis proposes a mixed attribute clustering algorithm based on feature comparison and a mixed attribute clustering algorithm based on principal component analysis. Experiments on UCI datasets verify the effectiveness of the algorithms.

The main contents of the thesis are as follows.

Firstly, the significance of clustering analysis, the research status, and the five dimensions of clustering analysis are introduced. The existing clustering algorithms are then classified from the perspectives of similarity measure and algorithm.

Secondly, the shortcomings of the existing mixed attribute clustering algorithms are studied. To overcome the problem that existing mixed attribute similarity measures cannot deal with multi-concepts, a mixed attribute similarity measure is proposed, based on the proposed feature comparison function and combined with the idea of partition design. Experiments are carried out to verify the feasibility of the algorithm.

Then, to deal with redundancy in mixed attribute datasets, a method for calculating the correlation between a categorical attribute and a numerical attribute is proposed after a comprehensive analysis of the existing correlation calculation methods. A mixed attribute correlation matrix is constructed and principal component analysis is carried out on it. On this basis, an improved mixed attribute clustering algorithm based on principal component analysis is proposed, and its correctness is verified by experiments.

Finally, the improved mixed attribute clustering algorithm is applied in practice. To solve the problem of incomplete evaluation caused by the use of non-mixed attributes in the customer segmentation process, the customer evaluation system is reestablished and the mixed attributes are filtered. The mixed attribute clustering algorithm is then applied to segment the customers, and different marketing strategies are devised for different customer groups.
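The abstract does not give the formulas for the feature comparison function or for the categorical-numerical correlation, so the following Python sketch does not reproduce the thesis's method. It only illustrates the two general ideas using well-known stand-ins: a k-prototypes-style dissimilarity (squared Euclidean distance on numerical attributes plus a weighted count of categorical mismatches) and the correlation ratio (eta) between a categorical and a numerical attribute. The function names, the `gamma` weight, and the sample data are assumptions made for illustration.

```python
from collections import defaultdict
from math import sqrt


def mixed_dissimilarity(a, b, numeric_idx, categorical_idx, gamma=1.0):
    """k-prototypes-style dissimilarity between two records (a stand-in, not
    the thesis's feature comparison function).

    Squared Euclidean distance over the numerical attributes plus gamma
    times the number of categorical attributes on which the records differ.
    """
    numeric_part = sum((a[i] - b[i]) ** 2 for i in numeric_idx)
    categorical_part = sum(1 for i in categorical_idx if a[i] != b[i])
    return numeric_part + gamma * categorical_part


def correlation_ratio(categories, values):
    """Correlation ratio (eta) between a categorical and a numerical attribute.

    eta = sqrt(between-group sum of squares / total sum of squares).
    It is 0 when all group means coincide and 1 when the category fully
    determines the numerical value.
    """
    if len(categories) != len(values) or not values:
        raise ValueError("inputs must be non-empty and of equal length")
    groups = defaultdict(list)
    for category, value in zip(categories, values):
        groups[category].append(value)
    overall_mean = sum(values) / len(values)
    total_ss = sum((v - overall_mean) ** 2 for v in values)
    if total_ss == 0:
        return 0.0  # constant numerical attribute: no variation to explain
    between_ss = sum(
        len(g) * (sum(g) / len(g) - overall_mean) ** 2 for g in groups.values()
    )
    return sqrt(between_ss / total_ss)


if __name__ == "__main__":
    # Hypothetical records: (numerical attribute, categorical attribute)
    r1, r2 = (1.0, "red"), (3.0, "blue")
    print(mixed_dissimilarity(r1, r2, numeric_idx=[0], categorical_idx=[1], gamma=0.5))
    print(correlation_ratio(["a", "a", "b", "b"], [1.0, 1.0, 3.0, 3.0]))
```

Values such as these correlation ratios could fill the categorical-numerical entries of a mixed attribute correlation matrix before principal component analysis is applied, although the thesis may define those entries differently.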
Keywords/Search Tags: mixed attribute, cluster analysis, similarity measure, correlation, customer segmentation