
Research On Key Technique Of Mixed Data Clustering Based On Sparse Representation

Posted on: 2019-07-22  Degree: Doctor  Type: Dissertation
Country: China  Candidate: X C Shao
GTID: 1318330548462200  Subject: Management Science and Engineering
Abstract/Summary:
Data mining has become one of the most important tools for supporting management and decision making. As its applications expand, the data to be processed increasingly consist of mixed attributes rather than purely numerical or purely categorical ones. Mining techniques for such data remain an active research topic, and clustering plays an important role among them. Traditional clustering algorithms are generally designed for data objects with either numerical or categorical attributes. However, many studies show that real-world data are mostly described by both kinds of attributes, so most traditional clustering methods are not suitable for mixed attribute data. Designing efficient algorithms for data with both numerical and categorical attributes is therefore one of the most attractive research issues in cluster analysis. This dissertation focuses on clustering mixed attribute data and on corresponding methods based on sparse representation. It covers three main aspects:

(1) A missing value imputation method based on sparse representation is proposed for unlabeled mixed data. The method introduces locality-constrained linear coding and sparse representation into the K-nearest-neighbor process used for dictionary construction, which better preserves local structure and eases the difficulty of choosing similar objects. Experiments on six real datasets show its advantages in imputing data with high efficiency.

(2) A spectral clustering method based on K-SVD is proposed to address the difficulty of computing similarity for mixed attribute data. The method brings the dictionary learning process of sparse representation theory into spectral clustering and generates a coefficient matrix carrying discriminant information, which serves as the input weight matrix for spectral clustering. This overcomes the difficulty of the similarity calculation while retaining the high efficiency of spectral clustering. Experiments on five real datasets demonstrate superior clustering accuracy.

(3) A novel algorithm is proposed for automatically determining cluster centers in order to generate better initial cluster centers. The method uses the concept of data density to estimate the coherence of data objects and then introduces a distance measure to select the initial cluster centers. This avoids the poor clustering results that randomly initialized centers can produce. The performance of the method, combined with the K-SVD-based spectral clustering method, is shown by several experiments on real datasets in comparison with other clustering techniques.
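The abstract gives no formulas for the third contribution, so the following is only a minimal sketch of the general idea it describes: score each object by its local density and by its distance to denser objects, then take the highest-scoring objects as initial centers. Everything concrete here is an assumption made for illustration and not the dissertation's method: the Gower-style `mixed_distance`, the `cutoff` radius used to count neighbors, the product `density * separation` as the score, and the function names themselves.

```python
def mixed_distance(a, b, numeric_idx, ranges):
    """Gower-style distance (an assumption): range-scaled absolute difference
    for numeric attributes, 0/1 mismatch for categorical ones, averaged."""
    total = 0.0
    for j, (x, y) in enumerate(zip(a, b)):
        if j in numeric_idx:
            total += abs(x - y) / ranges[j] if ranges[j] > 0 else 0.0
        else:
            total += 0.0 if x == y else 1.0
    return total / len(a)


def initial_centers(data, numeric_idx, k, cutoff):
    """Return indices of k objects that are both dense and far from denser objects."""
    n = len(data)
    ranges = {j: max(r[j] for r in data) - min(r[j] for r in data)
              for j in numeric_idx}
    dist = [[mixed_distance(data[i], data[j], numeric_idx, ranges)
             for j in range(n)] for i in range(n)]

    # Local density: number of other objects closer than the cutoff.
    density = [sum(1 for j in range(n) if j != i and dist[i][j] < cutoff)
               for i in range(n)]

    # Separation: distance to the nearest object ranked earlier by density
    # (ties broken by index); the densest object gets its largest distance.
    by_density = sorted(range(n), key=lambda i: -density[i])
    separation = [0.0] * n
    for rank, i in enumerate(by_density):
        if rank == 0:
            separation[i] = max(dist[i])
        else:
            separation[i] = min(dist[i][j] for j in by_density[:rank])

    # Objects with high density AND high separation are center candidates.
    scores = [density[i] * separation[i] for i in range(n)]
    return sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]


# Toy mixed data: one numeric attribute (index 0), one categorical attribute.
data = [(1.0, "red"), (1.1, "red"), (0.9, "red"), (1.2, "red"),
        (9.0, "blue"), (9.1, "blue"), (8.9, "blue")]
centers = initial_centers(data, numeric_idx={0}, k=2, cutoff=0.1)
```

On this toy data the two selected objects fall in different groups, which is the behavior random initialization cannot guarantee. In a full pipeline the selected objects would seed the clustering step in place of random centers.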
Keywords/Search Tags: Sparse Representation, Clustering, Mixed Attribute Data, Missing Value Imputation, Cluster Center Initialization