Research On Key Technique Of Mixed Data Clustering Based On Sparse Representation

Posted on:2019-07-22

Degree:Doctor

Type:Dissertation

Country:China

Candidate:X C Shao

Full Text:PDF

GTID:1318330548462200

Subject:Management Science and Engineering

Abstract/Summary:

Data mining has become one of the most important tools for supporting management and decision making. As the applications of data mining expand, the data to be processed increasingly consist of mixed attributes rather than purely numerical or purely categorical ones. Data mining techniques for such data have long been an active research topic, in which clustering plays an important role. Traditional clustering algorithms are generally designed for data objects with either numerical or categorical attributes. However, many studies show that real-world data are mostly described by both numerical and categorical attributes, so most traditional clustering methods are not appropriate for mixed attribute data. Designing efficient algorithms for data with both numerical and categorical attributes is therefore one of the most attractive research issues in cluster analysis. This dissertation focuses on clustering mixed attribute data and on corresponding methods based on sparse representation. It covers three main aspects:

(1) A missing-value imputation method based on sparse representation is proposed for unlabeled mixed data. The method introduces locality-constrained linear coding and sparse representation into the K-nearest-neighbor process used for dictionary construction, which better preserves local structure and eases the difficulty of choosing similar objects. Results on six real datasets show the advantages of the method, which imputes data with high efficiency.

(2) A spectral clustering method based on K-SVD is proposed to address the difficulty of computing similarity for mixed attribute data. The method brings the dictionary learning process of sparse representation theory into spectral clustering and generates a coefficient matrix carrying discriminant information, which serves as the input weight matrix for spectral clustering. This avoids the difficulty of similarity computation while retaining the high efficiency of spectral clustering. The proposed algorithm is evaluated on five real datasets and demonstrates superior clustering accuracy.

(3) A novel algorithm is proposed for automatically determining cluster centers in order to generate better initial cluster centers. The method uses the concept of data density to estimate the coherence of data objects and then introduces a distance measure to select the initial cluster centers. This procedure avoids the poor clustering outcomes that can result from randomly initialized centers. The performance of the proposed method, combined with the K-SVD-based spectral clustering method, is shown by several experiments on real datasets in comparison with other clustering techniques.
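
The abstract does not give the exact density estimate or distance rule used for center initialization, so the following is only a minimal sketch of the general idea in aspect (3): score each object by its local density and by its distance to denser objects, then take the top-scoring objects as initial centers. The function name `density_distance_centers`, the `radius` parameter, the neighbor-count density, and the product score are all assumptions made for illustration, and the sketch handles numerical attributes only (mixed data would need a mixed-attribute dissimilarity in place of the Euclidean distance).

```python
import numpy as np

def density_distance_centers(X, k, radius):
    """Return indices of k candidate initial centers (illustrative sketch only)."""
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances; assumes purely numerical attributes.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Local density: number of other objects closer than `radius` (assumed estimate).
    density = (dist < radius).sum(axis=1) - 1
    # Rank objects from densest to sparsest; ties are broken by index.
    order = np.argsort(-density, kind="stable")
    # delta[i]: distance from object i to the nearest object ranked before it.
    delta = np.empty(len(X))
    delta[order[0]] = dist[order[0]].max()
    for rank in range(1, len(X)):
        i = order[rank]
        delta[i] = dist[i, order[:rank]].min()
    # Favor objects that are both dense and far from denser objects.
    score = density * delta
    return np.argsort(-score, kind="stable")[:k]

# Two small, well-separated groups of points.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1], [5.1, 5.1]])
centers = density_distance_centers(X, k=2, radius=0.5)
print(X[centers])
```

On this toy input the two selected objects fall in different groups, which is the behavior random initialization cannot guarantee.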

Keywords/Search Tags:

Sparse Representation, Clustering, Mixed Attribute Data, Missing Value Imputation, Cluster Center Initialization

Related items

1	Research On Mixed Attribute Clustering Technology Based On Cluster Center Selection Strategy
2	Attribute Associated Neuron Modeling And Missing Value Imputation Based On Neural Network
3	Research On Missing Value Imputation Method Based On Mixed Information System
4	Studies On Missing Data Imputation
5	Research On Data Imputation Methods Of Mixed Missing Type
6	The Online Imputation Method Of Missing Value Based On KNN And Its Application In Credit Evaluation
7	Attribute Correlation Modeling And Missing Value Imputation Of Incomplete Data Based On Fuzzy Partition
8	Research On Clustering Algorithms For The Data With Multidimensional Mixed Attributes
9	Clustering Algorithm Of Missing Data Based On Dissimilarity Measure
10	Comparative Study On Imputation Methods Of Missing Data In XGBOOST Model Under Complete Random Missing Mechanism