Clustering Algorithm For Mixed Type Data And Its Application

Posted on: 2014-02-02
Degree: Master
Type: Thesis
Country: China
Candidate: Y Zhang
GTID: 2248330395999160
Subject: Software engineering
Abstract/Summary:
Most existing clustering algorithms can handle only data whose attributes are either all numerical or all categorical. However, many practical databases and large datasets contain both numerical and categorical attributes, and the two kinds must be handled together. Developing a clustering algorithm that can process numerical and categorical data simultaneously is therefore of both theoretical and practical significance. The main research work of this thesis can be summarized as follows:
(1) Unsupervised discretization algorithms are introduced, and a new clustering algorithm for mixed-type data based on unsupervised discretization is proposed. Experimental results on UCI datasets indicate that the new algorithm is effective for mixed-type data.
(2) The supervised discretization algorithm CAIM is introduced, and a clustering algorithm based on supervised discretization is built on it. Experimental results on UCI mixed-type datasets show that the proposed algorithm outperforms the k-prototypes algorithm; on UCI numerical datasets, it also outperforms k-means.
(3) Mass spectrometry-based protein identification and the protein inference problem are introduced. The clustering algorithms proposed in this thesis are then applied to the protein inference problem, and their inference performance is verified on two proteomics datasets.
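The abstract does not spell out the proposed algorithms, so the following is only a minimal sketch of the general idea behind item (1): discretize the numerical attributes without using class labels, then cluster the resulting all-categorical records. It assumes equal-width binning as the unsupervised discretizer and a plain k-modes loop with simple-matching dissimilarity as the clusterer; both are stand-ins chosen for illustration, and every function name and the toy data are hypothetical rather than taken from the thesis.

```python
from collections import Counter

def equal_width_bins(values, n_bins):
    """Map each numeric value to an integer bin index in [0, n_bins - 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # constant column: a single bin
    width = (hi - lo) / n_bins
    # Clamp so the maximum value falls into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def discretize(rows, numeric_cols, n_bins=3):
    """Return copies of rows with the numeric columns replaced by bin labels."""
    out = [list(r) for r in rows]
    for c in numeric_cols:
        bins = equal_width_bins([r[c] for r in rows], n_bins)
        for r, b in zip(out, bins):
            r[c] = b
    return [tuple(r) for r in out]

def mismatch(a, b):
    """Simple-matching dissimilarity: number of attributes that differ."""
    return sum(x != y for x, y in zip(a, b))

def k_modes(rows, init_modes, max_iter=20):
    """Cluster categorical rows; returns (labels, modes)."""
    modes = list(init_modes)
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest mode under simple matching.
        new_labels = [
            min(range(len(modes)), key=lambda j: mismatch(r, modes[j]))
            for r in rows
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: per-attribute most frequent value in each cluster.
        for j in range(len(modes)):
            members = [r for r, l in zip(rows, labels) if l == j]
            if members:
                modes[j] = tuple(
                    Counter(col).most_common(1)[0][0] for col in zip(*members)
                )
    return labels, modes

# Hypothetical toy data: one numerical and one categorical attribute.
rows = [
    (1.0, "red"), (1.2, "red"), (0.9, "blue"),
    (8.0, "green"), (8.5, "green"), (7.9, "blue"),
]
discretized = discretize(rows, numeric_cols=[0], n_bins=3)
labels, modes = k_modes(discretized, init_modes=[discretized[0], discretized[3]])
print(labels, modes)
```

Once the numerical column is binned, a single categorical dissimilarity covers every attribute, which is what lets one clustering procedure handle mixed-type records. A supervised discretizer such as CAIM, as in item (2), would replace `equal_width_bins` but needs class labels to choose the interval boundaries.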
Keywords/Search Tags: Mixed data type, Clustering, Discretization, Protein inference