I-nice: A New Approach For Data Clustering

Posted on:2019-01-16

Degree:Doctor

Type:Dissertation

Institution:University

Candidate:Md Abdul Masud

Full Text:PDF

GTID:1368330599454822

Subject:Information and Communication Engineering

Abstract/Summary:

Clustering is a general data exploration method. The k-means type clustering algorithms are efficient for large data. However, these algorithms require the number of clusters in advance, and their results depend on well-chosen initial cluster centers. In this thesis, we propose I-nice, short for Identifying the number of clusters and initial cluster centers, as an approach for data clustering. In the I-nice approach, we consider a dataset as a terrain in which clusters are hills. We assign an observer to the terrain to observe and count the peaks of the hills, which correspond to the dense regions of clusters and reflect the number of clusters in the data. We develop two parameter-free clustering algorithms based on the I-nice concept. Using the I-nice method, we solve three clustering problems. The major contributions of this thesis are fourfold.

First, we propose the I-nice approach for data clustering. The I-nice method transforms high-dimensional data into one-dimensional distance data by computing the distances between the observation point and the objects. The distance distribution is modeled by a set of Gamma mixture models, which are solved with the expectation-maximization algorithm. The best-fitted model is selected with an Akaike information criterion variant. We propose the I-nice SO (I-nice with a Single Observation) algorithm, in which the number of components in the model is taken as the number of clusters, and the objects in each component are analyzed with the k-nearest neighbor method to find the initial cluster centers. For complex data with many clusters, we propose the I-nice MO (I-nice with Multiple Observations) algorithm, which combines the results of multiple observation points.

Second, we formulate I-nice based semi-supervised clustering from unlabeled data. We propose a method for selecting pairwise constraints from unlabeled data to improve the clustering accuracy. For this purpose, we first cluster the unlabeled data with the I-nice method into a set of initial clusters. The most informative objects and informative objects are then identified from the objects in the clusters to form a set of pairwise constraints. The advantage of this method is that no label information is required for selecting the pairwise constraints.

Third, we formulate I-nice based concept drift detection for cluster survival analysis. In this approach, we propose a data stream clustering algorithm, I-nice Stream, for clustering an unlabeled load profile data stream. The concept drift detection method uses a modified Kullback-Leibler divergence to compute concept drift scores from the clustering results. We estimate the clustering patterns from the concept drift scores. We use survival analysis to categorize the clustering patterns into sustaining, fading, and emerging types, and retrieve the representative load profiles with interesting characteristics.

Finally, to analyze the load profile data stream, we propose an I-nice based semi-supervised clustering ensemble framework. We modify the I-nice MO algorithm with weighted observation points, yielding I-nice WMO, which discovers the cluster structure on each load profile data horizon. In semi-supervised clustering, the pairwise constraints are selected from each cluster structure, and a set of solutions is obtained from several consecutive data horizons. A clustering ensemble method is then formulated to obtain an optimum clustering solution.

In the experiments, we used synthetic datasets, real-world datasets, and real-life application load profile data. The load profile data contain 21,330 load profiles collected from manufacturing industries in Guangdong Province, China, in 2012. The experiments were conducted to evaluate the effectiveness of the proposed methods against competing methods through detailed cluster analysis.
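
The core of the first contribution (distance transform, Gamma mixture fitting with EM, and model selection) can be illustrated with a minimal Python sketch. It is not the thesis implementation. The function names are hypothetical, the M-step uses weighted moment matching instead of an exact maximum-likelihood update, and AICc stands in for the unspecified "Akaike information criterion variant". The k-nearest neighbor step for finding initial centers is not shown.

```python
import math

def distances_to_observer(points, observer):
    """Reduce each object to its Euclidean distance from the observation point."""
    return [max(math.sqrt(sum((a - b) ** 2 for a, b in zip(p, observer))), 1e-12)
            for p in points]

def gamma_logpdf(x, shape, scale):
    """Log-density of a Gamma(shape, scale) distribution at x > 0."""
    return ((shape - 1.0) * math.log(x) - x / scale
            - math.lgamma(shape) - shape * math.log(scale))

def e_step(xs, params):
    """Return per-object responsibilities and the total log-likelihood."""
    resp, loglik = [], 0.0
    for v in xs:
        logs = [math.log(w) + gamma_logpdf(v, a, s) for w, a, s in params]
        top = max(logs)
        norm = top + math.log(sum(math.exp(l - top) for l in logs))
        loglik += norm
        resp.append([math.exp(l - norm) for l in logs])
    return resp, loglik

def moments_to_params(weight, mean, var):
    """Moment matching (an assumption of this sketch): shape = mean^2/var, scale = var/mean."""
    mean, var = max(mean, 1e-12), var + 1e-9
    return (weight, mean * mean / var, var / mean)

def fit_gamma_mixture(x, m, iters=100):
    """Fit an m-component Gamma mixture to 1-D distances with EM."""
    xs = sorted(x)
    n = len(xs)
    # Initialise each component from an equal-size slice of the sorted distances.
    params = []
    for j in range(m):
        chunk = xs[j * n // m:(j + 1) * n // m]
        mean = sum(chunk) / len(chunk)
        var = sum((v - mean) ** 2 for v in chunk) / len(chunk)
        params.append(moments_to_params(1.0 / m, mean, var))
    for _ in range(iters):
        resp, _ = e_step(xs, params)
        for j in range(m):
            nj = sum(r[j] for r in resp) + 1e-12
            mean = sum(r[j] * v for r, v in zip(resp, xs)) / nj
            var = sum(r[j] * (v - mean) ** 2 for r, v in zip(resp, xs)) / nj
            params[j] = moments_to_params(nj / n, mean, var)
    _, loglik = e_step(xs, params)
    return params, loglik

def aicc(loglik, m, n):
    """Small-sample corrected AIC; k counts m shapes, m scales, m - 1 weights."""
    k = 3 * m - 1
    return -2.0 * loglik + 2.0 * k + 2.0 * k * (k + 1) / (n - k - 1)

def select_num_components(x, max_m):
    """Fit mixtures with 1..max_m components and return the lowest-AICc choice."""
    scores = {}
    for m in range(1, max_m + 1):
        _, loglik = fit_gamma_mixture(x, m)
        scores[m] = aicc(loglik, m, len(x))
    return min(scores, key=scores.get), scores
```

Calling `select_num_components(distances_to_observer(points, observer), max_m)` returns the selected component count together with the AICc score of every candidate model. In the I-nice SO setting described above, that count plays the role of the estimated number of clusters.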

Keywords/Search Tags:

Data Clustering, Number of Clusters, Semi-supervised Clustering, Concept Drift Detection, Load Profile Data

Related items

1	Research On Semi-supervised Classification Of Data Stream Based On Adaptive Density Clustering
2	Research On Semi-supervised Classification Of Data Stream Based On Clustering
3	Research On Semi-supervised Classification Of Data Streams Based On Clustering And Transfer Learning
4	Research On Semi-supervised Classification Algorithm For Data Stream With Concept Drift
5	Semi Supervised Clustering Algorithm And Its Application And Research
6	Research On Data Stream Concept Drift Detection And Adaptive Clustering Algorithm
7	Learning On Evolving Data Streams
8	Research On Clustering Ensemble And Semi-Supervised Clustering In Data Mining
9	Research On Dynamic Measurement Based Data Stream Clustering And Its Applications
10	Classification Algorithm For Data Streams With Concept Drift And Its Applications