
The Study Of High Dimensional Data Clustering Method Based On Graph

Posted on: 2018-07-24
Degree: Doctor
Type: Dissertation
Country: China
Candidate: N Du
GTID: 1318330542453300
Subject: Statistics
Abstract/Summary:
Clustering is the process of grouping a set of data objects. It plays an important role in data visualization, knowledge representation, and data mining, and it has attracted much attention from statisticians and machine learning researchers. From a statistical point of view, clustering is a way to simplify data through modeling: it aims to organize a collection of data items into clusters such that items within a cluster are more similar to each other than to items in other clusters. Clustering methods fall mainly into two kinds: semi-supervised and unsupervised. Semi-supervised methods group the data with the help of a small amount of labeled data. Unsupervised methods, in contrast, choose a proper set of features according to different criteria; these features help merge groups that share the same properties and separate groups whose properties differ.

This dissertation focuses on high-dimensional semi-supervised and unsupervised clustering problems, which can be addressed using undirected graphs and additive distances. We describe and discuss similarity measurement, basic concepts, and the convergence of algorithms for text data analysis and image processing. We then propose a complete set of theoretical proofs and clustering algorithms for detecting keywords and segmenting images. The experiments use several distinct, challenging datasets, and the results show that our methods outperform state-of-the-art methods in both accuracy and computation speed. The main contributions of this dissertation are summarized as follows:

1. A unified framework is proposed for image segmentation based on a weighted-graph clustering method. We introduce information from the neighborhood pixels of the labeled data, which allows our algorithm to start with only a few seeds and cluster the pixels efficiently.
2. We prove that additive distances in the discrete case have a fixed form if the function is a non-zero polynomial homomorphic mapping.
3. Based on a tree-structured graphical model, we propose a novel hierarchical clustering method. We also prove the probably approximately correct (PAC) property of this clustering method and discuss the relationship between recovering the true local structure and the sample size.
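To make the idea of seeded clustering on a weighted graph more concrete, the following is a minimal sketch of a generic random-walk labeling scheme. It is not the dissertation's algorithm: the abstract does not give its formulation, so the Gaussian edge weights, the parameter `beta`, the chain-shaped toy "image", and the function names `chain_weights` and `random_walk_labels` are all assumptions made for illustration. Each unlabeled node receives the label of the seed that a random walker starting there is most likely to reach first, which is found by solving a linear system in the graph Laplacian.

```python
import numpy as np


def chain_weights(intensities, beta=20.0):
    """Build a weight matrix for a 1-D chain of pixels (illustrative only).

    Adjacent pixels are joined by an edge whose weight decays with the
    squared intensity difference; beta is an assumed tuning parameter.
    """
    x = np.asarray(intensities, dtype=float)
    n = len(x)
    W = np.zeros((n, n))
    for i in range(n - 1):
        w = np.exp(-beta * (x[i] - x[i + 1]) ** 2)
        W[i, i + 1] = W[i + 1, i] = w
    return W


def random_walk_labels(W, seeds):
    """Label every node of a weighted graph from a few seeded nodes.

    W     : symmetric, non-negative (n x n) weight matrix.
    seeds : dict mapping node index -> integer label.
    Every unseeded node must be connected to at least one seed.
    """
    n = W.shape[0]
    labels = sorted(set(seeds.values()))
    seeded = sorted(seeds)
    free = [i for i in range(n) if i not in seeds]

    # Graph Laplacian L = D - W.
    L = np.diag(W.sum(axis=1)) - W

    # One-hot label indicators for the seeded nodes.
    M = np.zeros((len(seeded), len(labels)))
    for row, node in enumerate(seeded):
        M[row, labels.index(seeds[node])] = 1.0

    # Solve L_ff P = -L_fs M for the first-arrival probabilities P.
    L_ff = L[np.ix_(free, free)]
    L_fs = L[np.ix_(free, seeded)]
    P = np.linalg.solve(L_ff, -L_fs @ M)

    out = np.empty(n, dtype=int)
    for node, lab in seeds.items():
        out[node] = lab
    out[free] = [labels[k] for k in P.argmax(axis=1)]
    return out


# Toy example: six pixels with a sharp intensity jump in the middle,
# and one seed at each end of the chain.
pixels = [0.10, 0.15, 0.20, 0.80, 0.85, 0.90]
W = chain_weights(pixels)
result = random_walk_labels(W, seeds={0: 0, 5: 1})
print(result)
```

In this toy setting the edge across the intensity jump has a much smaller weight than the others, so the two seeds should split the chain at that jump. A real image would use a 2-D pixel grid and a sparse solver rather than dense matrices.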
Keywords/Search Tags:Random walks, Latent tree model, Additive distances, Variable clustering