The Study Of High Dimensional Data Clustering Method Based On Graph

Posted on:2018-07-24

Degree:Doctor

Type:Dissertation

Country:China

Candidate:N Du

Full Text:PDF

GTID:1318330542453300

Subject:Statistics

Abstract/Summary:


Clustering is the process of grouping a set of data objects. It plays an important role in data visualization, knowledge representation, and data mining, and it has attracted much attention from statisticians and machine learning researchers. From a statistical point of view, clustering is one way to simplify data through modeling: it aims to organize a collection of data items into clusters such that items within a cluster are more similar to each other than they are to items in other clusters. Clustering methods fall mainly into two kinds: semi-supervised clustering and unsupervised clustering. Semi-supervised clustering methods group the data with the help of a small amount of labeled data. In contrast, unsupervised clustering methods choose a proper set of features according to different criteria, and these features help to merge groups that share the same properties and to identify groups with different properties.

This dissertation focuses on the high-dimensional semi-supervised and unsupervised clustering problems, which can be addressed using undirected graphs and additive distances. We describe and discuss similarity measures, basic concepts, and the convergence of algorithms for text data analysis and image processing. A complete set of theoretical proofs and clustering algorithms is then proposed to detect keywords and segment images. The experiments are based on several distinct, challenging datasets, and the results demonstrate that our methods outperform state-of-the-art methods in both accuracy and computation speed. The main contributions of this dissertation are summarized as follows:

Ⅰ. A unified framework based on a weighted-graph clustering method is proposed to solve the problem of image segmentation. We incorporate information from the pixels neighboring the labeled data, which allows our algorithm to start with only a few seeds and to cluster the pixels efficiently.
Ⅱ. We prove that the additive distances for the discrete case have a fixed form if the function is a non-zero polynomial homomorphic mapping.
Ⅲ. Based on a tree-structured graphical model, we propose a novel hierarchical clustering method. We also prove the probably approximately correct (PAC) property of this clustering method and discuss the relationship between recovering the true local structure and the sample size.
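To make the seed-based idea in contribution Ⅰ more concrete, the following is a minimal sketch of generic random-walk label propagation on a weighted undirected graph. It is not the dissertation's algorithm, which the abstract does not spell out. Each unlabeled node receives the label of the seed that a random walker starting there is most likely to reach first, and these probabilities come from solving one linear system in the graph Laplacian. The function name `random_walk_labels`, the Gaussian edge weight, and the parameter `beta` are all assumptions chosen for illustration.

```python
import numpy as np

def random_walk_labels(W, seeds):
    """Label every node of a weighted graph from a few seed nodes.

    W     : symmetric (n, n) array of non-negative edge weights (assumed input).
    seeds : dict mapping node index -> class label (assumed input).
    """
    n = W.shape[0]
    L = np.diag(W.sum(axis=1)) - W              # combinatorial graph Laplacian
    seeded = sorted(seeds)
    free = [i for i in range(n) if i not in seeds]
    classes = sorted(set(seeds.values()))
    # One-hot matrix: row s, column c is 1 if seed s carries class c.
    M = np.array([[1.0 if seeds[s] == c else 0.0 for c in classes]
                  for s in seeded])
    # Dirichlet problem: L_uu X = -L_us M gives first-arrival probabilities.
    L_uu = L[np.ix_(free, free)]
    L_us = L[np.ix_(free, seeded)]
    X = np.linalg.solve(L_uu, -L_us @ M)
    labels = dict(seeds)
    for row, node in enumerate(free):
        labels[node] = classes[int(np.argmax(X[row]))]
    return labels

# Toy "image": six pixels in a row with two intensity plateaus.
intensity = np.array([0.0, 0.1, 0.05, 0.9, 1.0, 0.95])
beta = 10.0                                     # hypothetical contrast parameter
W = np.zeros((6, 6))
for i in range(5):
    # Neighboring pixels with similar intensity get a strong edge.
    w = np.exp(-beta * (intensity[i] - intensity[i + 1]) ** 2)
    W[i, i + 1] = W[i + 1, i] = w

labels = random_walk_labels(W, {0: "a", 5: "b"})
print(labels)
```

With only two seeds, the weak edge between the third and fourth pixels acts as a barrier, so each plateau takes the label of its own seed. A dense solve is used here for brevity; on real images a sparse solver would be the natural choice.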

Keywords/Search Tags:

Random walks, Latent tree model, Additive distances, Variable clustering


Related items

1	Using graphs and random walks for discovering latent semantic relationships in text
2	Two-way latent variable clustering
3	Research Of Related Image Segmentation Algorithms Based On Random Walks
4	Random Walks Based Image Segmentation Method
5	Learning Method Of Hidden Variables Model Based On Structure
6	Latent Variable Modeling and Statistical Learning
7	Modeling And Optimization Of Latent Variable Model
8	Learning And Extension Of Mixed Latent Tree Models
9	Study On Image Retrieval With Relevance Feedback Based On Improved Random Walks
10	Research On Trajectory Partitioning And Clustering Technology Based On Node Movement Features