Composition Of Graph Of Local Density Trend And Its Applications

Posted on:2024-07-29

Degree:Master

Type:Thesis

Country:China

Candidate:J C Duan

Full Text:PDF

GTID:2568307094981619

Subject:Computer Science and Technology

Abstract/Summary:

Unsupervised machine learning is one of the most widely used and well-developed fields in modern computer science and technology. Its goal is to discover patterns, structures, and regularities in unlabeled data, helping people better understand the data and uncover the information hidden in it. Clustering algorithms, however, face problems such as parameter selection, the choice of similarity measure, and sensitivity to the characteristics of the data distribution; in big data settings, high-complexity algorithms also need to be made faster. To address these issues, this thesis studies in depth how hidden features in samples can be represented and applied through data structures. The work covers three main areas:

(1) A composition method based on local density trends. First, a confluence tree is defined to depict the density trend of the data nodes. Second, to realize the confluence tree as a data structure, a method for constructing branches based on decreasing path length is presented, in which each branch reflects the local density trend. On this basis, a simple method for constructing a sparse graph of local density trend (GLDT) from confluence trees is designed using hierarchical iteration. Finally, the correctness of the method is established by theoretical proof.

(2) A cluster analysis method based on GLDT composition. First, a density factor is defined for each confluence tree from its path length; the smaller the factor, the denser the tree. Second, a connection method based on breadth-first search over density-nearest neighbors is given, in which a connected confluence tree must satisfy a value below a threshold, while the mean value associated with each connected subgraph is determined by experiment and experience. On this basis, a clustering method based on GLDT composition is designed, and its correctness is verified experimentally on synthetic data and UCI data sets with different characteristics. The algorithm is insensitive to its parameters and has advantages over similar algorithms when clustering non-convex data.

(3) A hierarchical sampling method based on GLDT composition. For each connected subgraph, the root node of its tree structure is regarded as the local maximum-density point of that subgraph, so only the root node is sampled. By performing the composition operation recursively and sampling only the root nodes each time, high-density points are obtained, and the final sampling ratio can safely reach 1%. Combining this method with spectral clustering significantly improves time efficiency and also improves the clustering accuracy of spectral clustering.
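
The root-only sampling idea in (3) can be illustrated with a minimal sketch. The abstract does not give the actual confluence-tree or GLDT construction, so this example substitutes a simpler, assumed rule: density is estimated from the mean distance to the k nearest neighbors, and each point links to its densest neighbor when that neighbor is strictly denser. Points with no denser neighbor become roots (local density maxima), and only roots survive each round. The function names `knn_density`, `build_forest`, and `sample_roots`, and the parameters `k` and `rounds`, are hypothetical and are not taken from the thesis.

```python
import math


def knn_density(points, k):
    """Density proxy (an assumption): inverse mean distance to the k nearest neighbors."""
    densities = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        densities.append(1.0 / (sum(dists[:k]) / k + 1e-12))
    return densities


def build_forest(points, k):
    """Return parent pointers; parent[i] == i marks a root (local density maximum).

    Each point links to the densest of its k nearest neighbors, but only if
    that neighbor is strictly denser, so the links can never form a cycle.
    Requires len(points) > k.
    """
    n = len(points)
    dens = knn_density(points, k)
    parent = list(range(n))
    for i in range(n):
        neighbors = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )[:k]
        best = max(neighbors, key=lambda j: dens[j])
        if dens[best] > dens[i]:
            parent[i] = best
    return parent


def sample_roots(points, k, rounds):
    """Repeatedly rebuild the forest and keep only its roots."""
    sample = list(points)
    for _ in range(rounds):
        if len(sample) <= k:
            break  # too few points left to find k neighbors
        parent = build_forest(sample, k)
        roots = [sample[i] for i in range(len(sample)) if parent[i] == i]
        if len(roots) == len(sample):
            break  # no further reduction is possible
        sample = roots
    return sample


# Two well-separated groups of points on a line.
points = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0), (0.3, 0.0), (0.4, 0.0),
          (10.0, 0.0), (10.1, 0.0), (10.2, 0.0), (10.3, 0.0), (10.4, 0.0)]
reduced = sample_roots(points, k=2, rounds=1)
```

Here `reduced` is a subset of `points` made up of the local density maxima under the assumed rule. Running more rounds shrinks the sample further, which mirrors the recursive sampling described above, although the sampling ratio reached by this simplified rule says nothing about the 1% figure reported for the thesis method.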

Keywords/Search Tags:

Clustering, Density based clustering, Unsupervised machine learning, Graph composition, Density estimation, Sampling

Related items

1	Research On Density Based Clustering Algorithms For Varying Density Data
2	Research And Improvement On Density-Based Clustering Algorithm
3	Research On Density-based Hierarchical Clustering Algorithm
4	Research On Quick Clustering Algorithm Based On Density Subgraph
5	Research And Application Of Density Peak Clustering Algorithm Based On Density Decay Graph
6	The Research And Improvement Of Density-based Clustering Algorithm
7	Research On The Robust And Adaptive Switching C-Regressions Models Based On Cluster Analysis
8	Research On Clustering Algorithm Based On Density Analysis
9	Research On Density Peaks Clustering
10	Study On Clustering For Large Data Sets And Its Applications