Research On Density Based Clustering Algorithms For Varying Density Data

Posted on:2024-09-06

Degree:Master

Type:Thesis

Country:China

Candidate:J L Wu

GTID:2568307079463964

Subject:Computer Science and Technology

Abstract/Summary:

In the era of big data, the ever-increasing volume of data calls for more efficient data mining techniques. Traditional analysis methods can no longer cope with such large and complex datasets, whereas clustering analysis can extract structure from them and help people make better-informed decisions. Among the many clustering algorithms, density-based clustering algorithms stand out for several reasons. They can handle clusters of different sizes and shapes, especially non-convex clusters that other clustering algorithms have difficulty handling. They are robust to noise and outliers in the dataset. They also do not need the number of clusters to be known in advance, which makes them well suited to unsupervised learning tasks.

However, density-based clustering algorithms have two problems. First, most of them use a single global density threshold to separate high-density and low-density areas. This approach ignores local density changes and therefore performs poorly on datasets with varying densities. Second, most of them determine clusters by expanding top-down from high-density areas. This approach ignores the information carried by noise points in low-density areas, which wastes information and degrades the final clustering results. To address these issues, this thesis proposes two algorithms, both of which mitigate the above problems and improve clustering performance.

(1) Density Incremental-based Clustering Algorithm: It models the dataset as a flow field in which each sample point moves in the direction of higher density. The algorithm can therefore detect density changes regardless of how the relative densities of different regions vary in datasets with varying densities. Compared to other density gradient-based clustering algorithms, this algorithm can discriminate noise points in low-density areas without calculating the complete movement trajectory of the sample points.

(2) Density Ratio-based Clustering Algorithm: It identifies possible noise points in a local area by calculating the ratio of a sample point's density to the maximum local density, and it calculates the average of the sample point density and the global density to determine how the density in the local area differs from that of the data as a whole. This compensates for the density imbalance between dense and sparse areas in datasets with varying densities and yields better clustering results.

After determining the noise points, both algorithms apply a denoising operation to them. Each algorithm then partitions the low-density area bottom-up based on the information provided by the denoising process, which allows it to cluster data effectively in high-noise datasets. The thesis reports experiments on eight commonly used synthetic datasets and seven commonly used real-world datasets, and the results show that the proposed algorithms can effectively handle datasets with varying densities and high noise.
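The abstract gives no formulas, so the following is only a minimal sketch of the two ideas it names, not the thesis's algorithms. It assumes a k-nearest-neighbour density estimate, a hypothetical ratio threshold of 0.5, and hypothetical function names (`knn_density`, `density_ratio_noise`, `uphill_neighbour`). It omits the averaging with the global density and the bottom-up denoising step, which the abstract does not specify in enough detail to reproduce.

```python
import numpy as np


def knn_density(points, k):
    """Assumed density estimate: inverse mean distance to the k nearest neighbours."""
    diffs = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)              # a point is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]
    mean_dist = np.take_along_axis(dist, neighbours, axis=1).mean(axis=1)
    return 1.0 / (mean_dist + 1e-12), neighbours


def density_ratio_noise(points, k=5, ratio_threshold=0.5):
    """Flag a point as possible noise when its density is small relative to
    the maximum density in its own neighbourhood (a local, not global, test)."""
    density, neighbours = knn_density(points, k)
    local_max = np.maximum(density, density[neighbours].max(axis=1))
    ratio = density / local_max
    return ratio < ratio_threshold, ratio


def uphill_neighbour(points, k=5):
    """One step of 'moving towards higher density': each point is mapped to its
    densest neighbour, or to itself if no neighbour is denser."""
    density, neighbours = knn_density(points, k)
    rows = np.arange(len(points))
    best = neighbours[rows, density[neighbours].argmax(axis=1)]
    return np.where(density[best] > density, best, rows)


def grid(spacing, offset):
    """5 x 5 regular grid used as a toy cluster."""
    axis = np.arange(5) * spacing + offset
    xx, yy = np.meshgrid(axis, axis)
    return np.column_stack([xx.ravel(), yy.ravel()])


def make_demo_points():
    dense = grid(0.1, 0.0)                      # indices 0..24
    sparse = grid(1.0, 10.0)                    # indices 25..49
    outlier = np.array([[0.9, 0.9]])            # index 50, just outside the dense cluster
    return np.vstack([dense, sparse, outlier])


points = make_demo_points()
is_noise, ratio = density_ratio_noise(points, k=5, ratio_threshold=0.5)
density, _ = knn_density(points, 5)
uphill = uphill_neighbour(points, k=5)
```

In this toy data the outlier near the dense cluster has a higher absolute density than every point of the sparse cluster, so no single global threshold can reject it while keeping the sparse cluster. The local ratio test flags only the outlier, which illustrates the motivation stated in the abstract.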

Keywords/Search Tags:

Machine Learning, Data Mining, Unsupervised Learning, Clustering Analysis, Density-Based Clustering

Related items

1	Research On Dynamic Measurement Based Data Stream Clustering And Its Applications
2	Research On Clustering Algorithms For Complex Structured Data
3	Semi Supervised Clustering Algorithm And Its Application And Research
4	Robust Tensor Clustering For High-dimensional Data
5	Composition Of Graph Of Local Density Trend And Its Applications
6	Research On Unsupervised Clustering Algorithm And Applications On Series Data Analysis
7	On Learning Classifier System Clustering And Backbone Extraction Methods Under Unsupervised Learning Framework
8	Research On Clustering Methods Based On Unsupervised Deep Feature Learning
9	The Research On The Method To Measure The Validity And To Abstract Knowledge Of Clustering
10	Research On Clustering Algorithms Based On Metric Learning For Complex Data