Research And Improvement Of K-Means And DBSCAN Clustering Algorithms

Posted on:2022-04-25

Degree:Master

Type:Thesis

Country:China

Candidate:S C Cheng

Full Text:PDF

GTID:2518306524998929

Subject:Computer software and theory

Abstract/Summary:

With the advent of the 21st century, Internet applications have become increasingly popular and generate large amounts of information. In many research fields this information plays an important role in research progress, so valuable information has to be extracted from massive data, which requires data mining technology. This thesis focuses on clustering algorithms in data mining. As an unsupervised learning method, a clustering algorithm finds the cluster structure in a data set by grouping data with similar characteristics: it seeks maximum similarity within a cluster and maximum difference between clusters, so that each cluster represents a distinct feature or similarity among data points. Clustering is a basic data analysis tool with a wide range of applications across scientific fields, and it is especially important in unsupervised learning scenarios. Clustering algorithms can be divided into hierarchical, partition-based, density-based, and other methods. In practical applications, partition-based clustering algorithms such as K-means, K-means++, and X-means are the most widely studied and applied.

Although many improved partition-based clustering algorithms exist, they still face the following problems: 1. determining the number of clusters; 2. selecting the initial cluster centers; 3. a poor ability to search for good parameters. In response to these problems, and building on a study of clustering methods, performance evaluation measures, and related background, this thesis proposes two clustering algorithms: (1) a completely unsupervised K-means based on weighted entropy, and (2) edge-stripping clustering. The main research work on these two algorithms is as follows.

(1) K-means is an unsupervised clustering algorithm, but it always depends on the number of clusters being specified in advance. To improve clustering accuracy and determine the number of clusters, this thesis proposes a K-means algorithm based on entropy theory (EK-means). The algorithm constructs an information entropy for each data object as the information of that data point, and then combines it with the membership degree to construct a new objective function. Based on the new objective function, an unsupervised learning mode can be constructed for K-means in which the number of clusters does not need to be set in advance and an optimal number of clusters can be found in a timely manner. The time complexity of the EK-means algorithm is also analyzed. Finally, EK-means is compared with other existing clustering algorithms, and the experiments demonstrate its effectiveness.

(2) Building on the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm, this thesis proposes a new non-parametric clustering method, Boundary-stripping (BS). The method rests on the following concept: each potential cluster consists of layers surrounding its core, and the outer layers, or boundary points, implicitly separate the clusters. Unlike DBSCAN, in which the cores of clusters are defined directly by their density, here the undiscovered core points are revealed by gradually peeling off boundary points. Analyzing the density of local neighborhoods identifies the boundary points and associates them with inner points. Experiments show that the BS algorithm adapts to local density and features and can successfully separate adjacent clusters, possibly of different densities. The algorithm was tested on a large number of labeled data sets, including high-dimensional data with deep features trained by a convolutional neural network. The experiments show that the proposed method is more competitive than other recent non-parametric methods when a fixed parameter set is used.
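The boundary-peeling idea described in (2) can be illustrated with a minimal sketch. This is not the thesis's BS algorithm, whose exact density measure and stopping rule are not given in the abstract. The sparseness score (mean distance to the k nearest remaining points), the fixed `rounds` and `peel_fraction`, the k-nearest-neighbour linking of core points, and every function and variable name below are assumptions chosen only for illustration.

```python
import math


def peel_and_cluster(points, k=5, rounds=3, peel_fraction=0.15):
    """Toy boundary peeling (illustrative only): one cluster label per point."""
    active = list(range(len(points)))
    parent = {}  # peeled point -> inner point it was associated with

    def nearest(i, candidates):
        # The k nearest candidates to point i, as sorted (distance, index) pairs.
        return sorted((math.dist(points[i], points[j]), j)
                      for j in candidates if j != i)[:k]

    for _ in range(rounds):
        # Assumed sparseness score: mean distance to the k nearest
        # points that have not been peeled yet (higher = more boundary-like).
        sparse = {i: sum(d for d, _ in nearest(i, active)) / k for i in active}
        n_peel = int(len(active) * peel_fraction)
        boundary = set(sorted(active, key=sparse.get, reverse=True)[:n_peel])
        inner = [i for i in active if i not in boundary]
        for b in boundary:
            # Associate each boundary point with its closest inner point.
            parent[b] = min(inner, key=lambda j: math.dist(points[b], points[j]))
        active = inner

    # Link every remaining core point to its k nearest core neighbours and
    # treat the connected components (found with union-find) as clusters.
    root = {i: i for i in active}

    def find(i):
        while root[i] != i:
            root[i] = root[root[i]]
            i = root[i]
        return i

    for i in active:
        for _, j in nearest(i, active):
            root[find(i)] = find(j)

    def label(i):
        while i in parent:  # walk from a peeled point inward to the core
            i = parent[i]
        return find(i)

    return [label(i) for i in range(len(points))]


# Two well-separated 6x6 grids of points (synthetic data, for illustration only).
blob_a = [(x, y) for x in range(6) for y in range(6)]
blob_b = [(x + 20, y) for x in range(6) for y in range(6)]
labels = peel_and_cluster(blob_a + blob_b)
print(len(set(labels)), "clusters found")
```

The sketch peels a fixed fraction of the sparsest points in each round, which is a simplification: a method that adapts to local density, as the abstract describes, would decide per neighbourhood which points count as boundary. The parent links are what let peeled points inherit the label of the core they surround.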

Keywords/Search Tags:

Entropy theory, density, non-parametric method, unsupervised learning model, BS

Related items

1	Research On Unsupervised Meta-learning Classification Algorithm
2	Algorithm Study On Non-Parametric Kernel Density Clustering And Feature Extraction
3	Non-parametric Model-based Image Segmentation Method
4	High-Accuracy Min-Entropy Assessment Method Based On TPA-LSTM Prediction Model
5	Research On Disk Failure Prediction Method Based On Unsupervised Learning
6	An unsupervised method for speech detection and segmentation in noisy environments using the parametric trajectory model
7	Research On 2D Image-Based Unsupervised 3D Model Retrieval Method
8	Research And Improvement Of The Clustering Algorithm Based On Sparsity Score Entropy And Density Entropy
9	Research And Application Of Unsupervised Video Summarization Method Based On Subtitle Semantics
10	Research On Density Based Clustering Algorithms For Varying Density Data