
Optimal Density Clustering And Validity Analysis Of Double Statistics

Posted on: 2019-08-30
Degree: Master
Type: Thesis
Country: China
Candidate: Y L Liao
GTID: 2428330623962416
Subject: Control Science and Engineering
Abstract/Summary:
Cluster analysis is an important research direction in machine learning. Unlike supervised and semi-supervised learning, clustering groups samples according to the structural characteristics of the dataset. It can also uncover hidden information in the data, which gives it significant research value and wide applicability in the current digital and information age. Research on cluster analysis mainly covers data preprocessing, clustering algorithms, and clustering validity indices. Since the field emerged, scholars have proposed many algorithms aimed at different applications, each with its own advantages and shortcomings. In data reduction, the commonly used sample reduction algorithms either fail to reflect the structure of the dataset well or rely on parameters that depend on user experience. In clustering, many existing algorithms have parameters that must be set manually, so they cannot achieve a fully unsupervised clustering process. In validity assessment, most existing indices are designed for specific clustering algorithms, so they are not general enough and apply only to restricted kinds of datasets.

Based on research and analysis of the existing algorithms, this thesis proposes new or improved algorithms for these three aspects. The main results are as follows.

First, to address the problem that existing data reduction algorithms cannot fully reflect the structural characteristics of a dataset, a density reduction algorithm based on dichotomy is designed. The algorithm reduces the samples of a dataset without parameters, and it performs well at removing noise, reducing sample size, and preserving the structural characteristics of the dataset.

Second, because the truncation radius of the density peak clustering algorithm must be set manually, an algorithm with an optimum radius is proposed, which defines a new concept called density resolution. For a given dataset, the truncation radius at which density resolution reaches its maximum yields the best clustering result for the density peak clustering algorithm. The new algorithm determines the truncation radius automatically while retaining the advantages of the original algorithm, namely its efficiency and its ability to cluster datasets of arbitrary shape. The algorithm is verified experimentally, and its time and space complexity are analyzed.

Finally, to address the limitations that existing validity indices place on the structural characteristics of the datasets they can be applied to, a clustering validity index based on double statistics is proposed. Its novelty lies in using boundary points to assess clustering validity. Combined with an improved Gap index, it can identify the optimal number of clusters for different datasets and can evaluate clustering results independently of the clustering algorithm used.

The proposed algorithms are validated on artificial datasets with different characteristics and on real datasets from the UCI Machine Learning Repository.
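To illustrate why the truncation radius matters, the following is a minimal sketch of the two per-point quantities used by the standard density peak clustering algorithm: a local density (the number of neighbours within the truncation radius) and the distance to the nearest point of higher density. It is not the optimum-radius algorithm proposed in the thesis, whose density resolution measure is not defined in this abstract. The function name `density_peak_quantities`, the parameter name `d_c`, and the sample points are assumptions made for illustration.

```python
import math


def density_peak_quantities(points, d_c):
    """Return (rho, delta) for each point, given a truncation radius d_c.

    rho[i]   : number of other points strictly closer than d_c to point i.
    delta[i] : distance from point i to the nearest point with higher rho;
               if no point has higher rho, the largest distance from point i
               is used instead (a common convention, assumed here).
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    rho = [
        sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
        for i in range(n)
    ]

    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta


if __name__ == "__main__":
    # Hypothetical toy data: one group of three points, one group of two.
    sample = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11)]
    for radius in (0.5, 1.5, 20.0):
        rho, _ = density_peak_quantities(sample, radius)
        print(radius, rho)
```

On this toy data a radius of 0.5 gives every point a density of 0 and a radius of 20 gives every point a density of 4, so neither separates the two groups; only the intermediate radius of 1.5 produces the distinct densities `[2, 2, 2, 1, 1]`. This sensitivity is the reason a manually chosen truncation radius is a weakness of the original algorithm.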
Keywords/Search Tags:Data mining, Clustering, Data reduction, Density resolution, Validity index, Boundary points