
Research On Clustering Algorithm For Clusters With Irregular Structure

Posted on: 2022-10-21 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: Y L A Geng | Full Text: PDF
GTID: 1488306560489914 | Subject: Computer Science and Technology
Abstract/Summary:
The rapid development of information technology has triggered explosive growth in global data. Beyond basic operations such as storage and retrieval, people increasingly focus on mining valuable laws and patterns from the data. As a fundamental tool in data mining, clustering analysis aims to adaptively detect grouping patterns in unlabeled data. It provides underlying support for many high-level data analysis tasks and has been widely applied in image processing, statistical analysis, electronic commerce, bioinformatics, the social sciences and other areas.

Since no explicit category definitions or labels are available during clustering analysis, some a priori criterion (e.g. minimizing intra-cluster variance) must be introduced to form the final clusters. As a result, different criteria endow the resulting clusters with different characteristics. Because the underlying clusters in real-world data are usually irregular (e.g. non-convex shapes and uneven sample distributions), density-based and graph-based clustering methods have become research focuses owing to their capacity for handling irregular clusters. However, established density-based and graph-based methods have their own shortcomings: the former may over-divide or under-divide data that contain clusters with multiple density and/or scale levels; the latter may produce a low-quality graph because the graph is constructed without supervision, which degrades clustering performance. Moreover, detecting irregular clusters in large, high-dimensional data held in a distributed storage environment remains a challenging topic. This dissertation investigates solutions to these issues. The main contributions are listed below.

1. A clustering algorithm able to detect clusters with different shapes, densities and scales is proposed. The algorithm introduces a novel density measure, the relative K nearest neighbor (KNN) kernel density, with which clusters of heterogeneous densities can be detected. Based on this measure, the samples with the highest density are recognized as core samples, which are connected by a proposed KNN graph. Core samples that are strongly connected in the graph are then assigned to the same cluster. The advantage of this multi-core representation lies in its capacity for dealing with clusters that have several local density centers, which makes it more robust. Moreover, an invalid-parameter filtering method is proposed to facilitate parameter selection.

2. A distributed algorithm is proposed to cluster large, high-dimensional data. In the distributed computing setting, the data on each sub-machine are divided into several parts according to a density criterion, so that each part falls in a local region of the entire data, and such a region is usually embedded in a low-dimensional subspace. To represent local regions compactly, a subspace Gaussian model is introduced, for which a fast parameter estimation method is further proposed.

3. A feature learning method based on graph frequency recombination iterations is proposed for clustering analysis. In contrast to established graph-based clustering methods, the proposed method improves graphs and features alternately, alleviating the performance degradation caused by low-quality graph construction.

4. A clustering-based solution is proposed for thunderstorm recognition and tracking in a spatiotemporal scenario. A redundant Gaussian representation is designed for irregular spatial data. A fast Monte Carlo method is proposed to estimate the intersection index between two Gaussian models. A dual-frame clustering approach is presented to recognize and track thunderstorms simultaneously.
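The idea behind the first contribution, a density measured relative to a sample's own neighborhood so that sparse and dense clusters are treated alike, can be illustrated with a minimal sketch. The abstract does not give the definition of the relative KNN kernel density, so everything below is an assumption made for illustration: a Gaussian kernel over the distances to the k nearest neighbors, a "relative" density obtained by dividing by the largest density in the neighborhood, and core samples taken as those whose relative density reaches 1. The function names (`knn_graph`, `knn_kernel_density`, `relative_density`, `core_samples`) and the bandwidth choice are hypothetical and are not the dissertation's method.

```python
import numpy as np


def knn_graph(X, k):
    """Brute-force KNN: return neighbor indices and distances, each (n, k)."""
    diff = X[:, None, :] - X[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(d, np.inf)  # a sample is not its own neighbor
    idx = np.argsort(d, axis=1)[:, :k]
    dist = np.take_along_axis(d, idx, axis=1)
    return idx, dist


def knn_kernel_density(dist, h=None):
    """Assumed density: Gaussian kernel summed over the k nearest neighbors."""
    if h is None:
        h = dist.mean()  # assumed global bandwidth, not from the source
    return np.exp(-(dist / h) ** 2).sum(axis=1)


def relative_density(rho, idx):
    """Assumed relative measure: density divided by the neighborhood maximum."""
    neigh_max = np.maximum(rho, rho[idx].max(axis=1))
    # Guard against 0/0 when every kernel value underflows to zero.
    return rho / np.maximum(neigh_max, np.finfo(float).tiny)


def core_samples(rel, threshold=1.0):
    """Indices of samples whose relative density reaches the threshold."""
    return np.flatnonzero(rel >= threshold)
```

Under these assumptions a relative density of 1 marks a local density peak within its own KNN neighborhood, so a sparse cluster still yields core samples even when its absolute density is far below that of a dense cluster. Linking the core samples through a graph and grouping the connected ones, as the abstract describes, would be a further step that this sketch does not attempt.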
Keywords/Search Tags: clustering, density-based clustering, distributed clustering, graph-based clustering, graph iteration, thunderstorm tracking