
A Study Of Noise Data Oriented Algorithms Of Clustering Related

Posted on: 2015-08-30
Degree: Master
Type: Thesis
Country: China
Candidate: J Lu
Full Text: PDF
GTID: 2308330464457152
Subject: Computer application technology
Abstract/Summary:
Clustering analysis is a classical problem in machine learning: it partitions unlabeled data into several sets so that the total distance between different sets is maximized and the total distance within each set is minimized. As data grows more complex, it contains more noise and has higher feature dimensionality, which lowers the efficiency and accuracy of traditional clustering algorithms. This thesis studies noise-data-oriented clustering problems further and proposes a novel global filter-type feature selection algorithm for clustering that is not sensitive to the sampling stage, together with a K-means-based clustering algorithm that handles noisy data points. Experimental results show that the two proposed algorithms handle noisy data with high efficiency and accuracy.

This thesis makes the following contributions:
1) Studied the noise-data-oriented clustering problem and formulated it in general terms.
2) For the case where the features contain noise, studied classical filter-type feature selection algorithms for clustering, including Laplacian Score and SRANK, and analyzed their principles and shortcomings.
3) For the case where the dataset contains noise points, studied several density-based clustering algorithms, namely DBSCAN, DLCKDT and spectral clustering, and analyzed their principles and shortcomings.
4) Proposed two algorithms. The feature selection algorithm projects all features into a sample-difference space, in which the contribution of a pair of features to clustering can be estimated from the similarity between the two features; the best selected feature subset and a proposed objective function can then be used to score each feature dimension. The proposed clustering algorithm uses a KD-tree to help estimate the local densities of the data, from which a dimension-reduced similarity matrix is computed and used as the input of the K-means algorithm.
5) Simulation experiments verified the proposed algorithms, and experiments on real data show that they perform better on noisy data than other commonly used algorithms.
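The general idea behind the clustering contribution, estimating local density and keeping noise points from distorting K-means, can be illustrated with a minimal sketch. This is not the thesis's algorithm: the abstract does not specify how the dimension-reduced similarity matrix is built, so the sketch simply discards the lowest-density points before running plain K-means. It also substitutes a brute-force nearest-neighbour search for the KD-tree to stay within the Python standard library. All function names and parameters (`knn_density`, `cluster_ignoring_noise`, `keep_fraction`, and so on) are hypothetical.

```python
import math
import random


def knn_density(points, k):
    """Local density of each point: inverse of the mean distance to its k nearest neighbours.

    Brute force, O(n^2), for clarity. A KD-tree would answer the same
    nearest-neighbour queries faster on large datasets.
    """
    densities = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        mean_d = sum(dists[:k]) / k
        densities.append(1.0 / (mean_d + 1e-12))  # small constant avoids division by zero
    return densities


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's K-means on tuples of numbers; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, labels


def cluster_ignoring_noise(points, k_clusters, k_neighbors=3, keep_fraction=0.9):
    """Label the lowest-density points as noise (-1), then run K-means on the rest.

    Assumption for illustration only: noise is defined as the (1 - keep_fraction)
    share of points with the lowest k-NN density.
    """
    dens = knn_density(points, k_neighbors)
    order = sorted(range(len(points)), key=lambda i: dens[i], reverse=True)
    n_keep = max(k_clusters, int(len(points) * keep_fraction))
    kept = sorted(order[:n_keep])
    centroids, kept_labels = kmeans([points[i] for i in kept], k_clusters)
    labels = [-1] * len(points)
    for i, lab in zip(kept, kept_labels):
        labels[i] = lab
    return centroids, labels
```

Called on two tight groups of points plus one distant outlier, `cluster_ignoring_noise(points, 2)` would be expected to mark the outlier with label -1 and split the remaining points into two clusters. Without the density filter, a single far-away point can pull a centroid away from its group, which is the sensitivity to noise that the abstract describes.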
Keywords/Search Tags: Clustering, Feature Selection, Density, Filter, Objective Function