A Study Of Noise Data Oriented Algorithms Of Clustering Related

Posted on:2015-08-30

Degree:Master

Type:Thesis

Country:China

Candidate:J Lu

Full Text:PDF

GTID:2308330464457152

Subject:Computer application technology

Abstract/Summary:


Clustering analysis is a classical problem in machine learning: it partitions unlabeled data into several sets so that the total distance between different sets is maximized and the total distance within each set is minimized. As data grows more complex, it contains more noise and has a higher feature dimension, which lowers the efficiency and accuracy of traditional clustering algorithms. This thesis studies the clustering of noisy data further and proposes two algorithms: a novel global filter-type feature selection algorithm for clustering that is insensitive to the sampling stage, and a K-means-based clustering algorithm that handles noise data points. Experimental results show that both algorithms handle noisy data with high efficiency and accuracy. The main work of this thesis is as follows:
1) Studied the noise-data-oriented clustering problem and formulated it in general terms.
2) For data whose features contain noise, studied classical filter-type feature selection algorithms for clustering, including Laplacian Score and SRANK, and analyzed their principles and shortcomings.
3) For data sets that contain noise points, studied several density-based clustering algorithms, namely DBSCAN, DLCKDT and spectral clustering, and analyzed their principles and shortcomings.
4) Proposed two algorithms. The feature selection algorithm projects all features into a sample-difference space, in which the contribution of a pair of features to clustering can be estimated from the similarity between the two features; the best selected feature subset and a proposed objective function are then used to score each feature dimension. The clustering algorithm uses a KD-tree to estimate the local densities of the data, from which a dimension-reduced similarity matrix is computed and used as the input to K-means.
5) Verified the proposed algorithms in simulation experiments; experiments on real data show that they handle noisy data better than other commonly used algorithms.
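Laplacian Score, one of the classical filter-type criteria studied in item 2, can be sketched in NumPy as below. It follows the commonly published definition (a k-nearest-neighbor graph with heat-kernel weights; a smaller score means the feature better preserves local structure). The function name and the default neighborhood size `k` and kernel width `t` are illustrative assumptions, not values taken from the thesis.

```python
import numpy as np


def laplacian_score(X: np.ndarray, k: int = 5, t: float = 1.0) -> np.ndarray:
    """Laplacian Score of each column of X (shape: n_samples x n_features).

    Lower scores mean the feature better preserves the neighborhood
    structure of the samples. Hypothetical helper for illustration.
    """
    n, m = X.shape
    # Pairwise squared Euclidean distances between samples.
    sq = np.sum(X ** 2, axis=1)
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)

    # k-nearest-neighbor graph; the diagonal is masked so a sample
    # is never its own neighbor.
    masked = dist2.copy()
    np.fill_diagonal(masked, np.inf)
    neighbors = np.argsort(masked, axis=1)[:, :k]

    # Symmetric heat-kernel weights on the graph edges.
    S = np.zeros((n, n))
    for i in range(n):
        for j in neighbors[i]:
            w = np.exp(-dist2[i, j] / t)
            S[i, j] = S[j, i] = w

    d = S.sum(axis=1)       # degree vector, D = diag(d)
    L = np.diag(d) - S      # unnormalized graph Laplacian

    scores = np.empty(m)
    for r in range(m):
        f = X[:, r]
        # Subtract the D-weighted mean of the feature.
        f_tilde = f - (f @ d) / d.sum()
        denom = f_tilde @ (d * f_tilde)   # f~^T D f~
        scores[r] = (f_tilde @ L @ f_tilde) / denom if denom > 0 else np.inf
    return scores
```

On toy data whose first column separates two clusters and whose second column is pure Gaussian noise, the first column should receive the lower (better) score.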
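The density-estimation idea in item 4 can be illustrated with a simplified stand-in: estimate each sample's local density from its k nearest neighbors, flag the lowest-density samples as noise, and run K-means on the rest. This is not the thesis's algorithm, which feeds a dimension-reduced similarity matrix to K-means; the abstract does not say how that matrix is built, so that step is omitted. Brute-force neighbor search replaces the KD-tree (a KD-tree returns the same neighbors, only faster), and the function names, the `noise_fraction` cutoff and the farthest-first initialization are all assumptions made for illustration.

```python
import numpy as np


def knn_density(X: np.ndarray, k: int = 5) -> np.ndarray:
    """Local density: inverse mean distance to the k nearest neighbors.

    Brute-force search stands in for a KD-tree query (same neighbors).
    """
    dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)                # a point is not its own neighbor
    nearest = np.sort(dist, axis=1)[:, :k]
    return 1.0 / (nearest.mean(axis=1) + 1e-12)   # epsilon guards duplicate points


def kmeans(X: np.ndarray, n_clusters: int, n_iter: int = 100, seed: int = 0):
    """Plain Lloyd iterations with farthest-first initialization."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]
    while len(centers) < n_clusters:
        # Next center: the point farthest from every center chosen so far.
        d2 = ((X[:, None, :] - np.array(centers)[None]) ** 2).sum(axis=2).min(axis=1)
        centers.append(X[np.argmax(d2)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        labels = ((X[:, None, :] - centers[None]) ** 2).sum(axis=2).argmin(axis=1)
        new_centers = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(n_clusters)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


def density_filtered_kmeans(X, n_clusters, k=5, noise_fraction=0.1, seed=0):
    """Label the lowest-density samples as noise (-1) and cluster the rest."""
    density = knn_density(X, k)
    core = density > np.quantile(density, noise_fraction)
    labels = np.full(len(X), -1)
    core_labels, centers = kmeans(X[core], n_clusters, seed=seed)
    labels[core] = core_labels
    return labels, centers
```

Farthest-first initialization is deterministic and cheap but sensitive to outliers; it is tolerable here only because the low-density samples are removed before K-means runs.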

Keywords/Search Tags:

Clustering, Feature Selection, Density, Filter, Objective Function


Related items

1	Research On Feature Selection Based On Multi-Objective Optimization
2	Research On Feature Selection Algorithm Based On Evolutionary Computation
3	Research On Multi-objective Feature Selection Based On Improved NSGA-? Algorithm
4	The Research Of Feature Selection Based On Probability Density Approximation
5	Research Of Image Clustering Based On Local Structure Constraints
6	A Distance Convergence And History Density Based Multi-objective Evolutionary Algorithm
7	Research On High Density Image Counting Based On Density Function Estimation
8	Research On Filter Feature Selection Algorithm
9	Feature Selection Approaches Based On Weighted Kernel Density Estimation
10	Probabilistic Model Based Evolutionary Algorithm And Preference Based Selection In Evolutionary Multi-objective Algorithm