Statistical Modeling for Simultaneous Data Clustering, Features Selection, and Outliers Rejection

Posted on: 2011-08-22

Degree: M.A.Sc

Type: Thesis

University: Concordia University (Canada)

Candidate: Almakadmeh, Khaled

Full Text: PDF

GTID: 2448390002469009

Subject: Engineering

Abstract/Summary:

Model-based approaches, and finite mixture models in particular, are widely used for data clustering, a crucial step in many practical applications. Indeed, many pattern recognition, computer vision, and image processing applications can be approached as feature-space clustering problems. However, applying these approaches to complex high-dimensional data presents several challenges. One is the presence of many irrelevant features, which may slow the learning algorithm and compromise its accuracy. Another is the presence of outliers, which can distort the estimated model parameters. Clustering, feature selection, and outlier detection have generally been approached separately. In this thesis, we propose a unified statistical framework that addresses the three problems simultaneously. The proposed statistical model partitions a given data set without a priori information about the number of clusters, the saliency of the features, or the number of outliers. We illustrate the performance of our approach on several applications involving synthetic data, real data, and object shape clustering.
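
To make the idea of handling clustering and outliers in a single mixture model concrete, here is a minimal Python sketch. It is not the model proposed in the thesis: it assumes one-dimensional data, Gaussian clusters, a fixed number of clusters k, and a single uniform component that absorbs outliers, and it omits feature saliency entirely. The function name fit_mixture_with_outliers, the init_var parameter, and the toy data are all hypothetical choices made for illustration.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def fit_mixture_with_outliers(data, k, init_var, n_iter=50):
    """EM for k 1-D Gaussians plus one uniform 'outlier' component.

    Illustrative assumptions: k is fixed by the caller, init_var is a
    rough guess of the within-cluster variance, and the data are not
    all identical (so the uniform density is well defined).
    """
    n = len(data)
    lo, hi = min(data), max(data)
    uniform_density = 1.0 / (hi - lo)  # flat density over the observed range

    ordered = sorted(data)
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]  # quantile init
    variances = [init_var] * k
    weights = [0.9 / k] * k + [0.1]  # last weight belongs to the outlier component

    def e_step():
        # Responsibility of each component (k Gaussians + uniform) for each point.
        resp = []
        for x in data:
            dens = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
            dens.append(weights[k] * uniform_density)
            total = sum(dens)
            resp.append([d / total for d in dens])
        return resp

    for _ in range(n_iter):
        resp = e_step()
        # M-step: re-estimate Gaussian parameters and all mixing weights.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, 1e-6)  # floor avoids a collapsed component
            weights[j] = nj / n
        weights[k] = sum(r[k] for r in resp) / n

    resp = e_step()
    # Label k means "assigned to the outlier component".
    labels = [max(range(k + 1), key=lambda j: r[j]) for r in resp]
    return means, variances, weights, labels


# Toy data: two tight clusters near 0 and 10, plus three far-away points.
cluster_a = [i * 0.1 for i in range(-10, 11)]
cluster_b = [10.0 + i * 0.1 for i in range(-10, 11)]
outliers = [-20.0, 30.0, 45.0]
data = cluster_a + cluster_b + outliers

means, variances, weights, labels = fit_mixture_with_outliers(data, k=2, init_var=1.0)
print("means:", means)
print("outlier labels:", labels[-3:])
```

In this sketch a point is flagged as an outlier when the uniform component explains it better than any Gaussian, so the outliers do not pull the cluster means or inflate the variances. Unlike the framework described in the abstract, the number of clusters must be supplied here.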

Keywords/Search Tags:

Data, Clustering, Outliers, Features, Statistical

Related items

1	Research On The Outliers Detection Algorithm
2	Research On Extended Knowledge Discovery In High-Dimension And Sparse Outliers Set
3	K-distance-based Outliers And Clustering Algorithm
4	High-Dimensional Data Clustering and Statistical Analysis of Clustering-based Data Summarization Products
5	Algorithms to identify clusters and outliers based on dyadic decomposition with applications to streams
6	Research On Nominal Data Clustering/Classification Algorithms With Their Applications In Anomaly Detection
7	Data clustering using statistical physics: Fundamentals, extensions, and applications
8	Online detection of outliers for data streams
9	Strategic targeting of outliers for expert review
10	Research On Outliers Detection In Data Stream Based On Unsupervised Learning