Research On Outlier Detection And Its Optimal Algorithms

Posted on: 2011-09-19
Degree: Doctor
Type: Dissertation
Country: China
Candidate: P Yang
GTID: 1118330338982764
Subject: Computer Science and Technology
Abstract/Summary:
An outlier in a dataset is an observation or data pattern that is considerably dissimilar to, or inconsistent with, the remainder of the data. In most cases, outliers are discarded because they are regarded as noise. In some real-life applications, however, the outliers are precisely the objects that carry important information. Outlier detection aims to find outliers in a dataset by using statistics, machine learning, intelligent computing, visualization and other techniques, so that they can be analyzed and studied further. Since rare events may contain important knowledge, outlier detection has a number of useful applications, such as defense against communication and credit card fraud, medical insurance, market analysis and weather forecasting. The study of outlier detection is therefore significant for both research and practice. How to find and handle anomalies efficiently and effectively in large, high-dimensional datasets is a challenging problem. This work focuses on finding anomalies in datasets with clustering and classification structure, and on the implementation and optimization of key techniques for outlier detection. We propose outlier detection methods based on spectral clustering and on an RBF neural network, and use rough sets to perform attribute reduction that speeds up the search for outliers. The main results are outlined as follows:
① The basic theory and traditional algorithms of spectral clustering are analyzed and studied thoroughly. Clustering on complex datasets can be implemented with the spectral method. An improved algorithm based on random walks is proposed, which introduces a density-sensitive distance metric to calculate the similarity between objects more accurately, and automatically selects the optimal number of clusters according to the eigenvalues of the stochastic matrix.
The stable clusters obtained with this algorithm are the prerequisite for effective outlier detection.
② Spectral clustering is applied to outlier detection for the first time, and its feasibility can be proved through the definitions of extended multicut and piecewise constant eigenvectors. An outlier detection algorithm based on spectral clustering is proposed, which first partitions the dataset, then calculates the outlying factor of the objects in each cluster and identifies the outliers according to these values. In the spectral clustering process, a sparse matrix is obtained by using a shared-neighborhood-based adjacency matrix, whose first eigenvectors can be computed easily with the Lanczos method.
③ An outlier detection model based on an RBF neural network is constructed, which uses the subtractive clustering algorithm to select the hidden node centers and thereby achieves faster training. During network training, a regularization term is added to the traditional error function to minimize the variances of the nodes in the hidden layer. By defining the degree of outlier, we can effectively find abnormal data whose actual output deviates seriously from its expected output, as long as the output is certain.
④ To address the inefficiency of finding outliers in large, high-dimensional datasets, an attribute-reduction-based detection method is proposed by introducing the concept of rough sets. By defining outlying partition similarity, we can mine outliers on the key outlying attribute subset rather than on the full attribute set of the dataset, as long as the outlying partitions they produce are sufficiently similar. An effective method for finding the key outlying attribute subset is proposed, and the experimental results verify its effectiveness.
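The center-selection step mentioned for the RBF network can be illustrated with a minimal sketch of subtractive clustering. The abstract does not state the dissertation's parameters or stopping rule, so the function name `subtractive_centers`, the default radius, the 1.5 squash factor and the acceptance ratio below are assumptions chosen for illustration, and the stopping rule is a simplified one.

```python
import math


def subtractive_centers(points, radius=0.5, squash=1.5, accept_ratio=0.15):
    """Select candidate hidden-node centers by subtractive clustering.

    points: list of equal-length coordinate tuples, ideally scaled to [0, 1].
    radius: neighborhood radius used when computing each point's potential.
    squash: ratio of the subtraction radius to `radius` (assumed value).
    accept_ratio: stop once the best remaining potential drops below this
        fraction of the first center's potential (simplified, assumed rule).
    """
    if not points:
        return []

    alpha = 4.0 / radius ** 2
    beta = 4.0 / (squash * radius) ** 2

    def sqdist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    # The potential of a point measures how densely data surround it.
    potential = [
        sum(math.exp(-alpha * sqdist(p, q)) for q in points) for p in points
    ]
    first_peak = max(potential)

    centers = []
    for _ in range(len(points)):  # at most one center per data point
        best = max(range(len(points)), key=potential.__getitem__)
        peak = potential[best]
        if peak < accept_ratio * first_peak:
            break
        center = points[best]
        centers.append(center)
        # Remove the new center's influence so nearby points are not re-picked.
        potential = [
            pot - peak * math.exp(-beta * sqdist(p, center))
            for p, pot in zip(points, potential)
        ]
    return centers
```

On data scaled to the unit square with one tight group around (0.1, 0.1) and another around (0.9, 0.9), this sketch returns one center per group. In a model of the kind described above, the selected points would serve as the hidden-node centers, so the number of hidden nodes follows from the data rather than being fixed in advance.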
Keywords/Search Tags:Outlier Detection, Spectral Clustering, Artificial Neural Network, Outlying Reduction, High Dimensional Dataset