Design And Implementation Of Outlier Detection Algorithms Based On K-means Clustering

Posted on: 2022-01-13; Degree: Master; Type: Thesis
Country: China; Candidate: Y J Qiao; Full Text: PDF
GTID: 2518306509465224; Subject: Software engineering
Abstract/Summary:
As an important technology in big data research, outlier detection has a strong influence on application fields such as network monitoring, telecommunications and credit card fraud detection, and financial securities services. The purpose of outlier detection is to find the portion of objects in a data set that behave significantly differently from the majority of objects. Among the many outlier detection methods, clustering-based methods stand out because they do not need to know the distribution of the data set in advance and have few parameters. However, among current clustering-based outlier detection methods, some improve the effectiveness of outlier detection only by optimizing the clustering algorithm, while others focus only on discovering outliers within a cluster. Therefore, in this thesis we analyze and fuse these two types of methods, propose new algorithms, and complete a system. The main work is as follows:

(1) A k-means based outlier detection algorithm that fuses local and global information is proposed. The algorithm first clusters the data with k-means, then generates a set of candidate outliers based on density-based isolation within a cluster, and finally selects the final set of outliers from the candidate set based on distance-based isolation between clusters. The algorithm was compared with three outlier detection algorithms on seven preprocessed data sets, and its effectiveness was verified by comparing a series of evaluation metrics.

(2) An outlier detection algorithm based on k-means error loss is proposed. In this algorithm, k-means is used as the local search process, and an incremental method is adopted to gradually determine the non-outliers in the data set. The points left unselected are then determined to be the outliers. This algorithm was also compared with three outlier detection algorithms on seven preprocessed data sets, and the given evaluation metrics verified that it detects outliers in the data sets more accurately.

(3) An outlier detection system based on MATLAB is designed and implemented. The system includes data set selection, algorithm selection, parameter setting, visualization of experimental results, and other functions. It integrates the outlier detection algorithms used and proposed in this thesis, and it is simple to operate, clear in interface, efficient, and offers good portability and interactivity.

In summary, this thesis proposes solutions to the problems of clustering-based outlier detection and implements an outlier detection system containing the proposed algorithms, to better help users apply outlier detection algorithms to problems encountered in life and work.
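
The three stages described for algorithm (1) can be illustrated with a rough sketch. This is a minimal Python illustration, not the thesis's MATLAB implementation, and the abstract does not give the actual formulas. Everything concrete here is an assumption made for the example: the function names (`kmeans`, `local_candidates`, `detect_outliers`), the deterministic farthest-point initialisation, the use of the mean distance to the `m` nearest same-cluster neighbours as the within-cluster density measure, the use of distance to the nearest centroid as the between-cluster measure, and the thresholds `ratio` and `factor`.

```python
import math
from statistics import median

def kmeans(points, k, iters=50):
    # Assumption: farthest-point initialisation, chosen only to keep the sketch deterministic.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest center.
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep the old center for an empty cluster
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers

def local_candidates(points, labels, k, m=2, ratio=2.0):
    # Stage 2 (assumed measure): a point is a candidate when its mean distance to its
    # m nearest same-cluster neighbours exceeds ratio * the cluster's median of that value.
    candidates = []
    for j in range(k):
        idx = [i for i, lab in enumerate(labels) if lab == j]
        if len(idx) < 2:
            continue  # nothing to compare against inside a singleton cluster
        score = {}
        for i in idx:
            nearest = sorted(math.dist(points[i], points[o]) for o in idx if o != i)[:m]
            score[i] = sum(nearest) / len(nearest)
        cutoff = ratio * median(score.values())
        candidates += [i for i in idx if score[i] > cutoff]
    return candidates

def detect_outliers(points, k, m=2, ratio=2.0, factor=2.0):
    # Stage 1: cluster with k-means.
    labels, centers = kmeans(points, k)
    # Stage 3 (assumed measure): keep candidates whose distance to the nearest centroid
    # exceeds factor * the data set's mean distance to the nearest centroid.
    nearest = [min(math.dist(p, c) for c in centers) for p in points]
    cutoff = factor * sum(nearest) / len(nearest)
    return [i for i in local_candidates(points, labels, k, m, ratio) if nearest[i] > cutoff]

# Two tight groups plus two isolated points (indices 10 and 11).
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5),
          (0.5, 4.5), (14, 10.5)]
print(detect_outliers(points, k=2))  # → [10, 11]
```

The point of the two-stage filter is that a point must look isolated both locally (sparse within its own cluster) and globally (far from every cluster centre) before it is reported, which mirrors the fusion of local and global information that the abstract describes.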
Keywords/Search Tags:Outlier detection, Clustering, Isolation within a cluster, Isolation between clusters, Error loss