
Research And Implementation On Outlier Detection Method Based On SOFM Clustering Algorithm

Posted on: 2018-12-26
Degree: Master
Type: Thesis
Country: China
Candidate: Z J Wei
Full Text: PDF
GTID: 2348330563952477
Subject: Computer Science and Technology
Abstract/Summary:
As an important branch of data mining, outlier detection is widely used in areas such as credit card fraud detection, medical health, stock analysis, user reputation assessment, and network intrusion detection. Advances in outlier detection methods help these industries develop in a healthy, stable, and safe direction. Over the years, many scholars have studied outlier detection methods, including methods based on neighbor density and methods based on clustering pruning. On massive data sets, computing outliers with a neighbor-density-based method is time-consuming. Methods based on clustering pruning reduce this computational cost by clustering the data set first. However, the clustering quality of the algorithm used in the pruning step still needs further improvement. Considering the limitations of both the neighbor-density-based methods and the clustering-pruning-based methods, this paper proposes an outlier detection method based on optimized SOFM clustering. The method aims to improve the clustering quality of the SOFM algorithm, as well as the time performance and accuracy of outlier detection, while preserving the validity of the detection results. The main contents of this paper are as follows:

(1) A SOFM clustering algorithm based on the Canopy algorithm is proposed to improve SOFM clustering. The Canopy algorithm is used to roughly determine the number of neurons and their corresponding weight vectors. During training, the set of neurons is adjusted dynamically by a self-increasing method, neuron locations are adjusted according to the nearest-farthest principle, neuron positions are further optimized based on data blocks, and finally similar neurons are merged from a global-optimum perspective. The algorithm avoids random selection of the number of neurons and their weight vectors in the initial stage, and it adjusts the network structure during training to reduce the likelihood of dead neurons and improve the clustering quality.

(2) Based on an analysis of the outlier factor in the LOF algorithm, an outlier detection algorithm based on near-neighbor information entropy is proposed. The algorithm uses a center-point block ordering algorithm to reduce the amount of data to be examined, and uses a balance algorithm based on neighborhood variance to determine the K value of the K-neighborhood dynamically, which avoids choosing K at random. The outlier factor is redefined with reference to the LOF algorithm and information entropy in order to reduce the time complexity of computing outliers.

(3) The experimental design and experimental analysis are completed. The analysis verifies the validity of the proposed method. Clustering pruning reduces the computational complexity, so that the NELOF algorithm lowers the time cost of outlier detection without losing its effectiveness.
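The idea in item (1) of using Canopy to pick the number of neurons and their starting weight vectors can be sketched as follows. This is a minimal illustration of the standard Canopy procedure with a loose threshold T1 and a tight threshold T2, not the thesis's implementation: the function names `canopy_centers` and `initial_weights`, the Euclidean distance, the choice of the first remaining point as each canopy center, and the use of canopy means as initial weights are all assumptions made for the example.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def canopy_centers(points, t1, t2):
    """Group points into canopies using thresholds t1 > t2.

    Returns a list of (center, members) pairs. A point joins a canopy if it
    lies within t1 of the center; points within t2 of the center are removed
    from the candidate pool and can no longer start a new canopy.
    """
    if t1 <= t2:
        raise ValueError("t1 must be greater than t2")
    remaining = list(points)
    canopies = []
    while remaining:
        center = remaining[0]  # assumption: take the first remaining point
        members = [p for p in remaining if euclidean(center, p) < t1]
        canopies.append((center, members))
        # The center itself is at distance 0 < t2, so the pool always shrinks.
        remaining = [p for p in remaining if euclidean(center, p) >= t2]
    return canopies


def initial_weights(canopies):
    """One initial weight vector per canopy: the mean of its members."""
    weights = []
    for _, members in canopies:
        dim = len(members[0])
        weights.append(
            tuple(sum(m[i] for m in members) / len(members) for i in range(dim))
        )
    return weights


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
    canopies = canopy_centers(data, t1=2.0, t2=1.0)
    print(len(canopies), initial_weights(canopies))
```

The number of canopies gives a starting neuron count and the canopy means give starting weight vectors, so neither has to be drawn at random. The result depends on T1, T2, and the order of the points, which is consistent with the abstract describing this step as only a rough determination that later training stages refine.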
Keywords/Search Tags: clustering, Canopy, SOFM, outlier, outlier factor, information entropy