Research And Application Of Outlier Detection Methods Based On PSO

Posted on: 2014-05-09 | Degree: Master | Type: Thesis
Country: China | Candidate: M J Wang
GTID: 2308330461472547 | Subject: Computer software and theory
Abstract/Summary:
Data mining techniques such as clustering, classification and association rule analysis try to extract general models and common features from massive amounts of data. These models and features support scientific decision-making. In contrast, outlier detection focuses on the minority of objects that are clearly different from the rest and deviate markedly from the general model of the data set. Such potential outliers may indicate important domain knowledge such as credit card fraud, network intrusion or a novel disease. Hence, research on outlier detection is of great practical significance.

Traditional outlier detection methods may encounter problems such as difficult parameter setting, high computational overhead and a meaningless definition of similarity. To overcome these problems, researchers have recently applied the particle swarm optimization (PSO) algorithm to outlier detection and achieved good results. However, some shortcomings remain. This thesis explores further how to use the PSO algorithm to detect outliers effectively. Based on the idea of minimizing an objective function, swarm intelligence approaches based on PSO are proposed and applied to outlier detection from two different points of view. One is to protect the particles' diversity in order to avoid premature convergence. The other is to define a more suitable fitness function. The research mainly includes the following aspects:

1. To slow down the loss of particle swarm diversity, we propose an improved algorithm based on collective particles decentralization (CPD-PSO). Combined with the variation mechanism, the algorithm controls the inertia weight and the constriction factor so that clustered particles can jump out of local optima while retaining a relatively high ability for global exploration. The diffusion of particles that are excessively gathered together depends on the number of iterations and the concentration of the swarm. Simulation experiments show that the CPD-PSO algorithm outperforms the standard PSO algorithm as well as some other modifications.

2. In outlier detection, particles are encoded as a data index and a neighborhood radius, so the CPD-PSO algorithm must be modified before it can be applied. The similarity between two particles is defined as the sum of the absolute differences on these two dimensions. Only particles with the same data index but slightly different neighborhood radii are likely to be decentralized. Experiments on several UCI datasets demonstrate the advantage of our modification over the standard PSO algorithm.

3. Mohemmed et al. recently proposed a new detection method that translates outlier detection into the minimization of an objective function, with the PSO algorithm applied for optimization. Tracking data from experiments on the yeast dataset reveal an unreasonable phenomenon: the method's definition of the fitness function does not necessarily ensure a good match with the outlying degree of an object. The thesis analyzes the cause and seeks a new fitness function, supported by theoretical analysis and experimental results. The results show the superiority of the proposed outlier detection method with the new fitness function over the original one and over the LOF, LDOF and KNN algorithms.
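
The particle encoding and similarity measure described in the second aspect can be illustrated with a minimal sketch. It is not the thesis's implementation: the names `Particle`, `similarity`, `decentralization_candidates` and the threshold `radius_tol` are assumptions introduced here for illustration, and the abstract does not say how "slightly different" radii are quantified.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Particle:
    """Hypothetical two-dimensional particle encoding, as named in the abstract."""
    data_index: int   # index of the candidate object in the data set
    radius: float     # neighborhood radius


def similarity(p: Particle, q: Particle) -> float:
    """Sum of the absolute differences on the two encoded dimensions.

    A smaller value means the two particles lie closer together.
    """
    return abs(p.data_index - q.data_index) + abs(p.radius - q.radius)


def decentralization_candidates(swarm, radius_tol):
    """Return pairs with the same data index and nearly equal radii.

    radius_tol is an assumed threshold; the abstract gives no concrete value.
    """
    return [
        (p, q)
        for p, q in combinations(swarm, 2)
        if p.data_index == q.data_index and abs(p.radius - q.radius) < radius_tol
    ]


swarm = [Particle(3, 0.5), Particle(3, 0.75), Particle(7, 0.5), Particle(3, 2.0)]
pairs = decentralization_candidates(swarm, radius_tol=0.5)
```

With this toy swarm, only the first two particles share a data index and have radii within the assumed tolerance, so they form the single candidate pair for decentralization.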
Keywords/Search Tags: Outlier detection, Particle Swarm Optimization (PSO), Outlying degree, Fitness function, Collective particles decentralization