Research On Outlier Detection Algorithm Based On Ant Colony Algorithm

Posted on: 2011-07-04; Degree: Master; Type: Thesis
Country: China; Candidate: H N Li
GTID: 2178330338991142; Subject: Computer application technology
Abstract/Summary:
Outlier detection has attracted widespread attention as data mining is applied more broadly. An analysis of existing algorithms, both domestic and international, shows that several problems remain. First, user-defined thresholds often strongly affect the detection result, and they are difficult to set for users without prior knowledge. Second, few algorithms for multivariate time series data mine outliers on the basis of neighbor similarity, although this is clearly a promising direction. Third, detection precision is still low. To address these problems, we study outlier detection methods based on the ant colony algorithm.

Firstly, we propose a graph-cut based outlier detection method that uses the ant colony algorithm. In the first phase we improve the traditional ant colony algorithm: for example, to calculate a more accurate transition probability, we take into account the distance on both categorical and numerical attributes as well as the local distribution, so that a better graph can be constructed in limited time. We then cut the graph according to a criterion, and each resulting subgraph is treated as a cluster. In the last step we obtain the top n outliers by comparing the differences between clusters and the differences between points that lie in the same cluster.

Secondly, we propose an outlier detection method for multivariate time series data based on an improved ant colony algorithm and the k-means clustering algorithm. When classifying the time series data, we introduce the pheromone and transition probability concepts of the ant colony algorithm into the k-means clustering algorithm, and then search for the best clustering result, which is determined by the intra-cluster and inter-cluster distances. In the last step we calculate the outlier score of every data point by checking the distribution of the time points that lie in clusters, under the criterion of neighbor similarity.

In the end, we test the two algorithms proposed in this paper on both real-world and synthetic datasets. The experimental results show that the proposed algorithms achieve higher outlier detection precision than other existing algorithms, and the expected goal is achieved.
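
The abstract does not give the formula used for the improved transition probability in the first method. As a rough illustration only, the sketch below assumes the standard Ant System rule, in which the probability of moving from point i to point j is proportional to pheromone^alpha times (1/distance)^beta. It combines a Euclidean distance on numerical attributes with a mismatch count on categorical attributes. The function names, the way the two distances are summed, and the parameters alpha, beta and eps are all assumptions made for this example, and the local-distribution term mentioned in the abstract is not modeled.

```python
import math


def mixed_distance(a, b, numeric_idx, categorical_idx):
    """Hypothetical mixed distance: Euclidean on numeric attributes
    plus the number of mismatching categorical attributes."""
    num = math.sqrt(sum((a[i] - b[i]) ** 2 for i in numeric_idx))
    cat = sum(1 for i in categorical_idx if a[i] != b[i])
    return num + cat


def transition_probabilities(current, candidates, points, pheromone,
                             numeric_idx, categorical_idx,
                             alpha=1.0, beta=2.0, eps=1e-9):
    """Standard Ant System rule (an assumption, not the thesis's formula):
    p(current -> j) is proportional to pheromone^alpha * (1/distance)^beta."""
    weights = {}
    for j in candidates:
        d = mixed_distance(points[current], points[j],
                           numeric_idx, categorical_idx)
        eta = 1.0 / (d + eps)  # heuristic desirability: closer is better
        weights[j] = (pheromone[(current, j)] ** alpha) * (eta ** beta)
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}


# Toy data: two numeric attributes and one categorical attribute per point.
points = [
    (0.0, 0.0, "a"),
    (1.0, 0.0, "a"),
    (9.0, 9.0, "b"),
]
# Uniform initial pheromone on the edges leaving point 0.
pheromone = {(0, 1): 1.0, (0, 2): 1.0}

probs = transition_probabilities(0, [1, 2], points, pheromone,
                                 numeric_idx=[0, 1], categorical_idx=[2])
print(probs)
```

With uniform pheromone, an ant at point 0 is far more likely to move to the nearby point 1 than to the distant point 2, so edges to isolated points stay weak in the constructed graph.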
Keywords/Search Tags: outlier detection, multivariate time series, ant colony algorithm, graph-cut, k-means