
Study On Distance-Based Outlier Mining Algorithm

Posted on: 2012-02-12
Degree: Master
Type: Thesis
Country: China
Candidate: Z K Yang
GTID: 2178330338997325
Subject: Computer software and theory
Abstract/Summary:
Outlier detection is an important part of knowledge discovery and is widely used in fraud identification, intrusion detection, fault diagnosis, severe-weather forecasting and other areas. In recent years, as the importance of outlier mining has become better understood and its applications have broadened, it has become one of the active research topics in data mining. Outlier detection algorithms can be roughly divided into five categories: distribution-based, depth-based, distance-based, density-based and clustering-based. Among these, distance-based algorithms let the distance function be chosen flexibly and extract outlier information effectively, which gives them considerable theoretical significance and practical value. In practice, however, current methods still have problems: initial parameters are chosen empirically, and the algorithms run inefficiently on high-dimensional and large datasets.
Many improved distance-based outlier detection algorithms have been proposed, such as the KNN algorithm. This thesis studies the accuracy and efficiency of outlier detection with the KNN algorithm, both analytically and experimentally, and proposes an algorithm based on the idea of weighted KNN. Building on the traditional KNN method, it assigns each point a weight equal to the average distance to its k nearest neighbors. Outliers are then the points with the largest distance to their k-th nearest neighbor and, where that distance is equal, the points with the largest weight; this improves the accuracy of outlier detection.
The main research work includes:
1. A review of outlier mining: its background, current state and significance. The thesis analyses existing outlier detection algorithms and compares the commonly used ones in terms of their scope, advantages and disadvantages.
2. A study of related data mining techniques, such as data preprocessing before mining, and of several clustering algorithms.
3. Building on the classical KNN outlier detection algorithm, an improved weighted-KNN outlier detection algorithm is proposed that adds a weight to each point to improve detection accuracy, and its superiority over the original algorithm is demonstrated.
4. Experiments: the accuracy of the algorithm is verified on the Wisconsin Cancer dataset from the UCI repository. Simulated large datasets are used to measure the algorithm's running time for different data sizes and dimensionalities, and its running time is compared with that of the original method at the same dimensionality and different data sizes.
The results show that the proposed weighted-KNN outlier detection algorithm effectively detects outliers in the datasets and performs better than the traditional KNN algorithm.
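The abstract describes the two scoring rules only in words. The following is a minimal Python 3.8+ sketch under stated assumptions, not the thesis's implementation: Euclidean distance, brute-force neighbor search, the classical score taken as the distance to the k-th nearest neighbor, and the weight taken as the mean distance to the k nearest neighbors, used to break ties on that distance (one reading of the abstract; the thesis's exact rule may differ). The function names `knn_distances`, `knn_outliers` and `weighted_knn_outliers` and the toy data are hypothetical.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def knn_distances(points: List[Point], k: int) -> List[List[float]]:
    """Return, for each point, the ascending distances to its k nearest neighbors.

    Brute force O(n^2) with Euclidean distance; the point itself is excluded.
    """
    if not 1 <= k < len(points):
        raise ValueError("k must satisfy 1 <= k < len(points)")
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[:k])
    return result


def knn_outliers(points: List[Point], k: int, n: int) -> List[int]:
    """Classical distance-based score: distance to the k-th nearest neighbor.

    Returns the indices of the n highest-scoring points. Python's sort is
    stable (also with reverse=True), so ties keep their input order.
    """
    kth = [d[-1] for d in knn_distances(points, k)]
    ranked = sorted(range(len(points)), key=lambda i: kth[i], reverse=True)
    return ranked[:n]


def weighted_knn_outliers(points: List[Point], k: int, n: int) -> List[int]:
    """Hypothetical weighted variant (an assumed reading of the abstract).

    weight(p) = mean distance from p to its k nearest neighbors.
    Points are ranked by k-th-neighbor distance first; among equal
    distances, the point with the larger weight ranks higher.
    """
    knn = knn_distances(points, k)
    kth = [d[-1] for d in knn]
    weight = [sum(d) / k for d in knn]
    ranked = sorted(range(len(points)), key=lambda i: (kth[i], weight[i]), reverse=True)
    return ranked[:n]


if __name__ == "__main__":
    # Toy 1-D data: a tight cluster {0, 1, 2}, a close pair {50, 51}, and an isolated point 30.
    data = [(50,), (51,), (0,), (1,), (2,), (30,)]
    # Points 1 (value 51) and 5 (value 30) tie on 2nd-neighbor distance (21.0),
    # but point 5 has the larger mean neighbor distance (20.5 vs 11.0).
    print(knn_outliers(data, k=2, n=2))           # [1, 5]
    print(weighted_knn_outliers(data, k=2, n=2))  # [5, 1]
```

In the toy data, the classical score cannot separate the point that has a close partner (51) from the genuinely isolated point (30); the weight ranks the isolated point first. The brute-force loop is kept for clarity; a spatial index such as a k-d tree is the usual way to reduce the quadratic cost on large datasets.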
Keywords/Search Tags: data mining, outlier, weight, partition