Research on Outlier Data Mining in High-Dimensional Space

Posted on: 2011-03-26
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Wu
Full Text: PDF
GTID: 2178330332967865
Subject: Computer application technology
Abstract/Summary:
Outlier detection has become an important research direction in data mining, with wide application in financial fraud detection, network intrusion detection, disease prevention and control, disaster and weather forecasting, and many other areas. As research has progressed, outlier detection in large-scale, low-dimensional data has been studied in depth and has produced many results. Outlier detection in large, high-dimensional data, however, still faces many problems and challenges that require in-depth, systematic study. Building on existing algorithms, this thesis presents an outlier mining method for large, high-dimensional data that combines a genetic algorithm with simulated annealing.

The thesis introduces the concepts of data mining and outlier mining, compares and analyzes existing outlier detection algorithms, discusses several important high-dimensional outlier detection algorithms, and points out their drawbacks. On this basis, it proposes a new outlier detection method for high-dimensional space based on a genetic algorithm and simulated annealing. In this method, each dimension of the high-dimensional data is divided into grid cells. To overcome the splitting of adjacent data points across cell boundaries that the partition causes, two grid partitioning schemes are used, and the results of both are stored in the same grid count tree. The data points in each grid cell are then encoded, and the sparsity coefficient of each cell is calculated. To reduce computational complexity, a genetic algorithm is used to find the top-n grid cells with the smallest sparsity coefficients and the points they contain in the high-dimensional space. To prevent the "premature" convergence phenomenon, simulated annealing is introduced. The experiments show that the method is effective.
Keywords/Search Tags: data mining, outlier, sparsity coefficient, grid count tree, genetic algorithm, simulated annealing algorithm
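
The abstract does not give the formula for the sparsity coefficient or the details of the search, so the following Python sketch is only an illustration under stated assumptions. It assumes the definition commonly used in grid-based high-dimensional outlier detection: with N points, phi equi-depth ranges per dimension, and a cell in a k-dimensional projection holding n points, the coefficient is (n - N f^k) / sqrt(N f^k (1 - f^k)) with f = 1/phi. The function names (`equi_depth_cells`, `sparsity_coefficient`, `sparsest_cells`, `accept`) are hypothetical, and the exhaustive scan stands in for the genetic search the thesis uses. The last function shows the standard Metropolis acceptance rule from simulated annealing, which is one usual way to let a search escape premature convergence.

```python
import math
from collections import Counter


def equi_depth_cells(points, phi):
    """Map each point to a tuple of range indices (0..phi-1), one per dimension.

    Ranges are equi-depth: each holds roughly a fraction 1/phi of the points.
    """
    n = len(points)
    dims = len(points[0])
    cells = [[0] * dims for _ in range(n)]
    for d in range(dims):
        # Rank the points along dimension d, then cut the ranking into phi slices.
        order = sorted(range(n), key=lambda i: points[i][d])
        for rank, i in enumerate(order):
            cells[i][d] = rank * phi // n
    return [tuple(c) for c in cells]


def sparsity_coefficient(count, n_points, phi, k):
    """Assumed definition: standardized deviation of a cell count from its
    expected value N * f^k, where f = 1/phi and k is the projection size."""
    f_k = (1.0 / phi) ** k
    expected = n_points * f_k
    std = math.sqrt(n_points * f_k * (1.0 - f_k))
    return (count - expected) / std


def sparsest_cells(points, phi, dims, top_n):
    """Return the top_n occupied cells with the lowest sparsity coefficient
    in the projection onto the dimensions listed in `dims`.

    This is an exhaustive scan over occupied cells of one projection; it is
    not the genetic search described in the abstract.
    """
    cells = equi_depth_cells(points, phi)
    counts = Counter(tuple(c[d] for d in dims) for c in cells)
    k = len(dims)
    scored = [
        (sparsity_coefficient(cnt, len(points), phi, k), cell)
        for cell, cnt in counts.items()
    ]
    scored.sort()
    return scored[:top_n]


def accept(delta, temperature, u):
    """Metropolis rule: always accept a candidate that is no worse
    (delta <= 0); otherwise accept with probability exp(-delta / T).
    `u` is a uniform random number in [0, 1) supplied by the caller."""
    return delta <= 0 or u < math.exp(-delta / temperature)


if __name__ == "__main__":
    data = [(1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6), (7, 7), (8, 0)]
    for score, cell in sparsest_cells(data, phi=2, dims=(0, 1), top_n=2):
        print(cell, round(score, 4))
```

A strongly negative coefficient marks a cell that holds far fewer points than expected, so the points inside it are outlier candidates. In a search over many projections, `delta` would be the difference in sparsity coefficient between a candidate and the current solution, with the temperature lowered over the generations.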