Research On Outlier Mining Method Oriented To Multidimensional Data

Posted on: 2012-03-10
Degree: Master
Type: Thesis
Country: China
Candidate: X C Gu
GTID: 2178330332996986
Subject: Computer application technology
Abstract/Summary:
Outlier detection is an important research direction in data mining. Outlier detection methods can not only eliminate noise in data but also uncover latent, unusual, rare and meaningful knowledge. When used to eliminate noise, outlier detection improves both the efficiency of applications that rely on the data and the accuracy of the data itself. When used to uncover latent, unusual, rare and meaningful knowledge, it can reveal unusual phenomena in the real world, which gives it very broad application prospects. After analyzing and comparing the main existing outlier detection methods, this thesis proposes an outlier detection method for multidimensional data based on the whole difference degree.
Outlier detection originated in the field of statistical distributions. Statistics-based outlier detection is firmly supported by probability theory, and a probabilistic model can identify outliers and their meaning efficiently. However, this approach assumes that the data follow a known distribution. Distance-based and density-based outlier detection are difficult to specify, because their parameters are not obvious or their time and space complexity is too high. Deviation-based methods mainly use the sequential exception technique and OLAP (On-Line Analytical Processing) data cube technology; because they involve dissimilarity functions and multidimensional, multilevel concepts, they are very difficult to apply in practice.
To address outlier detection in multidimensional data, this thesis proposes a new method that can efficiently detect the corresponding outlier set when the data distribution is unknown.
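The trade-off described above (statistics-based detection presupposes a distribution, while distance-based detection depends on parameters that are hard to choose) can be seen in a minimal sketch. It is not taken from the thesis. It assumes Python with NumPy, uses a per-dimension z-score test (meaningful only for roughly normal data) as a stand-in for the statistics-based family, and uses the DB(p, D) definition of Knorr and Ng for the distance-based family. The function names, the toy data and the threshold values are hypothetical.

```python
import numpy as np


def zscore_outliers(X, k=3.0):
    """Statistics-based sketch: flag objects whose z-score in any dimension
    exceeds k. Only meaningful if each dimension is roughly normal."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)            # population standard deviation
    sigma[sigma == 0] = 1.0          # constant columns then give z = 0
    z = np.abs((X - mu) / sigma)
    return np.flatnonzero((z > k).any(axis=1))


def db_outliers(X, p=0.95, D=1.0):
    """Distance-based DB(p, D) sketch: an object is an outlier if at least
    a fraction p of the other objects lie farther than D from it."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))   # pairwise Euclidean distances
    far = (dist > D).sum(axis=1)              # self-distance 0 is never > D
    return np.flatnonzero(far >= p * (n - 1))


# Hypothetical toy data: a 3x3 grid of "normal" points plus one distant object (index 9).
X = [[i, j] for i in range(3) for j in range(3)] + [[20, 20]]
print(zscore_outliers(X, k=2.5))      # [9]
print(db_outliers(X, p=0.9, D=5.0))   # [9]
print(zscore_outliers(X, k=3.0))      # []
```

Both functions need a threshold chosen in advance. The last call also shows a known limitation of the z-score rule: with n points and the population standard deviation, no z-score can exceed sqrt(n - 1), which is exactly 3 for these ten points, so k = 3 flags nothing here.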
The method is based mainly on the idea of proceeding from the whole to its parts, and it yields an outlier detection algorithm based on the whole difference degree. The algorithm follows a top-down, stepwise-refinement design and is controlled by two parameters: an undetermined coefficient and a jump coefficient. This avoids the mistaken deletion of non-outlier objects from the data set that occurs when an algorithm is controlled by a single coefficient, and users can choose the parameter values easily. Simulation experiments in MATLAB applied the algorithm to different types of random data sets, and the results validate the algorithm's effectiveness.
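The abstract does not define the whole difference degree or state how the undetermined and jump coefficients enter the algorithm, so the following is only a hypothetical illustration of how a second coefficient can stop a single threshold from deleting normal objects. It is not the thesis's algorithm. It assumes Python with NumPy. `difference_degree`, `two_coefficient_outliers`, `alpha` (standing in for the undetermined coefficient) and `beta` (standing in for the jump coefficient) are all invented for this sketch.

```python
import numpy as np


def difference_degree(X):
    """HYPOTHETICAL 'whole difference degree': each object's mean distance to
    all other objects, divided by the average of that value over the set.
    The thesis abstract does not give its definition."""
    n = len(X)
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    mean_dist = dist.sum(axis=1) / (n - 1)
    avg = mean_dist.mean()
    if avg == 0:                       # all objects coincide
        return np.zeros(n)
    return mean_dist / avg


def two_coefficient_outliers(X, alpha=1.5, beta=1.3, max_iter=50):
    """HYPOTHETICAL top-down refinement controlled by two coefficients:
    alpha - threshold on the difference degree (stand-in for the
            'undetermined coefficient');
    beta  - minimum ratio between the smallest degree above alpha and the
            largest degree below it (stand-in for the 'jump coefficient').
    Each pass recomputes degrees on the objects that remain."""
    X = np.asarray(X, dtype=float)
    remaining = np.arange(len(X))
    outliers = []
    for _ in range(max_iter):
        if len(remaining) < 3:
            break
        d = difference_degree(X[remaining])
        above = d > alpha
        if not above.any() or above.all():
            break
        gap = d[above].min() / d[~above].max()
        if gap < beta:
            break                      # no clear jump: alpha cuts a continuum
        outliers.extend(remaining[above].tolist())
        remaining = remaining[~above]  # refine: drop outliers, recompute
    return sorted(outliers)


# Hypothetical toy data: a 3x3 grid of "normal" points plus one distant object (index 9).
X = [[i, j] for i in range(3) for j in range(3)] + [[20, 20]]
print(two_coefficient_outliers(X, alpha=1.1, beta=1.3))  # [9]
print(two_coefficient_outliers(X, alpha=1.1, beta=1.0))  # [0, 2, 6, 8, 9]
```

With `beta = 1.0` the jump check can never fire (every degree above `alpha` exceeds every degree below it), so the procedure degenerates to a single-threshold rule and, on its second pass, also deletes the four grid corners. With `beta = 1.3` it stops instead, because the corners' degree (about 1.12) is only about 1.2 times the next-largest degree (about 0.94).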
Keywords/Search Tags:Outlier Detection, Data Mining, Whole Difference Degree, Multi-data, Data Distribution