
Research On Density-Based Outlier Detection Over Uncertain Data

Posted on: 2011-05-06   Degree: Master   Type: Thesis
Country: China   Candidate: C L Yu
GTID: 2178360308480954   Subject: Computer software and theory
Abstract/Summary:
In recent years, data mining methods for uncertain data have been widely applied in many fields, such as meteorology, economics, the military, and mobile telecommunications, and research on uncertain data has become a focus of data mining theory. However, traditional data mining techniques cannot effectively handle the randomness and complexity that uncertain data introduces, so new techniques for processing and mining uncertain data are needed. Outlier detection reveals rare phenomena and events and uncovers interesting patterns. It has high practical value in many applications, such as detecting credit card fraud, network intrusions, abnormal climate, and alien species. Outlier detection over uncertain data is therefore a meaningful problem.

First, this thesis introduces the theory and methods of outlier detection over certain data, and then presents the causes, management, and types of uncertain data, together with the possible world model. Second, for the problem of outlier detection over uncertain data, it designs a density-based algorithm that computes the local outlier factors of uncertain objects. The method is built on the most commonly used model, the possible world model, and proceeds in three steps: (1) compute the probability of each possible world; (2) because every object in a possible world is certain data, compute each object's local outlier factor (LOF) with the traditional method; (3) combine the results of (1) and (2) to compute each object's uncertain local outlier factor (ULOF).
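The three steps can be sketched as follows. This is a minimal Python illustration under assumptions the abstract does not state: each uncertain object is a fixed 2-D point that exists independently with probability p_i, the standard LOF definition (Breunig et al., 2000) is applied inside each world, and ULOF is taken to be the probability-weighted mean of an object's LOF over the worlds that contain it and hold more than k objects. The function names `lof_scores` and `ulof` are hypothetical. Enumerating all 2^n worlds is exponential, so this version only suits tiny inputs.

```python
import math

Point = tuple[float, float]


def lof_scores(points: list[Point], k: int) -> list[float]:
    """Standard LOF on a certain point set (assumes len(points) > k, distinct points)."""
    n = len(points)
    dist = [[math.dist(a, b) for b in points] for a in points]
    k_dist, neighbors = [], []
    for i in range(n):
        # k-distance: distance to the k-th nearest other point
        kd = sorted(dist[i][j] for j in range(n) if j != i)[k - 1]
        k_dist.append(kd)
        # k-distance neighborhood (may exceed k points on ties)
        neighbors.append([j for j in range(n) if j != i and dist[i][j] <= kd])
    # local reachability density: inverse of the mean reachability distance
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], dist[i][j]) for j in neighbors[i]]
        lrd.append(len(reach) / sum(reach))
    # LOF: mean ratio of the neighbors' lrd to the point's own lrd
    return [sum(lrd[j] for j in neighbors[i]) / (len(neighbors[i]) * lrd[i])
            for i in range(n)]


def ulof(objects: list[Point], probs: list[float], k: int) -> list[float]:
    """Hypothetical ULOF: probability-weighted mean LOF over possible worlds."""
    n = len(objects)
    weighted, weight = [0.0] * n, [0.0] * n
    for mask in range(1 << n):  # each bitmask encodes one possible world
        members = [i for i in range(n) if mask >> i & 1]
        # Step (1): world probability, assuming independent existence
        p_world = math.prod(probs[i] if mask >> i & 1 else 1.0 - probs[i]
                            for i in range(n))
        if p_world == 0.0 or len(members) <= k:
            continue  # impossible world, or too few objects for LOF
        # Step (2): objects in a world are certain, so classic LOF applies
        scores = lof_scores([objects[i] for i in members], k)
        # Step (3): accumulate the probability-weighted LOF of each object
        for score, i in zip(scores, members):
            weighted[i] += p_world * score
            weight[i] += p_world
    # nan if an object never appears in a world with more than k objects
    return [w / z if z > 0 else math.nan for w, z in zip(weighted, weight)]
```

For example, with four points on a unit square and one point at (10, 10), all existing with probability 0.9 and k = 2, `ulof` gives the isolated point by far the largest score. Conditioning on the worlds that contain the object keeps a low existence probability from, by itself, pushing the score down; whether the thesis normalizes this way is not stated in the abstract.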
The larger an object's ULOF value, the more the object differs from the others. The thesis analyzes the time complexity and characteristics of the algorithm and proposes three optimizations to improve its efficiency: a dynamic programming method, a pruning method, and a grid-based method. The density-based uncertain outlier detection algorithm determines an object's "environment" from the objects' scores and probabilities, and this environment in turn determines the outlier factor of the uncertain object, which makes the algorithm more reasonable. Finally, extensive experiments on synthetic datasets show that the proposed optimizations find results effectively and accurately while reducing the running time of the algorithms. Experiments on real plant datasets illustrate the practical value of the algorithms at the end of this thesis.
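The abstract does not detail the dynamic programming optimization. As one hedged illustration of how dynamic programming can replace world enumeration, and not necessarily the thesis's own method, the sketch below assumes independent existence probabilities and computes the probability that at least k objects in a set exist (a Poisson-binomial tail) in O(nk) time instead of summing over 2^n worlds. Under that assumption, P(k-distance(o) ≤ r | o exists) is exactly this probability taken over the objects within distance r of o. Both helper names are hypothetical.

```python
import math


def prob_at_least_k(probs: list[float], k: int) -> float:
    """P(at least k of the independent objects exist): a Poisson-binomial tail.

    dp[j] for j < k holds P(exactly j objects exist so far); dp[k] absorbs
    every count >= k. Runs in O(n * k) rather than enumerating 2**n worlds.
    """
    if k <= 0:
        return 1.0
    dp = [1.0] + [0.0] * k
    for p in probs:
        # Update from high j to low j so each step reads the previous values.
        dp[k] += dp[k - 1] * p
        for j in range(k - 1, 0, -1):
            dp[j] = dp[j] * (1.0 - p) + dp[j - 1] * p
        dp[0] *= 1.0 - p
    return dp[k]


def prob_kdist_within(o, others, probs, k: int, r: float) -> float:
    """Hypothetical helper: P(k-distance(o) <= r), given that o exists.

    The k-distance is at most r exactly when at least k other objects
    within distance r of o exist.
    """
    near = [p for x, p in zip(others, probs) if math.dist(o, x) <= r]
    return prob_at_least_k(near, k)
```

For instance, `prob_kdist_within((0.0, 0.0), [(1.0, 0.0), (0.0, 1.0), (5.0, 5.0)], [0.5, 0.5, 0.9], 2, 1.5)` returns 0.25, because both nearby objects must exist for the 2-distance to stay within 1.5.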
Keywords/Search Tags: Uncertain Data, Local Outliers, Possible World Model, Dynamic Programming, Grid