
Density-based Outlier Detection On Uncertain Data

Posted on: 2016-03-31   Degree: Master   Type: Thesis
Country: China   Candidate: J L Lin   Full Text: PDF
GTID: 2308330479985379   Subject: Software engineering
Abstract/Summary:
In recent years, as data acquisition and processing technology has matured, uncertain data mining has come to play an important role in mobile telecommunications, military, economic, and meteorological applications, for example GPS and mobile phone location tracking, sensor data management, and feature extraction. Real-world data, however, is not always accurate. In sensor networks, privacy protection, data integration, location-based services, and RFID applications, the collection method, the climate, human interference, and other external factors produce a large amount of incomplete or erroneous data. Such data objects are not single data points but are distributed according to some probability; we call them uncertain data. Because uncertain data is random and complex, traditional data mining techniques are difficult to apply to outlier detection on it, so outlier detection on uncertain data is of real research significance.

This thesis applies a density-based method to outlier detection on uncertain data. It defines a density-based local outlier factor for uncertain data, the Uncertain Local Outlier Factor (ULOF), to characterize how strongly an object in an uncertain data set deviates from its neighbors: the higher the value, the greater the degree of outlierness. The objects with the highest outlier degree are then extracted.

The main work of this thesis is as follows:
① Based on the type of data and the causes of uncertainty, a density-based outlier detection algorithm is designed for tuple-level uncertain data. A possible world model is built to determine the probability of each uncertain data object in each possible world; combining this with the traditional local outlier factor (LOF) algorithm yields the ULOF algorithm, and the outlier degree of each object is judged by its ULOF value.
② Taking into account the attribute values and the probability of each uncertain object, together with its ULOF value, a UTop-k query is run over the uncertain data set to identify the objects with the highest outlier degree.
③ The efficiency, accuracy, time complexity, and space complexity of the ULOF algorithm are analyzed in detail. A grid-based pruning strategy and a k-nearest-neighbor query optimization are proposed to reduce the candidate set, which improves the efficiency of the algorithm and lowers its time complexity.
④ Experiments on simulated and real data assess the feasibility of the ULOF algorithm, and its efficiency, scalability, and accuracy are compared and analyzed under different parameters.
⑤ Finally, the thesis summarizes the research work and discusses future trends in outlier detection on uncertain data.

The experimental results demonstrate the feasibility, efficiency, and accuracy of the ULOF algorithm, which adapts well to large data sets and high-dimensional data. The optimized ULOF method effectively improves the accuracy of outlier detection and reduces its time cost, improving the overall performance of outlier detection on uncertain data.
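The abstract does not give the exact ULOF formula, so the following is only a minimal sketch of the general idea it describes: tuple-level uncertainty, a possible world model, classic LOF, and a top-k selection. It assumes each object is a (point, existence probability) pair, that objects exist independently, and that an object's score is the probability-weighted average of its classic LOF over the possible worlds in which it exists. The names `lof_scores`, `expected_lof`, and `top_outliers` are hypothetical, and the brute-force enumeration of all 2^n worlds is feasible only for tiny data sets; it stands in for, and does not reproduce, the grid-based pruning the thesis proposes.

```python
import math
from itertools import product


def lof_scores(points, k):
    """Classic LOF for deterministic points (assumes distinct points, len > k)."""
    n = len(points)
    if n <= k:
        raise ValueError("need more than k points")
    neigh, kdist = [], []
    for i in range(n):
        # Simplification: exactly k neighbours, distance ties broken by index.
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: (math.dist(points[i], points[j]), j))
        nn = order[:k]
        neigh.append(nn)
        kdist.append(math.dist(points[i], points[nn[-1]]))

    def lrd(i):
        # Local reachability density: inverse of the mean reachability distance.
        reach = [max(kdist[j], math.dist(points[i], points[j])) for j in neigh[i]]
        return len(reach) / sum(reach)

    lrds = [lrd(i) for i in range(n)]
    return [sum(lrds[j] for j in neigh[i]) / (k * lrds[i]) for i in range(n)]


def expected_lof(objects, k):
    """Assumed scoring rule: probability-weighted mean LOF over possible worlds.

    objects: list of (point, existence_probability) pairs.
    Only worlds that contain the object and more than k objects are counted.
    """
    n = len(objects)
    weighted = [0.0] * n
    mass = [0.0] * n
    for present in product([False, True], repeat=n):  # every possible world
        idx = [i for i in range(n) if present[i]]
        if len(idx) <= k:
            continue  # LOF is undefined in worlds this small
        prob = math.prod(objects[i][1] if present[i] else 1.0 - objects[i][1]
                         for i in range(n))
        if prob == 0.0:
            continue
        scores = lof_scores([objects[i][0] for i in idx], k)
        for i, s in zip(idx, scores):
            weighted[i] += prob * s
            mass[i] += prob
    return [w / m if m > 0 else float("nan") for w, m in zip(weighted, mass)]


def top_outliers(scores, m):
    """Indices of the m objects with the highest scores."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:m]


# Illustrative data (not from the thesis): a tight cluster plus one distant object.
objects = [((0.0, 0.0), 0.9), ((0.0, 1.0), 0.9), ((1.0, 0.0), 0.9),
           ((1.0, 1.0), 0.9), ((5.0, 5.0), 0.8)]
scores = expected_lof(objects, k=2)
print(top_outliers(scores, 1))
```

In this toy data set the distant object at (5, 5) receives by far the largest score, while the four clustered objects stay close to 1. A practical implementation would avoid enumerating worlds, for example by pruning candidates as item ③ describes.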
Keywords/Search Tags: uncertain data, local outlier detection, possible world model, k-nearest neighbor