
Research On Outlier Detection Technology In Uncertain Data Sets

Posted on: 2020-05-10
Degree: Master
Type: Thesis
Country: China
Candidate: Y L Zhong
GTID: 2428330602953953
Subject: Computer technology
Abstract/Summary:
In recent years, with the rapid development of database technology, the amount of data that must be stored and processed in databases has kept growing. How to extract potentially valuable, previously undiscovered information from massive data has become a hot topic in database research. Outlier detection helps users discover abnormal but valuable data, and it has been widely used in medical diagnosis, financial fraud detection, environmental monitoring, and other fields. Outlier detection over traditional databases has already produced many excellent research results. However, as the understanding of data collection and data processing has deepened, it has become clear that uncertain data is widespread in production and daily life, and the existing outlier detection methods for uncertain databases have limitations that keep them from meeting the needs of real-world applications. To this end, this thesis studies outlier detection in uncertain data sets and uncertain data streams. The main contributions are as follows:

1. A Fast Outlier Detection Algorithm on Uncertain Data Sets (FODU) is proposed. First, an index construction strategy based on hierarchical division is given. This index structure overcomes the limitations of traditional indexes in managing multidimensional data and avoids spatial redundancy. Then, a new filtering method is proposed, consisting of batch filtering and single-point filtering. Batch filtering avoids the additional spatial queries that traditional methods require; single-point filtering tightens the estimated probability bounds, which eliminates a large number of redundant calculations. Finally, a new method for calculating the outlier probability value is proposed. It exploits the recursive law of the outlier probability value to avoid iterating over the complete set of neighbors, thus reducing the overall computational cost.

2. A Fast Outlier Detection Algorithm Over Uncertain Data Streams (FOD_OUDS) is proposed. Building on FODU, the FOD_OUDS algorithm addresses the high cost of storing data and detecting outliers in an uncertain data stream environment. Based on an analysis of the properties of outliers in uncertain data streams, storage structures are designed for data points, for the divided sub-blocks, and for the sliding window. These structures reduce part of the storage cost and speed up data updates by reusing stored intermediate results, thereby improving the efficiency of outlier detection over uncertain data streams.

3. The proposed algorithms are validated through a large number of comparative experiments. Comparisons against the PCUOD algorithm validate both the Fast Outlier Detection Algorithm on Uncertain Data Sets (FODU) and the Fast Outlier Detection Algorithm Over Uncertain Data Streams (FOD_OUDS).
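The abstract does not state the recursion used for the outlier probability value, so the following is only an illustrative sketch under a common existential-uncertainty assumption, not the thesis's FODU method. It assumes each neighbor of a point exists independently with a known probability, and that the point counts as an outlier in a possible world when fewer than k of its neighbors exist. The names `outlier_probability`, `neighbor_probs`, and `k` are hypothetical. The recurrence updates the distribution of the neighbor count one neighbor at a time, which avoids enumerating every subset of neighbors.

```python
from typing import Sequence


def outlier_probability(neighbor_probs: Sequence[float], k: int) -> float:
    """Probability that fewer than k neighbors exist (assumed outlier rule).

    neighbor_probs holds the (assumed independent) existence probability
    of each neighbor; k is a hypothetical neighbor-count threshold.
    """
    if k <= 0:
        return 0.0  # "fewer than 0 neighbors" can never happen

    # dp[j] = probability that exactly j of the neighbors processed so far
    # exist; only counts below k matter, so the table has k entries.
    dp = [0.0] * k
    dp[0] = 1.0

    for p in neighbor_probs:
        # Walk j downwards so dp[j - 1] still holds the previous round's value.
        for j in range(k - 1, 0, -1):
            dp[j] = dp[j] * (1.0 - p) + dp[j - 1] * p
        dp[0] *= 1.0 - p

    return sum(dp)


if __name__ == "__main__":
    # Two neighbors, each present with probability 0.5, threshold k = 2.
    print(outlier_probability([0.5, 0.5], 2))
```

With n neighbors this sketch runs in O(n * k) time, whereas enumerating possible worlds directly would take time exponential in n.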
Keywords/Search Tags:Uncertain Data, Uncertain Data Streams, Outlier Detection, Hierarchical Indexing, Filtering Method