
Research On Outlier Detection In Data Stream Based On Density

Posted on: 2020-08-21
Degree: Master
Type: Thesis
Country: China
Candidate: F B Zou
GTID: 2428330599959718
Subject: Information and Communication Engineering
Abstract/Summary:
Outlier detection is an important research hotspot in data mining and has long attracted wide attention in academia. With the rapid development of the Internet of Things and hardware technology, data are being generated ever faster. Outlier detection therefore has to handle not only static data of known size but also dynamic data streams, which are massive, real-time and changing. To address the inaccurate measurement of outlier degree and the low detection efficiency of outlier detection in data streams, this thesis proposes two outlier detection algorithms. The research consists of the following two parts.

(1) The traditional angle-based outlier factor has a complex structure and describes the outlier degree inaccurately. To address this, an outlier detection algorithm for data streams based on the local density of vector dot products is proposed. First, a sliding window model processes the incoming data, and the mean dot product and the local density of vector dot products are calculated for the data points in the current window. Next, the local density of vector dot products is used to evaluate the outlier degree of each data point, and an outlier partition point is found with a supreme slope model. Finally, the candidate outliers in the current window are determined from the outlier partition point, and the candidates that meet the validation requirements are confirmed as real outliers. Compared with classical data stream outlier detection algorithms, the proposed algorithm adapts better to the characteristics of data streams and achieves higher detection performance.

(2) To address sparse description and low detection rates in high-dimensional data streams, an outlier detection algorithm for data streams based on hypercube density is proposed. Combining the advantages of grid-based and neighborhood-density methods, the algorithm redefines the neighborhood of a data point and the hypercube density between data points. It also uses attribute normalization to solve the problem of attribute domination in practical applications. In addition, it introduces a new concept, the accumulated value of overlap, for outlier screening in order to reduce misjudgments. Compared with classical data stream outlier detection algorithms, the proposed algorithm improves real-time detection in high-dimensional data streams.

Experiments on synthetic data verify the effectiveness of the proposed algorithms. Comparisons with classical data stream outlier detection algorithms on UCI test datasets further demonstrate their advantages in outlier detection rate, false alarm rate, ROC curve and AUC value.
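The abstract gives the steps of the first algorithm but not its formulas. The following is only a minimal sketch under stated assumptions and is not the thesis's definition. The steps are: a sliding window, a dot-product-based outlier score, a partition at the steepest drop of the sorted scores, and a validation step.

The sketch makes these assumptions:
- The "local density of vector dot product" is replaced by the mean pairwise dot product of the unit difference vectors from a point to its k nearest neighbours. A value near 1 means all neighbours lie in one narrow direction.
- The "supreme slope model" is read as cutting at the largest drop in the descending sorted scores.
- "Validation" is modelled as a vote count across overlapping windows.

All function names and the parameters `k`, `max_frac` and `min_votes` are hypothetical.

```python
import numpy as np
from collections import Counter, deque


def dot_product_score(window: np.ndarray, k: int = 5) -> np.ndarray:
    """Hypothetical stand-in for the 'local density of vector dot product'.

    For each point p: take its k nearest neighbours, normalise the difference
    vectors (q - p), and average their pairwise dot products (cosines).
    Neighbours that all lie in one narrow direction give a value near 1.
    """
    n = len(window)
    k = min(k, n - 1)                       # needs k >= 2 for at least one pair
    dists = np.linalg.norm(window[:, None, :] - window[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)         # a point is not its own neighbour
    pairs = np.triu_indices(k, 1)           # index pairs (i < j) among the k neighbours
    scores = np.empty(n)
    for i in range(n):
        nn = np.argsort(dists[i])[:k]
        diffs = window[nn] - window[i]
        norms = np.linalg.norm(diffs, axis=1, keepdims=True)
        unit = diffs / np.maximum(norms, 1e-12)
        scores[i] = (unit @ unit.T)[pairs].mean()
    return scores


def max_slope_cut(scores: np.ndarray, max_frac: float = 0.1) -> np.ndarray:
    """Assumed reading of the supreme slope partition: sort scores in
    descending order and cut at the largest drop between consecutive scores,
    searching only the top max_frac of the window. Returns candidate indices."""
    order = np.argsort(scores)[::-1]
    s = scores[order]
    limit = min(max(1, int(np.ceil(max_frac * len(s)))), len(s) - 1)
    drops = s[:limit] - s[1:limit + 1]
    return order[: int(np.argmax(drops)) + 1]


def detect_stream(stream, window_size=50, k=5, max_frac=0.1, min_votes=None):
    """Slide a count-based window over the stream. A candidate is confirmed
    once it has been flagged in min_votes windows (hypothetical validation)."""
    if min_votes is None:
        min_votes = window_size // 2
    window = deque(maxlen=window_size)      # holds (arrival index, point)
    votes = Counter()
    for t, x in enumerate(stream):
        window.append((t, np.asarray(x, dtype=float)))
        if len(window) < window_size:
            continue                        # wait until the first window is full
        ids = [i for i, _ in window]
        pts = np.stack([p for _, p in window])
        for j in max_slope_cut(dot_product_score(pts, k), max_frac):
            votes[ids[j]] += 1
    return {i for i, c in votes.items() if c >= min_votes}


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.normal(size=(200, 2))        # synthetic 2-D Gaussian stream
    data[120] = [10.0, 10.0]                # injected outlier
    print(sorted(detect_stream(data)))
```

The top-scoring point in a window is always a candidate, because the cut keeps at least one point. The vote threshold is therefore what keeps the inliers that top a single window from being reported.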
Keywords/Search Tags: outlier detection, data stream, local density of vector dot product, density of hypercube, accumulated value of overlap
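The hypercube-density algorithm in part (2) of the abstract combines four ideas: attribute normalization, a grid-like hypercube neighbourhood, a density count, and an "accumulated value of overlap" used to screen candidates. The sketch below processes one window of the stream. It is an illustration under assumptions, since the abstract does not define these quantities.

The sketch makes these assumptions:
- Normalization is min-max scaling per attribute.
- A point's neighbourhood is the axis-aligned hypercube of side 2r centred on it.
- Hypercube density counts the other points inside that cube.
- The accumulated value of overlap is one possible reading: the sum, over all other points, of the fraction of cube volume shared with their cubes.

The names `detect_window`, `r`, `min_density` and `min_overlap` are hypothetical.

```python
import numpy as np


def minmax_normalize(window: np.ndarray) -> np.ndarray:
    """Scale every attribute to [0, 1] so no single attribute dominates."""
    lo = window.min(axis=0)
    span = window.max(axis=0) - lo
    span[span == 0] = 1.0                   # constant attributes become all zeros
    return (window - lo) / span


def hypercube_density(x: np.ndarray, r: float) -> np.ndarray:
    """Count of other points inside the axis-aligned hypercube of side 2r
    centred on each point (equivalently, Chebyshev distance <= r)."""
    cheb = np.abs(x[:, None, :] - x[None, :, :]).max(axis=2)
    return (cheb <= r).sum(axis=1) - 1      # subtract the point itself


def accumulated_overlap(x: np.ndarray, r: float) -> np.ndarray:
    """Hypothetical 'accumulated value of overlap': for each point, the sum over
    other points of the fraction of hypercube volume shared with theirs."""
    # per dimension, two intervals of length 2r overlap by max(0, 2r - |a - b|)
    per_dim = np.clip(1.0 - np.abs(x[:, None, :] - x[None, :, :]) / (2 * r), 0.0, None)
    shared = per_dim.prod(axis=2)
    np.fill_diagonal(shared, 0.0)           # ignore self-overlap
    return shared.sum(axis=1)


def detect_window(window, r=0.1, min_density=2, min_overlap=1.0) -> np.ndarray:
    """Return indices of the points in one window judged to be outliers."""
    x = minmax_normalize(np.asarray(window, dtype=float))
    candidates = np.flatnonzero(hypercube_density(x, r) < min_density)
    overlap = accumulated_overlap(x, r)
    # screening step: keep only candidates whose accumulated overlap is also low
    return candidates[overlap[candidates] < min_overlap]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    window = np.vstack([rng.uniform(size=(199, 3)), [[5.0, 5.0, 5.0]]])
    print(detect_window(window, r=0.1))
```

The overlap term also credits points lying just outside a cube, up to Chebyshev distance 2r. A point with few neighbours strictly inside its cube, but many close by, can therefore be kept as an inlier. This is one hedged interpretation of how overlap screening could reduce misjudgments.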