Research Of Online Anomaly Detection Method For Streaming Data Based On Matrix Sketching And Hash Learning

Posted on: 2022-01-12
Degree: Master
Type: Thesis
Country: China
Candidate: P Wu
GTID: 2518306575969099
Subject: Electronics and Communications Engineering
Abstract/Summary:
With the rapid development of the Internet, the volume of streaming data over networks has exploded. Detecting anomalous data in massive data streams in a timely and accurate manner has become a crucial issue. Traditional anomaly detection methods, which rely on complete static data sets, usually need to access the data many times and therefore suffer from poor real-time performance and high computational complexity. In addition, because streaming data in practical problems is large in scale and high in dimension, existing anomaly detection methods usually face storage difficulties and the "curse of dimensionality".

In this context, online anomaly detection based on matrix sketching has become a new breakthrough for such large-scale, high-dimensional streaming data. This approach maintains a much smaller sketch matrix for large-scale streaming data online and has significant advantages in reducing data scale. It has attracted widespread attention in fields such as data mining and computer vision. However, some important issues remain to be solved for online anomaly detection of streaming data based on matrix sketching, such as the maintenance and update of the sketch matrix and the computation and storage efficiency for the streaming data. This thesis therefore aims to tackle these challenges and proposes a new online anomaly detection algorithm for streaming data. The main contributions can be summarized as follows:

1. To solve the problem that streaming data cannot be used for online hash learning, a zero-mean online matrix algorithm is proposed. Because hash learning requires zero-mean processing over the global data, the proposed algorithm first zero-centers the data block collected at each detection time, and then appends a virtual sample to the processed data block so that the block carries the global zero-mean property. Finally, the matrix sketching method is employed to represent the data block with a smaller data matrix, which satisfies the scalability requirement of online hashing and addresses the high complexity of hash learning caused by the large sample size. Experimental results show that this method effectively solves the problem that streaming data cannot be hashed online, and that matrix sketching accelerates learning without degrading the hash learning.

2. Existing online anomaly detection methods perform unsatisfactorily on large-scale, high-dimensional, high-speed streaming data, are difficult to store and compute, and adapt poorly to concept drift. To address these problems, an online anomaly detection algorithm based on matrix sketching and hash learning is proposed. First, a matrix sketching sub-model is constructed, and a sketch matrix much smaller than the original data is maintained online to approximate the original data and reduce the scale of the streaming data. Then, a learning-to-hash sub-model is constructed to map the data to a low-dimensional space using an unsupervised linear hashing projection method, which improves the efficiency of storage and computation. Furthermore, an anomaly discrimination model is built to query the nearest neighbors of the candidate data based on the idea of locality-sensitive hashing, and the anomaly score is calculated using the Hamming distance to further speed up the computation. Finally, an online updating mechanism is designed to improve detection accuracy and the ability to adapt to concept drift through incremental learning of new normal data. Experimental results show that the proposed algorithm achieves better detection performance and higher adaptability to concept drift than existing online anomaly detection algorithms.
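The pipeline summarized in the two contributions can be illustrated with a minimal NumPy sketch. The abstract does not give formulas, so everything specific here is an assumption rather than the thesis's actual method: the virtual sample is taken to be the block-mean offset weighted by sqrt(n_seen * n_blk / (n_seen + n_blk)), which is the standard identity for merging scatter matrices; the matrix sketch is a batch variant of Frequent Directions; the hash projection is a PCA-style sign projection; and the anomaly score is the mean Hamming distance to the k nearest stored codes, found by brute force instead of hash-bucket lookup. The function names (`augment_block`, `fd_update`, `fit_projection`, `hash_codes`, `anomaly_score`) are hypothetical, and the online updating mechanism is omitted.

```python
import numpy as np


def augment_block(block, n_seen, mean_seen):
    """Center a block locally and append one virtual row (assumed form) so
    that the stacked blocks reproduce the globally centered scatter matrix."""
    n_blk = block.shape[0]
    mean_blk = block.mean(axis=0)
    centered = block - mean_blk
    # Weight from the identity for merging two scatter matrices.
    weight = np.sqrt(n_seen * n_blk / (n_seen + n_blk))
    virtual = weight * (mean_blk - mean_seen)
    n_new = n_seen + n_blk
    mean_new = (n_seen * mean_seen + n_blk * mean_blk) / n_new
    return np.vstack([centered, virtual]), n_new, mean_new


def fd_update(sketch, rows, ell):
    """Batch Frequent Directions step: merge rows into the sketch and keep
    at most ell shrunken directions (assumed sketching method)."""
    stacked = np.vstack([sketch, rows])
    _, s, vt = np.linalg.svd(stacked, full_matrices=False)
    delta = s[ell] ** 2 if s.size > ell else 0.0
    shrunk = np.sqrt(np.maximum(s[:ell] ** 2 - delta, 0.0))
    return shrunk[:, None] * vt[:ell]


def fit_projection(sketch, bits):
    """PCA-style projection (assumption): top right singular vectors of the sketch."""
    _, _, vt = np.linalg.svd(sketch, full_matrices=False)
    return vt[:bits].T


def hash_codes(x, mean, projection):
    """Binary codes from the sign of the centered linear projection."""
    return ((x - mean) @ projection > 0).astype(np.uint8)


def anomaly_score(code, normal_codes, k=5):
    """Mean Hamming distance from one code to its k nearest stored codes."""
    dists = np.count_nonzero(normal_codes != code, axis=1)
    k = min(k, dists.size)
    return float(np.sort(dists)[:k].mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    stream = rng.normal(size=(500, 8))          # toy "normal" stream
    n_seen, mean_seen = 0, np.zeros(8)
    sketch = np.zeros((0, 8))                   # empty sketch to start
    for block in np.array_split(stream, 10):    # one block per detection time
        aug, n_seen, mean_seen = augment_block(block, n_seen, mean_seen)
        sketch = fd_update(sketch, aug, ell=6)
    projection = fit_projection(sketch, bits=4)
    normal_codes = hash_codes(stream, mean_seen, projection)
    query = hash_codes(rng.normal(size=8), mean_seen, projection)
    print(anomaly_score(query, normal_codes, k=5))
```

Under these assumptions, the virtual row is what lets each block be centered with only its own mean while the accumulated sketch still approximates the covariance of the globally centered stream. Only the running count, the running mean, and the small sketch need to be stored between detection times. The toy data is for shape-checking only and says nothing about detection quality.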
Keywords: streaming data, online anomaly detection, matrix sketching, hash learning, locality-sensitive hashing, concept drift