
Research On Anomaly Detection Technology Towards Data Stream

Posted on: 2019-04-21    Degree: Master    Type: Thesis
Country: China    Candidate: S Zhang    Full Text: PDF
GTID: 2428330545970246    Subject: Computer Science and Technology
Abstract/Summary:
Cyberspace security has long been a focus of research, especially as network equipment is upgraded and bandwidth increases. Detecting anomalous behavior in complex network environments has become a key issue in network security. Conventional data mining algorithms for static data sets first load the entire data set into memory and then build a static analysis model through multiple passes over the data, a process that is costly in both time and space. A dynamic data stream is characterized by a changing data distribution, a potentially infinite data volume, and the continuous arrival of data. These characteristics require a data stream mining algorithm to build an initial model in a single pass over the data within limited memory, to process incoming data in time, and to adjust the model dynamically to fit the changing stream. Most existing anomaly detection techniques are based on traditional data mining algorithms; although they achieve good results on static data sets, they cannot be applied to dynamic data stream environments. Based on this analysis, this thesis studies new anomaly detection techniques for the data stream environment and makes the following contributions:

(1) The concept and definition of a data stream are presented, the requirements and open problems of data mining in a dynamic data stream environment are analyzed, and the main tasks and common algorithms of data stream mining are summarized. A data stream is a sequence of continuously arriving data objects. These data usually arrive quickly and are high-dimensional, and their underlying distribution may change. Unlike algorithms for traditional static data sets, data stream mining algorithms usually store only summary statistics of the data, scan the data once, process incoming data quickly, and adjust the model dynamically.

(2) Based on the characteristics of data stream clustering and the needs of anomaly detection, an anomaly detection model based on data stream clustering is proposed. Data stream clustering on its own only tracks the changing distribution of the data and cannot detect anomalous data. Inspired by data stream clustering, the proposed model consists of two parts: an online clustering module and an offline detection module. The online clustering module extracts and records statistical information about data objects, which solves the storage problem caused by growing data volume, dynamically adjusts the micro-cluster structure to follow changes in the data distribution, and enables anomaly detection. The offline detection module uses a similarity or dissimilarity measure, together with the clustering information maintained by the online clustering module, to detect anomalous behavior in real time.

(3) A novel anomaly detection method based on imprecise probability is proposed, by analyzing the influence of imprecise probability on the selection of the split attribute in a decision tree and combining it with the Hoeffding Tree algorithm. The method introduces the Imprecise Dirichlet Model to calculate the maximum entropy of the credal set and to estimate the true entropy change of attributes when the data are infinite. The improved algorithm can select the best attribute to split on, stop redundant subtree growth in time, and avoid overfitting. It effectively reduces the number of nodes, preserves the classification performance of the Hoeffding Tree algorithm, and achieves higher anomaly detection accuracy. The algorithm also processes data quickly and can perform anomaly detection in high-speed data stream environments.
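To make item (2) of the abstract concrete, the following is a minimal sketch of micro-cluster summaries used for anomaly detection. It is not the thesis's implementation: the abstract does not specify the summary structure or the detection rule, so the (count, linear sum, squared sum) summary, the Euclidean distance, and the names `MicroCluster`, `MicroClusterModel`, `max_distance`, and `min_points` are all assumptions chosen for illustration.

```python
import math


class MicroCluster:
    """Summary statistics only: point count, per-dimension linear sum and squared sum."""

    def __init__(self, point):
        self.n = 1
        self.ls = [float(x) for x in point]
        self.ss = [float(x) * float(x) for x in point]

    def centroid(self):
        return [s / self.n for s in self.ls]

    def radius(self):
        # RMS deviation from the centroid, recovered from the summaries alone.
        var = sum(ss / self.n - (ls / self.n) ** 2 for ls, ss in zip(self.ls, self.ss))
        return math.sqrt(max(var, 0.0))

    def absorb(self, point):
        # One-pass update: the raw point is not stored.
        self.n += 1
        for i, x in enumerate(point):
            self.ls[i] += x
            self.ss[i] += x * x


def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class MicroClusterModel:
    """Hypothetical two-part model: update() maintains summaries, is_anomaly() scores points."""

    def __init__(self, max_distance=1.0, min_points=3):
        self.max_distance = max_distance  # assumed absorption threshold
        self.min_points = min_points      # assumed density threshold for a "normal" cluster
        self.clusters = []

    def _nearest(self, point):
        return min(self.clusters, key=lambda c: distance(point, c.centroid()), default=None)

    def update(self, point):
        # Clustering step: absorb into the nearest micro-cluster or open a new one.
        nearest = self._nearest(point)
        if nearest is not None and distance(point, nearest.centroid()) <= self.max_distance:
            nearest.absorb(point)
        else:
            self.clusters.append(MicroCluster(point))

    def is_anomaly(self, point):
        # Detection step: far from every micro-cluster, or close only to a sparse one.
        nearest = self._nearest(point)
        if nearest is None:
            return True
        too_far = distance(point, nearest.centroid()) > self.max_distance
        return too_far or nearest.n < self.min_points
```

In this sketch memory grows with the number of micro-clusters rather than with the number of points seen, which is the storage property the abstract attributes to the online module. A production version would also need to age out or merge stale micro-clusters; that is omitted here.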
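The two ingredients named in item (3) of the abstract can be sketched as follows. The abstract does not state the thesis's exact split criterion, so this uses the standard Hoeffding bound and the usual Imprecise Dirichlet Model intervals, in which a class with count n_i out of N has probability between n_i/(N+s) and (n_i+s)/(N+s). The function names, the hyperparameter `s`, and the way the pieces are combined in `should_split` are assumptions for illustration only.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """With probability 1 - delta, the mean of n samples is within this distance of the true mean."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def entropy(probs):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def idm_max_entropy(counts, s=1.0):
    """Maximum entropy over the IDM credal set for the given class counts (assumes s > 0).

    The extra mass s is poured onto the smallest counts first ("water-filling"),
    which yields the most uniform distribution allowed by the intervals.
    """
    levels = sorted(float(c) for c in counts)
    if not levels:
        return 0.0
    total = sum(levels) + s
    remaining = s
    k = 1  # the k smallest classes are currently tied at the same level
    while remaining > 0 and k < len(levels):
        gap = (levels[k] - levels[k - 1]) * k  # mass needed to lift them to the next level
        if gap >= remaining:
            break
        for i in range(k):
            levels[i] = levels[k]
        remaining -= gap
        k += 1
    for i in range(k):
        levels[i] += remaining / k
    return entropy(level / total for level in levels)


def should_split(gain_best, gain_second, value_range, delta, n):
    """Standard Hoeffding Tree test: split once the best attribute leads by more than the bound."""
    return gain_best - gain_second > hoeffding_bound(value_range, delta, n)
```

Because the maximum entropy is never lower than the entropy of the observed frequencies, gains computed from it are more conservative when few examples have been seen, which is one plausible reading of how the method stops redundant subtree growth.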
Keywords/Search Tags: Data stream, Anomaly detection, Clustering, Imprecise probability, Hoeffding tree