Research On Sequence Pattern Mining For User Behavior Anomaly Detection Over Data Streams

Posted on: 2017-08-09 | Degree: Master | Type: Thesis
Country: China | Candidate: Y Zhou | Full Text: PDF
GTID: 2428330569998736 | Subject: Software engineering
Abstract/Summary:
With the continuous expansion of network applications, security has become an urgent concern, and the problems caused by internal (insider) threats are particularly prominent. Anomaly detection based on user behavior can effectively detect and prevent internal threats. In complex user behavior scenarios, however, timely and accurate anomaly detection remains an open technical problem. Behavior anomaly detection based on sequence patterns mines the associations between events to construct behavior patterns, then detects anomalies by comparing actual behavior against the normal patterns. It is becoming a research hotspot in user behavior anomaly detection.

In current application scenarios, the diversity, complexity, and high frequency of user behavior pose a serious challenge to anomaly detection. First, behavior data arrives in real time with unknown boundaries, so a sequence-based anomaly detection algorithm must be adaptive and low-latency in order to process the data quickly and accurately. Second, user behaviors are structurally related, and the data distribution changes dynamically under the influence of users' subjective randomness, so the algorithm must adjust its behavior patterns dynamically to keep detection accurate. Existing research on sequence-pattern-based behavior anomaly detection mostly addresses non-real-time detection over historical data whose partitioning is known in advance. Existing detection algorithms also ignore the structural relationships among user behaviors and therefore cannot meet accuracy requirements. To address these problems, this thesis takes low-latency, accurate, and adaptive user behavior anomaly detection as its goal, studies user behavior anomaly detection over data streams, and designs and implements a module-based behavior anomaly detection system.

In complex application scenarios, user behavior data with unknown boundaries arrives quickly and is structurally related, so sequence-pattern-based detection must divide the behavior data adaptively to keep detection low-latency and accurate. To solve this problem, this thesis proposes a Bayes-based sequence behavior anomaly detection algorithm called BSB-ADetection. BSB-ADetection constructs sequence patterns on the basis of a Bayesian network and takes the structural relationships of user behavior into account, which yields low-latency and accurate detection. To divide user behavior adaptively over the data stream, BSB-ADetection uses time correlation and fuzzy logic to define a behavior association intensity, and uses it to restore the real user behavior scene within a rolling window over the data stream. To achieve low latency, BSB-ADetection applies dynamic pruning to reduce the complexity of the projection space and matches the top-k behavior patterns by behavior association intensity, which reduces processing delay. To ensure accuracy, BSB-ADetection uses a storage strategy based on a directed loop graph to preserve the structural relationships of user behavior and computes the similarity score with the Bayesian network. Experiments show that BSB-ADetection effectively achieves adaptive division of user behavior over data streams, reduces processing delay, and improves detection accuracy. Compared with the classical algorithm PrefixSpan, BSB-ADetection reduces processing delay by 36.8% and the false positive rate by 6.4%, and its accuracy reaches up to 98%.

The subjective randomness of users changes the data distribution over the data stream, which leads to concept drift, so a sequence-pattern-based behavior anomaly detection algorithm must be self-adaptive. To solve this problem, this thesis proposes ISPU, an incremental algorithm for updating user behavior sequence patterns over data streams. ISPU introduces a time decay factor to update sequence patterns dynamically, and it recognizes and adapts to concept drift on the basis of pattern mutation. To update sequence patterns dynamically, ISPU assigns time-decay-based weights to all patterns. Controlling the decay exponent avoids both excessive growth of the pattern set and the losses caused by direct pruning, so the patterns track dynamic changes. To adapt to concept drift, ISPU uses pattern mutation as the detection index for concept drift and adjusts the time decay factor to accelerate pattern updates. Experiments show that ISPU is highly self-adaptive. During detection, the number of patterns stays stable at 23 to 25, which indicates that ISPU adapts to dynamic change. In the initial period of concept drift, the number of patterns grows exponentially and then falls quickly, which indicates that ISPU adapts to concept drift quickly and accurately.

To validate these research results, and considering the flexibility, throughput, and processing delay required in practical application scenarios, this thesis designs and implements a module-based user behavior anomaly detection system called MB-UBAD. Following the idea of “layering-decoupling”, MB-UBAD integrates existing data stream processing systems in a modular way, which improves flexibility and throughput and reduces processing delay. MB-UBAD uses BSB-ADetection and ISPU as its algorithmic basis to perform behavior anomaly detection over data streams. To improve flexibility, MB-UBAD introduces topology mapping based on a workflow engine, which reduces manual configuration and makes the physical nodes transparent to users. To improve throughput and reduce processing delay, MB-UBAD introduces a rule-based self-association technique that allocates nodes dynamically according to the real-time status of the cluster and avoids overloaded nodes. Experiments show that MB-UBAD performs well in flexibility, throughput, and processing delay, and effectively meets the requirements of user behavior anomaly detection.
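
The time-decay weighting described for ISPU can be illustrated with a minimal sketch. The abstract does not give ISPU's actual formulas, so everything here is an assumption made for illustration: the exponential decay form `exp(-decay_rate * dt)`, the pruning threshold, and the names `DecayingPatternStore`, `observe`, and `speed_up` are all hypothetical and are not taken from the thesis.

```python
import math


class DecayingPatternStore:
    """Illustrative store of sequence patterns with time-decayed weights.

    Not the thesis's ISPU algorithm: the decay form, threshold, and all
    names are assumptions chosen to show the general idea.
    """

    def __init__(self, decay_rate=0.1, min_weight=0.05):
        self.decay_rate = decay_rate  # hypothetical decay exponent
        self.min_weight = min_weight  # hypothetical pruning threshold
        self.weights = {}             # pattern (tuple of events) -> weight
        self.last_time = 0.0

    def _decay_to(self, now):
        # Scale every weight by exp(-rate * elapsed) and drop patterns
        # whose weight has faded below the threshold.
        elapsed = max(0.0, now - self.last_time)
        factor = math.exp(-self.decay_rate * elapsed)
        self.weights = {
            p: w * factor
            for p, w in self.weights.items()
            if w * factor >= self.min_weight
        }
        self.last_time = max(now, self.last_time)

    def observe(self, pattern, now):
        """Record one occurrence; return True if the pattern was not stored."""
        self._decay_to(now)
        key = tuple(pattern)
        is_new = key not in self.weights
        self.weights[key] = self.weights.get(key, 0.0) + 1.0
        return is_new

    def speed_up(self, factor=2.0):
        # Assumed reaction to suspected concept drift: forget faster.
        self.decay_rate *= factor
```

In this sketch, old patterns fade gradually instead of being cut off at a fixed window edge, and a caller that sees many new patterns in a short period (one possible reading of "pattern mutation") could call `speed_up` so that stale patterns are pruned sooner.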
Keywords/Search Tags: Anomaly Detection, Sequence Pattern, Data Stream, Concept Drift, User Behavior