| Access super points are the hosts that interact with a huge quantity of other hosts in a certain duration.The goal of access super point detection algorithm is to find all hosts whose opposite-host numbers are greater than the threshold.Access super points are typical servers,proxies,scanners,and hosts suffering from Distributed Denial of Service(DDoS)attacks.The detection of access super points is of great significance to network security and network management,and it is also a hot issue in this field that has not been completely solved.The existing access super point detection algorithms have the following drawbacks:they are time-consuming,and cannot process high-speed network data in real-time;they cannot run under the sliding time window,and fail to detect access super point across time window boundaries;the communication in a distributed environment is inefficient and unbalanced.In this dissertation,the three problems above are studied.In this dissertation,the Graphics Processing Unit(GPU)parallel computing method is introduced into the research field of access super point detection,which solves the real-time problem of access super point detection.Based on Bernstein’s condition and other basic theories of parallel computing,a set of conditions to judge if an access super point detection algorithm can run parallel in GPU environment is given and a general framework of GPU-based access super point detection is proposed in this dissertation.Using this framework,three existing eligible access super point detection algorithms are transplanted to three kinds of GPU and are then tested on real-world traffic with lOGb/s and 40Gb/s bandwidth.The experimental results show that the effective running time on a GPU platform is only 1/50 to 1/2050 of that on a CPU platform,on the premise of guaranteeing the same estimation accuracy for the eligible access super point detection algorithm.Access super point detection under sliding time window faces three challenges:frequent detection,incremental updating of main data structure and fast maintenance of main data structure state.In this dissertation,a method combining Rough Estimator(RE)with counters under the sliding time window is proposed to solve the above problems.The idea of introducing the RE into the access super point detection algorithm is proposed for the first time.This method is based on the classical cardinality rough estimation algorithm in the field of statistics.A lightweight rough estimator for IPv4 address is proposed to locate candidate access super points from a large amount of original traffic data,and thus greatly reduces the pressure of cardinality estimating.In this dissertation,it proves that the accuracy of using RE to screen out the access super points is more than 99.9%under the conventional parameters.Distance Recorder(DR)is an incremental available counter with optimal memory in the sliding time window.Based on the design of all the relevant details of the rough estimator and distance recorder,a real-time access super point detection algorithm under sliding time window,Sliding Rough and Linear Algorithm(SRLA),is proposed in this dissertation.SRLA uses RE as a screening tool to process network streams and generates a list of candidate access super points online.Because the initial screening process has greatly reduced the number of estimated objects,SRLA algorithm can use a high-precision estimator,i.e.the Linear Estimator(LE),which can estimate the cardinality of each candidate host accurately.Experiments show that the error rate of SRLA algorithm is close to that of existing algorithms for 10Gb/s and 40Gb/s high-speed networks under discrete time windows,but the cardinality estimating time of SRLA is only 1/520 of existing algorithms’ on average.The decrease in cardinality estimating time makes it possible for SRLA to run in sliding time windows that require frequent detection.By combining RE and LE with DR,SRLA algorithm realizes incremental updating of the main data structure in the sliding time window.SRLA algorithm can support real-time access super point detection with the sliding step of 1 second and the time window size of 300 seconds under the 10Gb/s and 40Gb/s traffic.To solve the problem of long state maintenance time,a fast state maintenance counter,Asynchronous Timestamp(AT),is proposed in this dissertation.AT takes up one bit more memory than DR.For sliding time windows with k time slices,the time complexity of maintaining AT state is only O(1/k),while the time complexity of maintaining DR is O(1).Based on AT,a new algorithm for estimating the cardinality of multi-hosts with low state maintenance time under the sliding time window,Virtual Asynchronous Timestamp Estimator(VATE),is proposed in this dissertation.Experiments show that the higher the number of AT,the higher the accuracy of VATE.When the number of AT reaches 232,VATE maintains all AT states in less than 2.6 milliseconds per time slice,less than 1/400 of the time required to maintain the same number of DR.Replacing DR in SRLA algorithm with AT can further reduce the running time of the SRLA algorithm.In a distributed environment,the aggregation of data takes up a large amount of bandwidth or causes traffic peak.To reduce the communication overhead in a distributed environment,a three-stage delegated distributed access super point detection algorithm,Rough Estimator based Asynchronous Distributed algorithm(READ),is proposed.READ algorithm uses RE to generate candidate access super points in a distributed environment.The READ algorithm reduces the communication overhead by transmitting only the data related to candidate access super points to the global server,instead of transmitting all master data structures.This dissertation proves that the error rate of the READ algorithm in the distributed environment is no higher than that in the aggregated detection data environment.Experiments show that READ algorithm can reduce the communication overhead by more than 95%for 10Gb/s and 40Gb/s high-speed network data.This dissertation makes a thorough and meticulous study on the high-speed network access super point detection algorithm from various aspects.Using the research results of this dissertation,real-time access super point detection in a high-speed network under sliding time windows and efficient access super point detection in a distributed environment can be realized. |