| In today's information society, science and technology are developing quickly and information flows rapidly. NetFlow, a data exchange method proposed by Cisco, plays an important role in network monitoring and has been studied extensively both at home and abroad. As networks have developed, however, traditional flow-based anomaly detection techniques have shown the following problems: a heavy demand for training samples, complex feature extraction, and a low detection rate for unknown anomalies. Moreover, current research has not found a good strategy for handling massive NetFlow streaming data. To address these problems, we combine hidden Markov models (HMMs) with big data techniques and propose an HMM-based method for detecting abnormal NetFlow traffic. The work comprises three modules: data acquisition and processing, modeling, and anomaly detection. First, in the data acquisition and processing module, we use a distributed log collection system (Flume) to collect NetFlow flows into a distributed file system. Second, in the modeling module, we classify the NetFlow data by protocol (ICMP, TCP, and UDP), relying on the high degree of standardization of these protocols. We then sample the stream data of each protocol and preprocess it with dimensionality reduction, quantization, and other steps, and finally build an HMM for the normal flow data of each protocol. Third, in the anomaly detection module, we use the established models to calculate observation probabilities in a distributed manner and compare each probability with a threshold to detect abnormal NetFlow traffic. Using the hidden Markov model as the basic algorithm for anomaly detection not only reduces the demand for samples but also reduces the complexity of feature extraction and anomaly detection, and the method can still respond when it faces an unknown anomaly. 
Last but not least, this paper uses distributed storage and a distributed computing framework to provide a way of detecting abnormal flows in massive data. |
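
The detection step summarized in the abstract (compute the observation probability of a quantized flow sequence under an HMM of normal traffic, then compare it with a threshold) can be sketched as follows. This is only a minimal single-machine illustration, not the paper's distributed implementation. The two-state model parameters, the three-symbol alphabet, the threshold values, the per-symbol normalization, and the names `log_likelihood`, `is_anomalous`, and `MODEL` are all assumptions made for the example.

```python
import math


def log_likelihood(obs, start, trans, emit):
    """Return log P(obs | model) using the scaled forward algorithm.

    obs   : non-empty list of quantized observation symbols (ints)
    start : start[i]    = P(first state is i)
    trans : trans[i][j] = P(next state is j | current state is i)
    emit  : emit[i][k]  = P(symbol k | state i)
    """
    n = len(start)
    # Initialisation: joint probability of each state and the first symbol.
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    log_prob = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Induction: propagate through transitions, then emit symbol t.
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                for j in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            # The model assigns zero probability to this sequence.
            return float("-inf")
        # Rescale to avoid underflow on long sequences; the log of the
        # scaling factors accumulates to the total log-probability.
        log_prob += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_prob


def is_anomalous(obs, model, threshold):
    """Flag a sequence whose average per-symbol log-likelihood is below threshold.

    Normalizing by length is an assumption of this sketch, so that
    sequences of different lengths share one threshold.
    """
    start, trans, emit = model
    return log_likelihood(obs, start, trans, emit) / len(obs) < threshold


# Hypothetical model of "normal" traffic for one protocol:
# 2 hidden states, 3 quantized observation symbols (0, 1, 2).
MODEL = (
    [0.6, 0.4],                            # start probabilities
    [[0.7, 0.3], [0.4, 0.6]],              # transition matrix
    [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]],    # emission matrix
)
```

In a setup like the one the abstract describes, one such model would be trained per protocol, and `is_anomalous` would be applied independently to each flow sequence, which is what makes the scoring step easy to distribute. The threshold would have to be chosen from the scores of known normal traffic.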