Various Network Security Appliances (NSA) are installed at the gateway to protect an enterprise's intranet. These security devices, such as firewalls and Intrusion Detection Systems (IDS), generate large volumes of logs that record network events. By analyzing these multi-source, heterogeneous logs, security experts can extract the security incidents that reflect the network security situation. On that basis, network managers can understand the network situation accurately and make sound decisions. Security device logs have three characteristics: massive volume, multi-source heterogeneity, and spatial-temporal correlation. This thesis focuses on a fusion algorithm for security device logs. The proposed algorithm runs on Hadoop, a big data processing platform, and comprises three stages: log preprocessing, log clustering, and log fusion. It was developed by analyzing and improving existing algorithms.

Firstly, this thesis analyzes the concepts related to the logs produced by network security devices and briefly introduces some existing log analysis tools. To uncover the hidden information about the network situation, it analyzes the situation awareness model in terms of information acquisition, situation feature extraction, and situation assessment. To process massive multi-source logs efficiently, it divides the original security device logs into three standard log types: management logs, abnormal traffic logs, and network attack logs.

Secondly, this thesis proposes a clustering algorithm based on the dissimilarity of the logs generated by network security devices. The algorithm relies on the log features that correspond to the network attack model. In this algorithm, the conditions for log fusion depend on a dynamic time threshold. Furthermore, the algorithm assigns different weights to log attributes such as IP addresses and port numbers, according to their importance for different network attacks. Experiments demonstrate that this clustering algorithm has a fairly high detection ratio. The clustered logs are termed Hyper_logs. The thesis then proposes a method that combines rules with weighted Dempster-Shafer (DS) evidence theory to fuse the Hyper_logs. The method processes the Hyper_logs and assigns weights according to the differing detection ratios for network attacks, so that the fused result better reflects practical network attack scenarios.

Finally, once log fusion is complete, this thesis designs and implements a network situation awareness system based on the Tim Bass model. The system consists of three modules: log collection, data processing, and situation assessment. It also provides interfaces through which network security managers can select and query data. The system helps managers obtain information about the network situation and provides a reference for further network decisions.
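
The weighted DS fusion step can be illustrated with a minimal sketch. It assumes that the weighting is done by Shafer discounting followed by Dempster's rule of combination, which is one common way to weight evidence and is not necessarily the scheme used in this thesis. The attack labels, the mass values, and the two weights below are hypothetical and chosen only for illustration.

```python
from itertools import product


def discount(mass, weight, frame):
    """Shafer discounting: scale each mass by `weight` and move the
    remaining belief to the whole frame (total ignorance)."""
    out = {focal: weight * value for focal, value in mass.items() if focal != frame}
    out[frame] = 1.0 - weight + weight * mass.get(frame, 0.0)
    return out


def combine(m1, m2):
    """Dempster's rule: multiply masses of intersecting focal sets and
    renormalize by the non-conflicting mass."""
    combined = {}
    conflict = 0.0
    for (a, va), (b, vb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + va * vb
        else:
            conflict += va * vb
    if 1.0 - conflict < 1e-12:
        raise ValueError("sources are in total conflict; cannot combine")
    return {focal: value / (1.0 - conflict) for focal, value in combined.items()}


# Hypothetical frame of discernment and evidence from two devices.
FRAME = frozenset({"dos", "scan", "normal"})
DOS, SCAN, NORMAL = frozenset({"dos"}), frozenset({"scan"}), frozenset({"normal"})

firewall = {DOS: 0.6, SCAN: 0.1, FRAME: 0.3}
ids = {DOS: 0.7, NORMAL: 0.1, FRAME: 0.2}

# Hypothetical weights, e.g. derived from each device's detection ratio.
fused = combine(discount(firewall, 0.8, FRAME), discount(ids, 0.9, FRAME))
best = max(fused, key=fused.get)
```

Here a lower weight shifts more of a device's belief to the full frame, so a less reliable device has less influence on the fused result.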