| Anonymous communication is a technology that hides the communication relationship between two parties, so that an attacker cannot directly obtain or infer the relationship between the two parties or the identity of either one. It provides identity privacy for normal users, but malicious users can also exploit it to hide their traces and avoid tracking. Studying anonymous communication detection is therefore of great significance, both for combating crimes that abuse anonymity networks and for improving anonymous communication itself. Tor is the most widely used anonymous communication tool. To resist traffic analysis attacks, Tor uses a variety of traffic obfuscation plug-ins. Obfs is one of Tor's obfuscation plug-ins; it relies on encryption and padding to conceal traffic characteristics. Obfs4 uses an improved elliptic curve encryption algorithm to resist static feature recognition and a random padding mechanism to resist message length analysis, further improving the protocol's anonymity. Detecting Obfs4 traffic in real networks faces multiple challenges: 1) Full randomness: Obfs4 follows a completely randomized design, using randomized elliptic curve encryption and random padding, which makes it strongly resistant to static feature detection and packet length feature detection. 2) Massive traffic: the detection system must handle massive volumes of data while maintaining high accuracy and real-time performance. 3) Large amounts of similar traffic: in real networks, many obfuscation protocols resemble Obfs4, and some normal traffic also exhibits random characteristics. 4) Conflict between high precision and real-time performance: as the accuracy of a detection algorithm improves, its time efficiency decreases, so it is extremely difficult to satisfy both requirements at once. To address these challenges, this paper proposes an Obfs4 traffic detection method based on multi-level filtering, which combines dynamic and static 
features to achieve both high precision and real-time performance. The main work and contributions are as follows: (1) To resolve the conflict between high precision and real-time performance, we propose a multi-stage filtering strategy that pairs a coarse-grained rapid filtering method with a fine-grained accurate method. It satisfies resource consumption and time efficiency requirements while ensuring high accuracy. (2) To deal with Obfs4's randomized design, which resists static feature detection, and with the non-random features present in a large amount of normal traffic, we propose a randomness detection method for Obfs4. The method tests the randomness of the payload after bitwise reorganization and adjusts the threshold of the deviation function according to the result. In addition, to control resource consumption in real networks, this paper compares the effect of different payload lengths on the results and chooses the optimal length, thereby improving time efficiency and reducing resource usage. (3) To resolve the high false negative rate caused by a large amount of interfering data, we analyze the timing characteristics of the Obfs4 handshake. We reassemble the Obfs4 handshake packets and classify different protocols according to the timing characteristics of those packets. In addition, to handle the huge volume of data to be inspected in real networks, this paper analyzes the behavior of Obfs4 users and adopts a hierarchical packet length filtering method, which eliminates nearly 90% of the interfering data. This greatly controls the false negative rate and improves detection efficiency. (4) After correlation analysis and validity analysis of the features of a large number of samples, we choose 16 flow features of four types: direction, length, variance, and entropy of packet length. We use 4397 positive examples and 5128 negative examples to optimize the penalty coefficient, fragment size 
and model, and finally determine the optimal features and parameters of the model. Our experiments show that the accuracy of Obfs4 traffic detection exceeds 99% at a cost of fewer than 8000 CPU cycles per second. This result meets the time complexity and accuracy requirements of real networks. |
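
The coarse-then-fine idea described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the coarse stage swaps in a plain bit-frequency (monobit) deviation check rather than the paper's test on bitwise-reorganized payloads, and the fine stage computes only one of the named feature types, the entropy of packet lengths. The function names, the 64-byte sample length, and the 0.1 deviation threshold are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def bit_deviation(payload: bytes, max_len: int = 64) -> float:
    """Deviation of the fraction of 1-bits from 0.5 over the first max_len bytes.

    max_len is a hypothetical sample length; the paper tunes the payload
    length experimentally and does not state the value here.
    """
    sample = payload[:max_len]
    if not sample:
        return 0.5  # an empty payload is treated as maximally non-random
    ones = sum(bin(byte).count("1") for byte in sample)
    return abs(ones / (8 * len(sample)) - 0.5)


def looks_random(payload: bytes, max_len: int = 64, threshold: float = 0.1) -> bool:
    """Coarse stage: keep a flow only if its payload bits look balanced.

    threshold is an assumed value, not one taken from the paper.
    """
    return bit_deviation(payload, max_len) <= threshold


def length_entropy(lengths: list[int]) -> float:
    """Fine-stage feature: Shannon entropy (in bits) of the packet lengths."""
    total = len(lengths)
    counts = Counter(lengths)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def candidate_features(flows: dict[str, tuple[bytes, list[int]]]) -> dict[str, float]:
    """Apply the coarse filter first, then compute the feature for survivors only."""
    return {
        flow_id: length_entropy(lengths)
        for flow_id, (payload, lengths) in flows.items()
        if looks_random(payload)
    }
```

Running the cheap check first means the per-flow feature computation is only paid for flows that survive it, which is the trade-off between accuracy and time efficiency that the multi-stage strategy targets. A single bit-frequency check is a weak randomness test on its own (a repeated `0x0F` byte passes it), so a real coarse stage would need a stronger test.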