
Obfs4 Traffic Identification Based On Multi-source Fusion

Posted on: 2021-02-03 | Degree: Master | Type: Thesis
Country: China | Candidate: D Liang
GTID: 2428330614470853 | Subject: Cyberspace security
Abstract/Summary:
The traffic obfuscation protocol Obfs4 is the main mechanism that the anonymous communication tool Tor uses to resist traffic detection and thereby improve network availability. In the real world, Obfs4 traffic identification faces the following problems:

1) Strong anti-detection capability. The Obfs4 protocol encrypts data packets with an improved elliptic curve encryption algorithm, so neither the packet header nor the payload carries any usable plaintext information; this lets it resist all traffic identification algorithms based on packet plaintext. Obfs4 also applies random padding to the payload, so the packet length distribution is no longer regular, which defeats most identification algorithms that use packet length as a dynamic feature.
2) Interference from large volumes of similar traffic. Real-world networks carry massive traffic, including a large amount of obfuscated traffic that uses protocols similar to Obfs4, so it is particularly difficult to find features that clearly distinguish this similar traffic from Obfs4 traffic.
3) High precision requirements. In the real world, non-Obfs4 traffic far outnumbers Obfs4 traffic, so an algorithm whose precision is not high enough will produce a large number of false detections, undermining the validity of the detection results.

It is therefore very difficult to design an algorithm that identifies Obfs4 traffic effectively. To meet these challenges, this thesis proposes an Obfs4 traffic identification algorithm based on the idea of multi-source fusion. The main work and contributions are:

1) A method for selecting and fusing Obfs4 multi-source features. Multiple strategies are used to obtain Obfs4's node characteristics, active-connection response characteristics, handshake packet length characteristics, ordered response characteristics, randomness characteristics and communication packet entropy characteristics, and representation learning is applied via word embedding. Two multi-source feature fusion methods are proposed: serial fusion followed by weight calculation, and fusion based on kernel canonical correlation analysis.
2) A traffic identification algorithm for Obfs4 consisting of four parts: preprocessing, feature extraction, multi-level filtering, and machine learning classification. Data to be detected passes through the preprocessing and feature extraction modules, enters the multi-level filtering module for coarse filtering, and then enters the machine learning classification module for fine classification.
3) A support vector machine algorithm based on a weighted Gaussian kernel function, together with an entropy-based weight calculation method, both optimized for Obfs4 multi-source features.

In experiments, the algorithm identifies Obfs4 traffic with an accuracy of 96.68%, which verifies its effectiveness. The results also show that the anonymity of Obfs4 is flawed: an attacker can effectively attack Obfs4 traffic using the fingerprints it leaks.
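The feature list above includes randomness and packet entropy characteristics, but the abstract does not say how entropy is computed. The following is a minimal Python sketch assuming the standard Shannon entropy of payload bytes. The length-normalised variant (`normalized_entropy`, a hypothetical helper) is included because a payload shorter than 256 bytes cannot reach the 8 bits/byte maximum even when its bytes are perfectly random.

```python
import math
from collections import Counter


def shannon_entropy(payload: bytes) -> float:
    """Shannon entropy of a byte string in bits per byte (0.0 to 8.0)."""
    n = len(payload)
    if n == 0:
        return 0.0
    counts = Counter(payload)  # iterating bytes yields ints 0..255
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def normalized_entropy(payload: bytes) -> float:
    """Entropy divided by the maximum reachable for this length (assumed normalisation)."""
    n = len(payload)
    if n < 2:
        return 0.0
    return shannon_entropy(payload) / math.log2(min(n, 256))


def entropy_features(payloads: list[bytes]) -> list[float]:
    """Per-packet entropy values for the packets of one flow."""
    return [shannon_entropy(p) for p in payloads]
```

Encrypted, randomly padded payloads of the kind the abstract describes should score close to 1.0 on the normalised scale, while plaintext protocols usually score noticeably lower; any concrete cut-off would have to be fitted on labelled traffic.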
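The abstract names serial fusion followed by weight calculation, an entropy-based weight calculation method, and an SVM with a weighted Gaussian kernel, but gives no formulas. The sketch below is written under stated assumptions: it uses the classic entropy weight method and a kernel of the form K(x, y) = exp(-γ Σ_j w_j (x_j - y_j)²). The function names and the `gamma` parameter are illustrative, not the author's definitions.

```python
import math


def serial_fusion(*sources: list[list[float]]) -> list[list[float]]:
    """Serial fusion: concatenate each sample's feature vectors from all sources."""
    if len({len(s) for s in sources}) != 1:
        raise ValueError("every source must describe the same samples")
    return [[v for part in parts for v in part] for parts in zip(*sources)]


def entropy_weights(X: list[list[float]]) -> list[float]:
    """Classic entropy weight method (assumed): features whose values are
    distributed unevenly across samples get larger weights; constant ones get 0."""
    n = len(X)
    if n < 2:
        raise ValueError("need at least two samples")
    m = len(X[0])
    k = 1.0 / math.log(n)
    divergence = []
    for j in range(m):
        col = [row[j] for row in X]
        lo, hi = min(col), max(col)
        if hi == lo:
            divergence.append(0.0)  # no variation, so no information
            continue
        # Min-max normalise; the tiny offset keeps every share strictly positive.
        norm = [(v - lo) / (hi - lo) + 1e-12 for v in col]
        total = sum(norm)
        shares = [v / total for v in norm]
        e = -k * sum(p * math.log(p) for p in shares)
        divergence.append(1.0 - e)
    s = sum(divergence)
    return [d / s for d in divergence] if s > 0 else [1.0 / m] * m


def weighted_gaussian_kernel(x, y, w, gamma=1.0):
    """K(x, y) = exp(-gamma * sum_j w_j * (x_j - y_j)**2)."""
    d2 = sum(wj * (xj - yj) ** 2 for wj, xj, yj in zip(w, x, y))
    return math.exp(-gamma * d2)


def gram_matrix(A, B, w, gamma=1.0):
    """Kernel matrix between sample lists A and B."""
    return [[weighted_gaussian_kernel(a, b, w, gamma) for b in B] for a in A]
```

Because every weight is non-negative, this kernel equals an ordinary Gaussian kernel applied to features scaled by sqrt(w_j). The Gram matrix therefore stays positive semi-definite and could be passed to an SVM implementation that accepts precomputed kernels, such as scikit-learn's `SVC(kernel="precomputed")`.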
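The coarse-then-fine structure of the proposed identification algorithm (multi-level filtering followed by machine learning classification) can be sketched as follows. This is an illustration under assumptions. The `Flow` record, the three filter rules and their thresholds are hypothetical placeholders, not values from the thesis or the obfs4 specification. `classifier` stands in for the trained SVM.

```python
from dataclasses import dataclass, field
from typing import Callable, Sequence


@dataclass
class Flow:
    """Hypothetical per-flow record produced by preprocessing and feature extraction."""
    first_payload_len: int        # length of the first client packet payload
    mean_norm_entropy: float      # average normalised payload entropy (0..1)
    has_plaintext_marker: bool    # e.g. a recognisable protocol prefix was seen
    features: list[float] = field(default_factory=list)  # fused feature vector


# Hypothetical thresholds, for illustration only.
MIN_NORM_ENTROPY = 0.9
HANDSHAKE_LEN_RANGE = (64, 8192)


def no_plaintext(flow: Flow) -> bool:
    return not flow.has_plaintext_marker


def handshake_len_ok(flow: Flow) -> bool:
    lo, hi = HANDSHAKE_LEN_RANGE
    return lo <= flow.first_payload_len <= hi


def high_entropy(flow: Flow) -> bool:
    return flow.mean_norm_entropy >= MIN_NORM_ENTROPY


# Cheapest checks first; all() stops at the first filter that rejects a flow.
DEFAULT_FILTERS = (no_plaintext, handshake_len_ok, high_entropy)


def identify(flows: Sequence[Flow],
             classifier: Callable[[list[float]], bool],
             filters: Sequence[Callable[[Flow], bool]] = DEFAULT_FILTERS) -> list[bool]:
    """Coarse rule-based filtering, then fine classification of the survivors."""
    labels = []
    for flow in flows:
        if all(f(flow) for f in filters):
            labels.append(bool(classifier(flow.features)))
        else:
            labels.append(False)  # rejected by the coarse stage
    return labels
```

Putting inexpensive rules first lets the coarse stage discard most non-Obfs4 flows cheaply. This matters given the class imbalance the abstract describes, since only the surviving flows pay the cost of kernel evaluation.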
Keywords: Obfs4, multi-source fusion, representation learning, traffic identification