Research On Unknown Protocol Recognition Technology Based On Data Mining

Posted on: 2020-05-04
Degree: Master
Type: Thesis
Country: China
Candidate: T Wang
GTID: 2428330596975508
Subject: Cryptography
Abstract/Summary:
In today's communication and network environment, network target recognition is urgently needed in practice, both for maintaining the security of communication networks and for signal reconnaissance in electronic countermeasures. For the non-public protocols used in military and some private networks in particular, analyzing and identifying unknown protocols is of considerable research significance. Data intercepted at the data link layer usually takes the form of a bit stream, and it is difficult to extract the characteristics of an unknown protocol from bit-stream data that carries no semantics. This thesis studies the identification of unknown protocol frame structure at the data link layer. At present, research on bit-stream-oriented unknown protocol identification is progressing slowly, both in China and abroad. Existing bit-stream-oriented techniques usually rely on pattern matching and data mining. Following the data mining approach, this thesis attempts to mine bit sequences of special significance from a large volume of intercepted unknown-protocol bit-stream data, in order to infer the characteristics and frame structure of the unknown protocol.

Firstly, this thesis investigates the traditional bit-stream-based unknown protocol identification scheme, which comprises AC (Aho-Corasick) fast counting, frequent sequence filtering, long sequence splicing, and sequence association rule analysis. Simulation experiments show that, in the traditional scheme, the storage structure of the AC fast counting algorithm is complex and the complexity of the long sequence splicing algorithm is too high.

Secondly, to address these defects in the traditional scheme, an improved algorithm for mining unknown feature sequences is proposed. The storage structure of AC fast counting is optimized, and its relationship to the N-gram algorithm is analyzed. To address the high complexity of traditional long sequence splicing, a low-complexity long sequence splicing algorithm based on position difference and a long sequence splicing algorithm based on continuous position are proposed. The experimental results show that both new splicing algorithms effectively reduce time complexity. In addition, because the frequent long sequence sets obtained after splicing contain many sequences related by inclusion, a sub-sequence merging algorithm is proposed. The experimental results show that the sub-sequence merging algorithm achieves a reduction rate of 81% for candidate feature sequences.

Finally, using the mined set of candidate feature sequences, this thesis attempts to analyze the frame characteristics of the unknown protocol. A frame length calculation algorithm for fixed-length frame protocols, a frame segmentation algorithm for non-fixed-length frame protocols, and frame structure estimation schemes for both kinds of protocol are proposed. The experimental results show that the proposed scheme can effectively determine the characteristics and frame structure of the unknown protocol.
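The abstract does not give algorithmic details, so the following is only a generic illustration of two ideas it names: counting frequent fixed-length bit sequences (N-gram style) and guessing a fixed frame length from the gaps between occurrences of a frequent sequence. It is not the thesis's algorithm. All function names, the 8-bit flag, and the synthetic 32-bit frames are assumptions made for the example.

```python
from collections import Counter


def count_ngrams(bits: str, n: int) -> Counter:
    """Count every length-n window of a bit string (overlapping windows)."""
    return Counter(bits[i:i + n] for i in range(len(bits) - n + 1))


def frequent_sequences(bits: str, n: int, min_count: int) -> dict:
    """Keep only the n-grams that occur at least min_count times."""
    return {seq: c for seq, c in count_ngrams(bits, n).items() if c >= min_count}


def positions(bits: str, seq: str) -> list:
    """Return every start offset of seq in bits, overlaps included."""
    found, start = [], bits.find(seq)
    while start != -1:
        found.append(start)
        start = bits.find(seq, start + 1)
    return found


def estimate_frame_length(bits: str, seq: str) -> int:
    """Guess a fixed frame length as the most common gap between occurrences.

    Returns 0 when seq occurs fewer than two times.
    """
    pos = positions(bits, seq)
    gaps = Counter(b - a for a, b in zip(pos, pos[1:]))
    return gaps.most_common(1)[0][0] if gaps else 0


# Synthetic demo (assumed data): four 32-bit frames, each an 8-bit flag
# followed by a 24-bit payload that never contains six consecutive ones.
FLAG = "01111110"
payloads = ["0" * 24, "10" * 12, "0011" * 6, "00001111" * 3]
stream = "".join(FLAG + p for p in payloads)

candidates = frequent_sequences(stream, n=8, min_count=4)
frame_length = estimate_frame_length(stream, FLAG)
```

On this synthetic stream the flag appears once per frame, so it survives the frequency filter and the gap between its occurrences equals the frame length. Real intercepted data would also need handling for noise, bit stuffing, and flag-like patterns inside payloads, none of which this sketch attempts.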
Keywords/Search Tags: unknown protocol, data mining, feature sequence, frame characteristics analysis