Research And Implementation On The Key Technologies For Binary Private Protocol Reverse

Posted on: 2019-06-29
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Yan
GTID: 2428330566470898
Subject: Information and Communication Engineering
Abstract/Summary:
A private protocol is a protocol whose technical specification is not disclosed, for reasons such as commercial protection. Protocol reverse engineering is the process of recovering a protocol specification, without access to that specification, by monitoring and analyzing the message sequences or instruction sequences of protocol entities. Private protocol reverse engineering has received increasing attention and plays an important role in wireless network confrontation, malware analysis, vulnerability mining, and network management. Massive machine-type communication, represented by industrial control and autonomous driving, has become one of the three major service types of the future. To meet the special needs of machine-to-machine communication, bit-oriented custom protocols are widely used in such services. Therefore, with the rapid development of the Internet of Things, binary private protocol reverse engineering has become an urgent problem in network security.

Binary protocol traffic often appears as discrete message sequences, so flow-level attribute features cannot be extracted from the protocol interaction process. As a result, binary protocol identification can only be based on individual messages, which is more difficult than flow-based protocol identification. To improve transmission efficiency, binary protocols usually use a custom character set and define fields at bit granularity. Compared with character-oriented protocols, field boundaries are particularly difficult to determine without prior information such as delimiters. In addition, the format type of a binary protocol message is not equivalent to its state type: the state type depends on the state-related field, so state machine inference for binary protocols must be based on that field. Binary private protocol reverse engineering therefore faces many unique problems and has become one of the challenges in protocol reverse engineering research. This thesis focuses on three parts: protocol identification, format specification extraction, and behavior specification extraction. The main work and innovations are as follows:

1. Binary protocol messages have very high dimensionality, and the number of clusters and the cluster centers are difficult to determine in traditional clustering algorithms. Therefore, a clustering algorithm for binary protocol messages based on improved principal component analysis and improved density peaks clustering is proposed. Principal component analysis is improved by determining the target dimensionality from information gain, so that redundant information is removed while the characteristics of the original data are retained. Density peaks clustering is improved with distance index weighting, so that cluster centers are selected automatically and the distinction between cluster centers and other messages is enhanced. In tests on three data sets consisting of AIS, ARP, DNS, ICMP, and SMB messages, the algorithm clusters binary protocol messages effectively: purity and F values are all above 80%. Compared with classic clustering algorithms such as K-means and DBSCAN, the F value increases by about 10 percentage points on average.

2. Binary protocol fields are flexibly defined, and their offsets are difficult to determine accurately. Therefore, a novel algorithm based on optimal path search is proposed to determine the boundaries of binary private protocol format keywords. An iterative n-gram-position algorithm is proposed to extract the candidate boundaries of format keywords; it addresses both the difficulty of choosing n in the n-gram algorithm and the extraction of candidate boundaries for format keywords with fixed offsets. The optimal path search algorithm then selects the optimal boundaries from the candidates. Its branch metric is based on the hit ratio of frequent item boundaries and on the left and right branch entropy; its constraint is based on the difference in value change rate between keywords and non-keywords. In tests on AIS1, AIS18, ICMP00, ICMP03, and NetBIOS, the F values of the algorithm are all above 83%. Compared with VDV (Variance of the Distribution of Variances) and AutoReEngine, the F value increases by about 8 percentage points on average.

3. Because the format type of a binary protocol message is not equivalent to its state type, it is difficult to distinguish binary protocol messages of different state types through clustering. Therefore, a state machine inference algorithm for binary private protocols based on the state-related field is proposed. A state-related field identification algorithm based on the longest common subsequence distance is proposed, which effectively characterizes the logical similarity of protocol sessions. An initial state machine construction algorithm based on an adjacency list is proposed, which avoids the large scale and excessive operations of the traditional construction based on the APTA (Augmented Prefix Tree Acceptor). An abnormal session removal method based on probability statistics and a similar-state merging method based on in-degree and out-degree are proposed, which effectively reduce the size of the protocol state machine. In tests on TCP and SMB, the algorithm effectively infers the state machine, with precision and recall both above 90%.

4. A private protocol data intelligent analysis system is designed and implemented. The application requirements of the system are analyzed and its software architecture is designed. The system supports import and export of database data, performance index statistics, graphical display of results, and menu-based user operation. Initial implementations of the core algorithms are provided, including protocol keyword feature extraction, protocol message clustering, protocol field format extraction, and protocol state machine inference. Functionality and performance tests are performed on the different modules of the system.
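
To make the clustering step in item 1 more concrete, the following is a minimal sketch of baseline density peaks clustering applied to fixed-length binary messages. It is not the improved algorithm of the thesis: it works on raw byte values rather than on features reduced by principal component analysis, it omits the distance index weighting, and it takes the number of centers as a parameter instead of selecting centers automatically. The names `density_peaks`, `d_c`, and `n_centers`, and the sample messages, are assumptions made for illustration.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def density_peaks(points, d_c, n_centers):
    """Baseline density peaks clustering (illustrative sketch).

    points    : list of equal-length numeric vectors
    d_c       : cutoff distance used to count neighbors
    n_centers : number of cluster centers to pick (assumed known here)
    Returns (labels, centers), where centers are indices into points.
    """
    n = len(points)
    dist = [[euclidean(p, q) for q in points] for p in points]

    # Local density rho: number of other points closer than d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]

    # Visit points from highest to lowest density (ties broken by index).
    order = sorted(range(n), key=lambda i: (-rho[i], i))

    # delta: distance to the nearest point that precedes i in the ordering.
    delta = [0.0] * n
    nearest_higher = [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            # The densest point gets the largest distance by convention.
            delta[i] = max(dist[i])
            continue
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i] = dist[i][j]
        nearest_higher[i] = j

    # Centers are the points with the largest gamma = rho * delta.
    gamma = [rho[i] * delta[i] for i in range(n)]
    centers = sorted(range(n), key=lambda i: (-gamma[i], i))[:n_centers]

    labels = [-1] * n
    for cluster_id, i in enumerate(centers):
        labels[i] = cluster_id
    # Every other point inherits the label of its nearest denser neighbor,
    # which has already been labeled because of the visiting order.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_higher[i]]
    return labels, centers


# Hypothetical 4-byte messages from two made-up message formats.
messages = [
    b"\x01\x00\x10\x20", b"\x01\x00\x11\x20", b"\x01\x00\x10\x21",
    b"\x08\xff\x90\xa0", b"\x08\xff\x91\xa0", b"\x08\xff\x90\xa1",
]
labels, centers = density_peaks([list(m) for m in messages],
                                d_c=2.0, n_centers=2)
print(labels, centers)
```

With these sample messages the first three and the last three messages end up in separate clusters. Real traffic would need a length-normalizing representation and a data-driven choice of `d_c`, which this sketch does not address.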
Keywords/Search Tags: Protocol Reverse, Binary Private Protocol, Protocol Message Clustering, Format Keyword Boundary Determination, Protocol State Machine Inference