| With the rapid development of computer networks, networking has become part of everyday life, and a wide range of network applications have become indispensable tools for many people. Network protocols play a central role in computer networks because they define how the two sides of a communication interact. However, the growing number of private network protocols also poses challenges. Most malicious programs communicate over unknown private protocols, and because the specifications of these protocols are hard to obtain, analyzing such malware and developing security protection mechanisms is difficult. An in-depth understanding of network protocols is therefore an important means of maintaining network security. Network protocol reverse engineering has become an active topic in network security research and has significant research value. Network protocol model inference comprises two tasks: protocol message field analysis and protocol state machine inference. Message field analysis examines the structure of a message and extracts the structural features of its fields. State machine inference recovers the temporal ordering relations among protocol messages and thereby captures the behavior of the protocol. The main work of this paper is to obtain message formats and semantic information through dynamic binary analysis and to use that semantic information in protocol state machine inference, which addresses the inaccurate results that existing methods produce because they lack message semantics. The work consists of the following two parts. First, to address the need of existing methods for large numbers of samples and their lack of semantic information, we propose a message format extraction method based on program instruction traces. The main idea is to use dynamic binary analysis to obtain the instruction execution path and from it derive how the message is partitioned into fields. The method uses DECAF to monitor the application that implements the network protocol, obtains the application's execution path through taint propagation analysis, and then divides the message into protocol fields according to characteristics of the executed instructions combined with a field resolution strategy. The hook mechanism of the dynamic binary analysis platform is also used to monitor the program's API calls, which yields the semantic information of the message; this information is an important means of improving the accuracy of state machine inference. Second, because existing protocol state machine inference methods lack semantic information and are complex to implement, we present a protocol state machine inference method that combines grammatical inference with message semantic information. The main idea is to derive the temporal logic of the protocol from sequences of network messages and to use grammatical inference to express that logic as a deterministic finite automaton (DFA). State machine inference builds on the message format and semantics. We first use a sequence alignment algorithm from bioinformatics to compute the distance between messages and then use the UPGMA algorithm to cluster them, so that each session of concrete messages is abstracted into a session of message types. We then apply a heuristic state labeling algorithm that uses the message semantic information to label states more precisely. Finally, the EDSM algorithm is used to merge states and obtain the minimal protocol state machine. A prototype protocol model inference system based on the DECAF platform is designed and implemented, and it is tested on the TCP, SMB, and Agobot protocols; the experimental results show that the system performs well. |
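
The message-clustering step described above (alignment-based distance followed by UPGMA) can be illustrated with a minimal sketch. It is not the paper's implementation: the function names `alignment_distance` and `upgma`, the unit gap and mismatch costs, the normalization by the longer message, and the `threshold` stopping rule are all assumptions made for illustration.

```python
from itertools import combinations


def alignment_distance(a: bytes, b: bytes) -> float:
    """Global alignment cost with unit gap/mismatch costs (an assumption),
    normalized by the longer message so the result lies in [0, 1]."""
    prev = list(range(len(b) + 1))  # cost of aligning a[:0] with b[:j]
    for i in range(1, len(a) + 1):
        cur = [i]  # cost of aligning a[:i] with the empty prefix of b
        for j in range(1, len(b) + 1):
            substitute = prev[j - 1] + (0 if a[i - 1] == b[j - 1] else 1)
            cur.append(min(substitute, prev[j] + 1, cur[j - 1] + 1))
        prev = cur
    longest = max(len(a), len(b))
    return prev[-1] / longest if longest else 0.0


def upgma(messages: list[bytes], threshold: float) -> list[list[int]]:
    """Average-linkage (UPGMA) clustering over message indices.

    Merging stops once the closest pair of clusters is farther apart
    than `threshold` (a hypothetical cut-off, not taken from the paper).
    """
    # Pairwise distances between the original messages, keyed by (i, j), i < j.
    dist = {
        (i, j): alignment_distance(messages[i], messages[j])
        for i, j in combinations(range(len(messages)), 2)
    }

    def average_linkage(c1: list[int], c2: list[int]) -> float:
        # UPGMA cluster distance: mean of all cross-cluster pairwise distances.
        total = sum(dist[min(i, j), max(i, j)] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    clusters = [[i] for i in range(len(messages))]
    while len(clusters) > 1:
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_linkage(clusters[p[0]], clusters[p[1]]),
        )
        if average_linkage(clusters[x], clusters[y]) > threshold:
            break
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
    return [sorted(c) for c in clusters]
```

Each resulting cluster can then stand for one abstract message type, so a captured session becomes a sequence of cluster labels that a state machine inference step could consume. Recomputing the average linkage from the original pairwise distances keeps the sketch short at the expense of speed; a production system would more likely update a distance matrix incrementally.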