Research On Key Technologies Of Protocol Message Formats Extraction

Posted on: 2021-05-09
Degree: Master
Type: Thesis
Country: China
Candidate: X Luo
Full Text: PDF
GTID: 2518306548995389
Subject: Cyberspace security
Abstract/Summary:
Protocol reverse engineering is a key but nontrivial technology in network engineering and cyberspace security. It aims to extract message formats and protocol state machines from the executables of non-public protocols and from their communication data. Among protocol specifications, extracting correct message formats is both an important goal and a key step, because it determines the accuracy of state machine extraction. This thesis explores key technologies in message format extraction. It seeks to improve both network-trace-based and execution-trace-based methods, especially in message field identification, extraction of relations among fields, and semantic information extraction. The proposed methods could be extended to extract protocol state machines, and they are promising for applications such as protocol reuse, protocol executable evaluation, and network intrusion detection. The main research work is summarized as follows.

First, the study proposes to extract message formats based on topic generative models. Many existing approaches to message clustering use similarity scores based on sequence alignment as their distance metric and ignore type information, even though type information can effectively guide message clustering. To address this, the thesis employs topic generative models to characterize the relationship between message types and message data. It extracts message type information by inferring the model's parameters, and uses that information as the distance metric that guides message clustering. The study also uses byte boundary entropy to identify delimiters and keywords, and then uses a segment-based alignment algorithm to extract message fields. The proposed method takes full advantage of the type information hidden in message data. It improves on existing methods by avoiding parameter setting and by correctly handling variable-length fields.

Second, the study proposes to extract message formats based on the similarity of execution sequences and the analysis of basic block structure. Typical methods for extracting message formats from execution traces rely heavily on experience in protocol engineering, and most of them fail to extract the structural information among message fields. To address this, the method compares the similarity of the instruction sequences and function sequences that handle message data in order to identify bytes that belong to the same field. It uses dynamic binary analysis based on basic block structure to extract hierarchical message fields. The study then merges the results obtained at different granularities and adds semantic information to form the output message formats. It improves on existing methods by avoiding heavy reliance on protocol engineering experience and by adding useful structural information to message formats.

Third, the study proposes to extract message formats from both network traces and execution traces. Current network-trace-based methods are not accurate enough and lack semantic information, whereas execution-trace-based methods can extract only one type of message format during a single run. To address this, the proposed method employs dynamic binary analysis platforms to record an execution trace together with its corresponding network trace. Network-trace-based methods are used to cluster the recorded messages, and on that basis execution-trace-based methods can extract the formats of different message types. In addition, the method uses execution-trace-based methods to extract semantic information, which is used to abstract the original messages into semantic representations. Message formats are then extracted by applying a sequence alignment algorithm to these semantic message representations.

This thesis implements an original system called NEPRE, which extracts message formats from both network traces and execution traces. The system records corresponding network and execution traces using the PANDA analysis system. The extracted message formats are generally correct. The implemented system is evaluated and compared with Wireshark, a typical protocol analyzer, and it can identify more of the fields used in applications. The results are therefore more applicable to tasks such as protocol reuse and fuzz testing of protocol executables.
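As a rough illustration of the boundary-entropy idea mentioned in the first contribution, the following Python sketch scores each byte value by the Shannon entropy of the byte that follows it across a set of messages. This is a simplified stand-in under stated assumptions, not the thesis's algorithm: the abstract does not give the exact definition of byte boundary entropy, and the function name `successor_entropy` and the sample messages are hypothetical. The intuition is that a delimiter tends to be followed by many different bytes, so its successor entropy is high, while bytes inside a fixed keyword are followed by a predictable byte.

```python
import math
from collections import Counter, defaultdict


def successor_entropy(messages):
    """Map each byte value to the Shannon entropy (in bits) of the byte following it.

    Hypothetical helper for illustration; the thesis's own definition of
    byte boundary entropy is not stated in the abstract.
    """
    followers = defaultdict(Counter)
    for msg in messages:
        # Iterating over bytes yields ints, so cur and nxt are byte values.
        for cur, nxt in zip(msg, msg[1:]):
            followers[cur][nxt] += 1

    entropy = {}
    for cur, counts in followers.items():
        total = sum(counts.values())
        entropy[cur] = -sum(
            (c / total) * math.log2(c / total) for c in counts.values()
        )
    return entropy


# Made-up sample messages of a text protocol (not data from the thesis).
messages = [b"USER alice", b"USER bob", b"PASS carol", b"PASS dave"]
entropy = successor_entropy(messages)

# The byte with the highest successor entropy is a delimiter candidate.
candidate = max(entropy, key=entropy.get)
print(chr(candidate).encode(), entropy[candidate])
```

On these sample messages the space byte receives the highest score, since it is followed by a different byte in every message, whereas bytes inside the keywords `USER` and `PASS` mostly score zero. A practical implementation would likely use longer n-gram contexts and a threshold rather than a single maximum.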
Keywords/Search Tags:Protocol Reverse Engineering, Message Formats Extraction, Dynamic Binary Analysis