
Research on Message Generation Method for Unknown Network Protocols Based on Seq2seq

Posted on: 2021-01-26
Degree: Master
Type: Thesis
Country: China
Candidate: Y N Xin
GTID: 2428330614450024
Subject: Cyberspace security
Abstract/Summary:
Network protocols are an important foundation for network security research and are used in many scenarios, such as network intrusion detection and fuzz testing. Intrusion detection systems in particular must handle a large number of private protocols. Without protocol documentation, however, protocol reverse engineering cannot fully analyze message content and therefore cannot generate complete messages. This thesis proposes a message generation method for unknown protocols that requires no prior information about message fields. The method adapts a neural network model for text generation so that it learns from the content of captured traffic messages and predicts new messages. The research object is plaintext message data of unknown network protocols, that is, protocols whose characteristics are not known, so the model needs no prior protocol knowledge when generating messages for different protocols.

Message generation for unknown protocols is completed in two stages: message segmentation and message generation.

Message segmentation stage: protocols are divided by header type into those with a fixed message header format and those with a non-fixed message header format, and a corresponding method is proposed for each type to separate the header content from the data content, so that the subsequently generated message data is more accurate. The type of the message is determined first, and the matching segmentation method is then applied. For fixed header formats, the method uses two values, the information entropy of each field and the mutual information between each field and the adjacent field that follows it, to determine the cut position. For non-fixed header formats, it uses keyword frequency to determine the segmentation position. Both header segmentation methods are verified experimentally on TCP and SMTP. TCP is treated as a fixed-length header protocol and is segmented accurately at the header length; SMTP has no fixed-length header format, and its redundant payload is removed through segmentation. Segmentation experiments with different amounts of data show that once the amount of data reaches a certain level, it is no longer the main factor affecting the results.

Message generation stage: a Seq2seq (sequence-to-sequence) model is used to build the protocol message generation model. The obtained protocol message data is divided into a training set and a test set, which are read into the model separately so that it learns from the data and predicts messages of the same protocol. In the experiments, messages are generated for TCP, SMTP, and HTTP, and the generation capability of the model is evaluated. The influence of the amount of data on message generation is also considered: each protocol is trained on message sets of different sizes and generates the same number of messages. The generated messages are encapsulated with physical layer, data link layer, and other information, then analyzed with Wireshark to determine the protocol type of each generated message and whether its content can be recognized. For TCP, the accuracy is 70% and the recognition rate is 100%. For SMTP, the accuracy is 85% and the recognition rate is 85%. For HTTP, the word generation accuracy is 91% and the recognition rate is 95%.
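
The abstract names information entropy and mutual information between adjacent fields as the two values behind fixed-header segmentation, but it does not give the formulas or the rule that turns them into a cut position. The following Python sketch shows only how the two quantities could be computed per byte offset over a set of captured messages. Treating each byte offset as a "field", and the function names `position_entropy`, `adjacent_mutual_information`, and `profile`, are assumptions made for illustration, not the thesis's implementation.

```python
import math
from collections import Counter


def position_entropy(messages, pos):
    """Shannon entropy (bits) of the byte at offset `pos` across all messages."""
    counts = Counter(m[pos] for m in messages)
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def adjacent_mutual_information(messages, pos):
    """Mutual information (bits) between the bytes at offsets pos and pos + 1."""
    pairs = Counter((m[pos], m[pos + 1]) for m in messages)
    left = Counter(m[pos] for m in messages)
    right = Counter(m[pos + 1] for m in messages)
    total = len(messages)
    mi = 0.0
    for (a, b), c in pairs.items():
        p_ab = c / total
        # p_ab / (p_a * p_b) rewritten with raw counts
        mi += p_ab * math.log2(c * total / (left[a] * right[b]))
    return mi


def profile(messages):
    """Entropy per offset and mutual information per adjacent offset pair.

    Only the prefix shared by all messages (the shortest length) is analyzed.
    """
    n = min(len(m) for m in messages)
    entropy = [position_entropy(messages, i) for i in range(n)]
    mutual = [adjacent_mutual_information(messages, i) for i in range(n - 1)]
    return entropy, mutual


# Toy data (hypothetical): two constant "header" bytes, then two varying bytes.
toy_messages = [bytes([1, 2, i, i]) for i in range(4)]
toy_entropy, toy_mutual = profile(toy_messages)
```

In this toy data the constant offsets have zero entropy and the two varying offsets, which always carry the same value, have high mutual information. How such a profile would be thresholded into a header boundary is left open here, since the abstract does not specify it.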
Keywords/Search Tags: network protocol, information content, keyword frequency, Seq2seq model