| The increasing use of unmanned aerial vehicles (UAVs) poses a potential threat to public safety and privacy, so effective methods for regulating UAVs are urgently needed. Recovering the meaning and format of UAV flight control commands through protocol reverse engineering is highly beneficial to UAV regulation. This paper studies the reverse analysis of UAV control protocols in three stages: protocol clustering, protocol field format inference, and protocol state machine inference. It applies techniques from natural language processing (NLP), namely word segmentation, topic models, word vector models, sentence similarity calculation, and new word discovery, to the reverse engineering of UAV control protocols. In summary, this paper makes the following contributions: 1) Protocol clustering. This paper proposes a keyword extraction method based on the LDA topic model, which avoids the difficulty of choosing the value of n in n-gram word segmentation. It proposes a protocol sequence similarity calculation method based on the word2vec model, which expresses semantics better than one-hot encoding. Because the K-means clustering algorithm cannot determine the number of clusters automatically, the number of clusters is chosen using the Calinski-Harabasz (CH) index. 2) Protocol field format inference. Format keywords in UAV control protocols are relatively fixed, whereas the values of fields such as the payload and checksum fields change frequently. Exploiting this property, a protocol field format inference method based on the information entropy contribution values of field-internal and field-boundary information is proposed. This method does not need to construct the UPGMA distance-based phylogenetic tree used in multiple sequence alignment, and it avoids the many gaps that progressive alignment tends to introduce.
3) Protocol state machine inference. Since a sequence number field increases or decreases monotonically, a method based on time-series autocorrelation coefficients is proposed to identify the sequence number fields related to protocol state, and protocol messages are labeled with their states. This paper also proposes a method for constructing and simplifying the protocol state machine based on an adjacency list, which removes the need to build an augmented prefix tree acceptor (APTA) and avoids an excessive number of state nodes. 4) Prototype system design and implementation. A prototype system for reverse analysis of UAV control protocols based on natural language processing is designed and implemented. In addition, a prototype system based on multiple sequence alignment, following traditional protocol reverse analysis techniques, is designed and implemented. The approach is tested and validated on real protocol datasets from two UAVs. From the inference results of the proposed scheme, the meanings and formats of the flight control instructions of both UAVs are recovered, which demonstrates the effectiveness of the research; the results also show that the proposed scheme achieves higher F1 scores than traditional solutions in both protocol clustering and protocol specification inference. |
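Keyword extraction with an LDA topic model replaces the need to fix n for n-gram segmentation. The following is a minimal sketch of generic LDA fitted by collapsed Gibbs sampling, not the thesis's implementation. It assumes each message has already been split into single hex-byte tokens, and the helper names `lda_gibbs` and `top_keywords` and the hyperparameters `alpha`, `beta`, and `iters` are hypothetical choices for illustration. Candidate keywords are taken as the most probable tokens of each topic.

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA with collapsed Gibbs sampling.

    docs: list of token lists (assumed here: hex byte strings of each message).
    Returns one smoothed topic-word distribution {token: prob} per topic.
    """
    rng = random.Random(seed)
    vocab = sorted({tok for doc in docs for tok in doc})
    V = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]          # n(d, k)
    topic_word = [Counter() for _ in range(num_topics)]   # n(k, w)
    topic_total = [0] * num_topics                        # n(k)
    z = []                                                # topic of each token
    for d, doc in enumerate(docs):
        z.append([])
        for tok in doc:
            k = rng.randrange(num_topics)
            z[d].append(k)
            doc_topic[d][k] += 1
            topic_word[k][tok] += 1
            topic_total[k] += 1

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, tok in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][tok] -= 1
                topic_total[k] -= 1
                # Full conditional p(z = t | all other assignments), unnormalised.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][tok] + beta)
                    / (topic_total[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][tok] += 1
                topic_total[k] += 1

    # Smoothed topic-word distributions phi[k][w].
    return [
        {w: (topic_word[k][w] + beta) / (topic_total[k] + V * beta) for w in vocab}
        for k in range(num_topics)
    ]


def top_keywords(phi, n=3):
    """Candidate keywords: the n most probable tokens of each topic."""
    return [sorted(dist, key=dist.get, reverse=True)[:n] for dist in phi]
```

For example, `top_keywords(lda_gibbs(messages, num_topics=2), n=2)` on a list of tokenised messages returns two candidate keywords per topic. Because the topic model scores tokens by how they co-occur across messages, no single n has to be fixed in advance, which is the shortcoming of n-gram segmentation that the abstract points to.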
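The word2vec-based similarity improves on one-hot encoding because two distinct keywords are always orthogonal under one-hot encoding (cosine similarity 0), while learned vectors can place related keywords close together. This sketch assumes a message is represented by the mean of its keyword vectors and compared with cosine similarity; that averaging rule, the helper names, and the three-dimensional vectors in `EMB` are hypothetical stand-ins for a trained word2vec model and are not taken from the thesis.

```python
import math


def mean_vector(tokens, embeddings):
    """Average the vectors of the tokens that have an embedding."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        raise ValueError("no token in the sequence has an embedding")
    return [sum(coord) / len(vecs) for coord in zip(*vecs)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 for a zero vector."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def sequence_similarity(seq_a, seq_b, embeddings):
    """Similarity of two keyword sequences via their mean vectors."""
    return cosine(mean_vector(seq_a, embeddings), mean_vector(seq_b, embeddings))


# Hypothetical toy embeddings; real vectors would come from a word2vec
# model trained on the keywords extracted from captured protocol messages.
EMB = {
    "takeoff": [0.9, 0.1, 0.0],
    "ascend": [0.8, 0.2, 0.1],
    "land": [-0.7, 0.1, 0.2],
}
```

With these toy vectors, `sequence_similarity(["takeoff"], ["ascend"], EMB)` is about 0.98 and the similarity to `["land"]` is negative, whereas one-hot vectors would score both pairs as 0.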
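Choosing the number of clusters with the Calinski-Harabasz index means running K-means for several candidate k and keeping the k that maximises CH(k) = [tr(B_k)/(k-1)] / [tr(W_k)/(n-k)], where tr(B_k) is the between-cluster dispersion (cluster sizes times squared distances of centroids to the overall mean), tr(W_k) is the within-cluster dispersion, and n is the number of samples. The sketch below is a plain-Python illustration on 2-D points; the deterministic farthest-first initialisation, the candidate range, and the helper names are assumptions, since the abstract does not specify them.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    return [sum(coord) / len(points) for coord in zip(*points)]


def kmeans(points, k, iters=100):
    """Lloyd's K-means with deterministic farthest-first initialisation."""
    centers = [list(points[0])]
    while len(centers) < k:
        # Next seed: the point farthest from all seeds chosen so far.
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old center if a cluster becomes empty.
            new_centers.append(centroid(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


def calinski_harabasz(points, labels, centers):
    """CH = [tr(B)/(k-1)] / [tr(W)/(n-k)]; higher means better separated."""
    n, k = len(points), len(centers)
    overall = centroid(points)
    sizes = [labels.count(j) for j in range(k)]
    between = sum(sizes[j] * sq_dist(centers[j], overall) for j in range(k))
    within = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return (between / (k - 1)) / (within / (n - k))


def best_k(points, k_range):
    """Return the k (k >= 2) whose clustering maximises CH, plus all scores."""
    scores = {}
    for k in k_range:
        labels, centers = kmeans(points, k)
        scores[k] = calinski_harabasz(points, labels, centers)
    return max(scores, key=scores.get), scores
```

On three well-separated groups of four points each, `best_k(points, range(2, 6))` selects k = 3. In the protocol setting the points would be message vectors (for instance, the averaged word vectors of each message), which is again an assumption about how the pieces connect.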
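The field format inference rests on the same intuition as new word discovery in NLP: a fixed format keyword is internally cohesive (its bytes almost always occur together) and sits at a boundary where the neighbouring bytes, such as payload or checksum values, vary widely. The abstract does not give the exact "entropy contribution value" formula, so this sketch substitutes the common new-word-discovery scores: pointwise mutual information at the weakest split point for internal cohesion, and the minimum of left and right neighbour entropy for the boundary. The function names, `max_len`, `min_count`, and the treatment of message edges are assumptions.

```python
import math
from collections import Counter, defaultdict


def entropy(counter):
    """Shannon entropy (bits) of a neighbour-frequency Counter."""
    total = sum(counter.values())
    return -sum(c / total * math.log2(c / total) for c in counter.values())


def score_ngrams(messages, max_len=3, min_count=2):
    """Score byte n-grams (n >= 2) by internal cohesion and boundary entropy.

    messages: list of bytes objects.
    Returns {ngram: (count, cohesion_bits, boundary_bits)} for n-grams that
    occur at least min_count times.
    """
    counts = Counter()
    left, right = defaultdict(Counter), defaultdict(Counter)
    for m, msg in enumerate(messages):
        for n in range(1, max_len + 1):
            for i in range(len(msg) - n + 1):
                g = msg[i:i + n]
                counts[g] += 1
                # A message edge is recorded as a distinct marker per message,
                # so fields at the start or end are not penalised (assumption).
                left[g][msg[i - 1] if i > 0 else ("start", m)] += 1
                right[g][msg[i + n] if i + n < len(msg) else ("end", m)] += 1
    total = sum(c for g, c in counts.items() if len(g) == 1)  # number of bytes

    def p(g):
        return counts[g] / total

    scores = {}
    for g, c in counts.items():
        if len(g) < 2 or c < min_count:
            continue
        # Internal cohesion: PMI at the weakest split point of the n-gram.
        cohesion = min(math.log2(p(g) / (p(g[:k]) * p(g[k:]))) for k in range(1, len(g)))
        # Boundary freedom: how unpredictable the neighbours on each side are.
        boundary = min(entropy(left[g]), entropy(right[g]))
        scores[g] = (c, cohesion, boundary)
    return scores
```

For hypothetical messages `bytes([0xFE, 0x09, i, 0x55 ^ i])` with i from 0 to 7, only the header `b"\xfe\x09"` survives `min_count=2`, with a cohesion of 2 bits and a boundary entropy of 3 bits, because the byte after it changes in every message. Unlike multiple sequence alignment, this needs no guide tree and inserts no gaps.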
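The sequence number identification relies on the fact that a counter field forms a smooth, monotonic time series across consecutive messages, so its lag-1 autocorrelation is high, while payload or constant fields are not. This sketch uses the standard sample autocorrelation r_k = Σ_{t=1}^{n-k}(x_t - x̄)(x_{t+k} - x̄) / Σ_{t=1}^{n}(x_t - x̄)², scans fixed byte offsets, and adds an explicit monotonicity check. The single-byte field width, the fixed offsets, the lag of 1, and the threshold of 0.8 are assumptions, and counter wrap-around is not handled.

```python
def autocorrelation(series, lag=1):
    """Sample autocorrelation coefficient r_lag of a numeric series."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    if var == 0:
        return 0.0  # a constant field carries no sequence information
    cov = sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag))
    return cov / var


def is_monotonic(series):
    """True if the series strictly increases or strictly decreases."""
    diffs = [b - a for a, b in zip(series, series[1:])]
    return all(d > 0 for d in diffs) or all(d < 0 for d in diffs)


def sequence_number_offsets(messages, threshold=0.8, lag=1):
    """Byte offsets whose values across consecutive messages look like a
    sequence number: monotonic and strongly autocorrelated."""
    width = min(len(m) for m in messages)
    found = []
    for off in range(width):
        series = [m[off] for m in messages]
        if is_monotonic(series) and autocorrelation(series, lag) >= threshold:
            found.append(off)
    return found
```

For 50 hypothetical messages `bytes([0xAA, i, (i % 2) * 200])`, only offset 1 is reported: offset 0 is constant and offset 2 alternates, giving a strongly negative autocorrelation. As a sanity check, a pure ramp of length 20 has r_1 = 565.25 / 665 = 0.85. A real counter that wraps at 255 would need to be unwrapped before this test.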
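Building the state machine from an adjacency list means recording each observed transition between labelled messages directly, instead of first building an augmented prefix tree acceptor whose size grows with the number of distinct session prefixes. The abstract does not describe the thesis's simplification rule, so this sketch makes two explicit assumptions: each state is a message type label, and simplification merges non-initial states whose successor sets are identical, repeating until nothing changes. The `START`/`END` markers and the labels in the example are hypothetical.

```python
from collections import defaultdict


def build_adjacency(sessions):
    """Adjacency list {state: {next_state: count}} built directly from
    per-session sequences of message-type labels (no prefix tree)."""
    adj = defaultdict(lambda: defaultdict(int))
    for session in sessions:
        path = ["START"] + list(session) + ["END"]
        for a, b in zip(path, path[1:]):
            adj[a][b] += 1
    return {s: dict(nxt) for s, nxt in adj.items()}


def merge_equivalent(adj):
    """Merge non-initial states with identical successor sets, repeating
    until no further merge is possible. Returns {state: set(successors)}."""
    adj = {s: set(nxt) for s, nxt in adj.items()}
    while True:
        groups = defaultdict(list)
        for s, nxt in adj.items():
            if s != "START":  # keep the initial state distinct
                groups[frozenset(nxt)].append(s)
        rename = {}
        for members in groups.values():
            if len(members) > 1:
                rep = "+".join(sorted(members))
                for s in members:
                    rename[s] = rep
        if not rename:
            return adj
        # Every merge removes at least one state, so the loop terminates.
        merged = defaultdict(set)
        for s, nxt in adj.items():
            merged[rename.get(s, s)] |= {rename.get(t, t) for t in nxt}
        adj = dict(merged)
```

With the hypothetical sessions `["ARM", "TAKEOFF", "MOVE", "LAND"]` and `["ARM", "TAKEOFF", "HOVER", "LAND"]`, `MOVE` and `HOVER` both lead only to `LAND`, so `merge_equivalent` collapses them into one state `HOVER+MOVE`. The adjacency list grows with the number of distinct transitions rather than the number of distinct prefixes, which is the saving the abstract attributes to skipping the APTA.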