Research On Network Stream Data Preprocessing And IoT Device Identification System Based On Relational Database

Posted on: 2021-05-11
Degree: Master
Type: Thesis
Country: China
Candidate: S Wang
GTID: 2438330623971705
Subject: Software engineering
Abstract/Summary:
Network device classification is widely used in network management and cyberspace security. Its prerequisite is correct preprocessing of traffic data. Most existing preprocessing methods rely on small tools written in general-purpose programming languages. These tools have a narrow range of application, do not generalize, cannot meet some specific processing requirements, and usually do not publish their source code. Applying network flow statistical features to device classification and to security management and control is also often limited by the lack of data sets in this field. Some well-known data sets, such as the network intrusion detection data set KDD CUP 99, do not fully disclose how their data were collected and processed. Practical engineering applications therefore urgently need a flexible and efficient way to construct traffic statistics data sets that meet their own requirements.

The network flow data preprocessing method based on a relational database proposed in this thesis can effectively extract traffic statistical features. Compared with existing methods, it offers fast extraction, a high degree of automation, and strong versatility. With this method, traffic statistical features can be constructed flexibly for different classification requirements, which greatly improves the efficiency of preprocessing raw network flow data. The main work of this thesis is as follows:

(1) A general method for constructing network traffic data sets is proposed. The method is based on the SQL scripting language of a relational database and offers batch processing, flexible coding, and highly customizable features. Common traffic statistical features are abstracted and summarized, and each feature is mapped to a corresponding code module written with SQL statistical functions. Together these modules form a feature extraction library with general-purpose processing capability.

(2) A statistical feature extraction experiment is performed on a public pcap data set, and the extraction algorithm is implemented for the statistical feature set selected according to the requirements. The experimental results show that the preprocessing method proposed in this thesis extracts the specified flow statistical features completely and accurately, and completes the conversion from the binary pcap flow file to the CSV statistical feature file required by machine learning models.

(3) To verify the correctness and effectiveness of the extracted statistical feature data, the generated CSV data are fed into several classic machine learning models to classify and predict IoT devices. The experimental results show that models trained on the data set generated by this method achieve accuracy no lower than that reported in comparable existing studies. Because the models compared are classic models, most of them using default hyperparameters, this suggests that, for the same model structure, the training data generated by this method lets the model classify better.

(4) Based on the proposed network traffic statistical feature extraction method, an IoT device identification system architecture is designed and implemented. The client uses Wireshark as the traffic data collection service and uses MySQL stored procedures to dump statistics from the raw traffic data at regular intervals. Combined with bat batch commands on Windows, the system automates the whole path from acquiring raw traffic data to generating the statistical data to be predicted. The Postman tool was used to simulate the device identification process, from the HTTP client's access request to the return of the prediction result.
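The SQL-based feature extraction described in item (1) can be illustrated with a minimal sketch. This is not the thesis's code: it assumes packets have already been parsed out of a pcap file into rows, it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table name `packets`, its columns, the chosen features, and the sample rows are all hypothetical. It groups packets by the flow 5-tuple, computes a few common statistical features with SQL aggregate functions, and writes them out as CSV.

```python
import csv
import io
import sqlite3

# Hypothetical packet rows: (ts, src_ip, dst_ip, src_port, dst_port, proto, length).
# In a real pipeline these would come from a parsed pcap capture.
PACKETS = [
    (0.00, "10.0.0.2", "10.0.0.9", 5000, 80, "TCP", 60),
    (0.10, "10.0.0.2", "10.0.0.9", 5000, 80, "TCP", 1500),
    (0.40, "10.0.0.2", "10.0.0.9", 5000, 80, "TCP", 540),
    (1.00, "10.0.0.3", "10.0.0.9", 6000, 53, "UDP", 80),
    (1.50, "10.0.0.3", "10.0.0.9", 6000, 53, "UDP", 120),
]

# One "feature module": per-flow statistics via SQL aggregates (assumed feature set).
FLOW_FEATURES_SQL = """
SELECT src_ip, dst_ip, src_port, dst_port, proto,
       COUNT(*)          AS pkt_count,
       SUM(length)       AS total_bytes,
       AVG(length)       AS mean_len,
       MIN(length)       AS min_len,
       MAX(length)       AS max_len,
       MAX(ts) - MIN(ts) AS duration
FROM packets
GROUP BY src_ip, dst_ip, src_port, dst_port, proto
ORDER BY MIN(ts)
"""


def extract_flow_features(packets):
    """Load packet rows into an in-memory table and return (header, rows) per flow."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE packets (ts REAL, src_ip TEXT, dst_ip TEXT, "
        "src_port INTEGER, dst_port INTEGER, proto TEXT, length INTEGER)"
    )
    conn.executemany("INSERT INTO packets VALUES (?, ?, ?, ?, ?, ?, ?)", packets)
    cur = conn.execute(FLOW_FEATURES_SQL)
    header = [col[0] for col in cur.description]
    rows = cur.fetchall()
    conn.close()
    return header, rows


def to_csv(header, rows):
    """Serialize the feature table to CSV text, the format a model would consume."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    header, rows = extract_flow_features(PACKETS)
    print(to_csv(header, rows))
```

Keeping each feature group in its own SQL statement is one way to get the modularity the abstract describes: a new feature becomes a new query rather than a change to parsing code.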
Keywords/Search Tags: Preprocessing, Statistical features, Device traffic classification, Database, SQL, Machine learning