| In recent years, large numbers of IoT devices have been deployed in smart homes, smart factories, smart healthcare, and other scenarios, and the number of IoT devices connected to networks is growing exponentially. While we enjoy the convenience of the IoT era, identifying and managing these devices has become a major challenge. This thesis constructs device fingerprints from the traffic of IoT devices. Two pieces of work are carried out, taking into account the requirements that arise in IoT device identification: real-time identification, the ability to cope with newly added devices, device privacy protection, and the difficulty of acquiring fingerprint features. The details are as follows. (1) We propose a method based on statistical traffic features that improves the real-time performance of device identification. First, the method extracts seven basic quantities from the traffic of IoT devices: packet size, inter-packet time interval, transport-layer protocol, TCP source port, TCP destination port, UDP source port, and UDP destination port. Second, for each device, it takes 25 consecutive packets sent by the device and computes 11 statistics (mean, absolute energy, skewness, kurtosis, and others) over each of the 7 basic quantities, which yields a fingerprint of 77 features. Finally, the supervised learning algorithms naive Bayes, support vector machine, random forest, logistic regression, and k-nearest neighbors are each used to train an identification model. The results show that the model trained with random forest performs best. On the public UNSW dataset, the random forest classifier achieves 99.8% accuracy, recall, and F1 score on 16 devices. On the self-collected dataset, accuracy, recall, and F1 score on 13 devices reach 99.6%. Compared with existing methods, the proposed method reduces identification latency from 1800 seconds to 347 seconds while matching their identification accuracy. (2) To address the identification of unknown devices, this thesis proposes a template-based approach and an autoencoder-based approach. The template-based approach builds a template from the proportion of packets of each protocol that a device sends over a period of time. It is easy to implement, but it does not distinguish similar devices well. Both approaches compare the device to be identified with every device in the fingerprint library; if the difference from every library device exceeds a threshold, the device is judged to be an unknown new device. Considering the sample size, 12 devices were selected as known devices and 4 as new devices in the UNSW dataset, and 12 as known devices and 1 as a new device in the OWN dataset. The template-based method achieves accuracy, recall, and F1 score of 91.2%, 90.2%, and 90.2% on UNSW, and 59.2%, 58.6%, and 52.9% on the OWN dataset, which contains similar devices. The autoencoder-based method achieves accuracy, recall, and F1 score of 97.6%, 93.8%, and 95.1% on the public UNSW dataset, and 98.0%, 94.9%, and 96.0% on the self-collected OWN dataset. |
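
The template-based unknown-device check described in (2) can be illustrated with a minimal sketch. It is not the thesis implementation: the protocol list `PROTOCOLS`, the L1 distance, the value of `THRESHOLD`, and the function names are all assumptions made for illustration, since the abstract states only that a template is the per-protocol packet proportion and that a device is unknown when its difference from every library device exceeds a threshold.

```python
from collections import Counter

# Hypothetical protocol labels and threshold; the abstract does not specify them.
PROTOCOLS = ("TCP", "UDP", "DNS", "NTP", "HTTP", "TLS")
THRESHOLD = 0.2


def build_template(packet_protocols):
    """Return the fraction of packets per protocol, in PROTOCOLS order."""
    counts = Counter(packet_protocols)
    total = sum(counts[p] for p in PROTOCOLS)
    if total == 0:
        raise ValueError("no packets with a known protocol")
    return [counts[p] / total for p in PROTOCOLS]


def l1_distance(a, b):
    """Sum of absolute differences (an assumed choice of difference measure)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def identify(template, library, threshold=THRESHOLD):
    """Return the closest known device name, or None for an unknown device.

    A device is treated as unknown only when its distance to every
    template in the library is greater than the threshold.
    """
    distances = {name: l1_distance(template, ref) for name, ref in library.items()}
    if all(d > threshold for d in distances.values()):
        return None
    return min(distances, key=distances.get)
```

Under these assumptions, a device whose protocol mix is close to one library template is assigned that device's name, while a device whose mix differs from all templates by more than `THRESHOLD` is reported as `None`. Two devices with nearly identical protocol mixes would produce nearly identical templates, which is consistent with the weaker results the abstract reports for similar devices.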