With the popularity of smart devices, the number of mobile applications has grown exponentially, and the traffic they generate has become the dominant share of Internet traffic. The security and management of mobile application traffic have therefore become a concern for network administrators. Research on mobile application traffic identification helps administrators understand and analyze network conditions, so that they can formulate security policies and network management schemes. Current mainstream methods identify mobile application traffic from the statistical characteristics of flows, but these methods cannot identify traffic early in its transmission. In addition, mobile applications commonly use public third-party libraries, which generate similar traffic across different applications and pose a major challenge to traffic identification. To address these problems, this thesis aims to identify third-party library traffic within mobile application traffic, and to identify the traffic generated by different applications in the early stage of transmission. The research of this thesis has achieved the following results:

1. A third-party library traffic identification method based on domain feature clustering is proposed. The method extracts the domain name of each flow from DNS packets, from the Host field of HTTP headers, and from the server name field of the "Client Hello" packet sent during the TLS handshake of HTTPS connections. Traffic is then clustered by domain similarity. Because third-party library traffic is generated by many different applications, the method identifies third-party library traffic according to the number of distinct application labels in each cluster. Experimental results show that the method effectively identifies third-party library traffic in application traffic, with an accuracy of 87.5% and a recall of 90.8%.

2. A random forest traffic identification method based on packet characteristics is proposed, which identifies application traffic in the early stage of transmission. Methods based on statistical features must observe an entire flow and therefore cannot identify traffic early. The method in this thesis instead uses the packet length sequence and the TCP window size sequence of the first 10 packets of a flow, so that mobile applications can be identified early in transmission. Experiments show that the method accurately identifies mobile application traffic using the features of the first 10 packets, with an accuracy of 93.5%.

3. Based on the third-party library traffic identification method and the packet-feature-based random forest application traffic identification method, a real-time classification and identification system for mobile application traffic is designed and implemented. The system provides traffic collection, traffic processing, and traffic identification. Once deployed, it collects application traffic from the network in real time, analyzes it efficiently, identifies third-party library traffic, and identifies the traffic of different applications, with an accuracy of up to 87.6%.
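The first method's domain-extraction step reads a domain name from three places: the query name in a DNS packet, the Host header of an HTTP request, and the server name (SNI) extension of a TLS ClientHello. The sketch below, in Python with only the standard library, shows one way to pull that name out of raw payloads. It is an illustration under stated assumptions, not the thesis's implementation. The function names are hypothetical. Each payload is assumed to be one reassembled message: a DNS message, an HTTP/1.x request, or a TLS record holding the whole ClientHello. Malformed input simply yields None.

```python
import struct
from typing import Optional


def dns_query_name(payload: bytes) -> Optional[str]:
    """Return the QNAME of the first question in a raw DNS message (e.g. a UDP payload)."""
    if len(payload) < 12:                      # fixed 12-byte DNS header
        return None
    qdcount = struct.unpack("!H", payload[4:6])[0]
    if qdcount == 0:
        return None
    labels, pos = [], 12
    while pos < len(payload):
        length = payload[pos]
        if length == 0:                        # root label ends the name
            return ".".join(labels).lower()
        if length & 0xC0:                      # compression pointer: not handled in this sketch
            return None
        pos += 1
        labels.append(payload[pos:pos + length].decode("ascii", "replace"))
        pos += length
    return None                                # ran off the end: truncated message


def http_host(payload: bytes) -> Optional[str]:
    """Return the Host header of a plaintext HTTP/1.x request, without any port."""
    head = payload.split(b"\r\n\r\n", 1)[0]
    for line in head.split(b"\r\n")[1:]:       # skip the request line
        name, sep, value = line.partition(b":")
        if sep and name.strip().lower() == b"host":
            host = value.strip().decode("ascii", "replace").lower()
            if host.count(":") == 1:           # "name:port"; leaves IPv6 literals alone
                host = host.split(":", 1)[0]
            return host
    return None


def tls_sni(payload: bytes) -> Optional[str]:
    """Return the host_name from the server_name extension of a TLS ClientHello record."""
    try:
        if payload[0] != 0x16 or payload[5] != 0x01:   # handshake record carrying a ClientHello
            return None
        pos = 5 + 4 + 2 + 32                   # record header, handshake header, version, random
        pos += 1 + payload[pos]                # session_id
        pos += 2 + struct.unpack("!H", payload[pos:pos + 2])[0]   # cipher_suites
        pos += 1 + payload[pos]                # compression_methods
        ext_end = pos + 2 + struct.unpack("!H", payload[pos:pos + 2])[0]
        pos += 2
        while pos + 4 <= ext_end:
            ext_type, ext_len = struct.unpack("!HH", payload[pos:pos + 4])
            pos += 4
            if ext_type == 0x0000:             # server_name extension
                # 2-byte list length, then first entry: name_type (1 byte), name length (2 bytes)
                if payload[pos + 2] == 0:      # name_type 0 = host_name
                    name_len = struct.unpack("!H", payload[pos + 3:pos + 5])[0]
                    return payload[pos + 5:pos + 5 + name_len].decode("ascii", "replace").lower()
                return None
            pos += ext_len
    except (IndexError, struct.error):         # truncated or malformed record
        return None
    return None
```

A ClientHello split across several TCP segments or TLS records would need reassembly before tls_sni is called. The DNS parser ignores name compression because it rarely appears in the question section.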
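The first method clusters flows by domain similarity and marks a cluster as third-party library traffic when many different applications contribute to it. The abstract does not specify the similarity measure or the label-count threshold. The following sketch therefore picks hypothetical ones:

- Similarity is the fraction of trailing DNS labels that two names share.
- Clusters are formed by single linkage at a threshold of 0.6.
- A cluster is flagged when its flows carry at least three distinct application labels.

The function names and both thresholds are assumptions chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def suffix_similarity(a: str, b: str) -> float:
    """Hypothetical measure: shared trailing labels / label count of the longer name."""
    la, lb = a.lower().split(".")[::-1], b.lower().split(".")[::-1]
    common = 0
    for x, y in zip(la, lb):
        if x != y:
            break
        common += 1
    return common / max(len(la), len(lb))


def cluster_domains(domains: List[str], threshold: float = 0.6) -> List[List[str]]:
    """Single-linkage clustering: link every pair with similarity >= threshold (union-find)."""
    parent = list(range(len(domains)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]      # path halving
            i = parent[i]
        return i

    for i in range(len(domains)):
        for j in range(i + 1, len(domains)):
            if suffix_similarity(domains[i], domains[j]) >= threshold:
                parent[find(i)] = find(j)
    clusters: Dict[int, List[str]] = defaultdict(list)
    for i, d in enumerate(domains):
        clusters[find(i)].append(d)
    return list(clusters.values())


def third_party_domains(flows: Iterable[Tuple[str, str]],
                        threshold: float = 0.6, min_apps: int = 3) -> Set[str]:
    """flows: (domain, app_label) pairs, one per flow. Returns the domains whose cluster
    is observed in at least min_apps distinct applications."""
    apps_by_domain: Dict[str, Set[str]] = defaultdict(set)
    for domain, app in flows:
        apps_by_domain[domain.lower()].add(app)
    domains = sorted(apps_by_domain)
    flagged: Set[str] = set()
    for cluster in cluster_domains(domains, threshold):
        apps = set().union(*(apps_by_domain[d] for d in cluster))
        if len(apps) >= min_apps:
            flagged.update(cluster)
    return flagged
```

Counting trailing labels treats shared public suffixes such as co.uk as evidence of similarity. A fuller version would first strip the public suffix, for example using the Public Suffix List. The all-pairs loop is quadratic in the number of distinct domains, which is adequate for a sketch but not for large captures.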
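The second method needs, for each flow, the lengths and TCP window sizes of its first 10 packets. The sketch below shows one way a real-time system could buffer these values per flow and emit a feature vector as soon as the tenth packet arrives. The vector holds 20 values: the 10 lengths followed by the 10 window sizes. The class and helper names, the bidirectional flow key, the feature order, and the zero-padding of flows that end with fewer than 10 packets are all assumptions, not details taken from the thesis.

```python
from typing import Dict, List, Optional, Tuple

N_PACKETS = 10  # the thesis uses the first 10 packets of each flow

FlowKey = Tuple[str, int, str, int]


def canonical_key(src_ip: str, src_port: int, dst_ip: str, dst_port: int) -> FlowKey:
    """Hypothetical helper: map both directions of one TCP connection to the same key."""
    a, b = (src_ip, src_port), (dst_ip, dst_port)
    return (a[0], a[1], b[0], b[1]) if a <= b else (b[0], b[1], a[0], a[1])


class EarlyFeatureTable:
    """Buffers packet lengths and TCP window sizes until a flow reaches n packets."""

    def __init__(self, n: int = N_PACKETS):
        self.n = n
        self.flows: Dict[FlowKey, Tuple[List[int], List[int]]] = {}
        self.done: set = set()                 # flows whose vector was already emitted

    def add_packet(self, key: FlowKey, length: int, window: int) -> Optional[List[int]]:
        """Record one packet. Return [len_1..len_n, win_1..win_n] once the flow has
        n packets, otherwise None. Later packets of an emitted flow are ignored."""
        if key in self.done:
            return None
        lengths, windows = self.flows.setdefault(key, ([], []))
        lengths.append(length)
        windows.append(window)
        if len(lengths) == self.n:
            self.done.add(key)
            del self.flows[key]                # free the buffer once features are out
            return lengths + windows
        return None

    def flush(self, key: FlowKey) -> Optional[List[int]]:
        """For a flow that ended with fewer than n packets, zero-pad both sequences (assumption)."""
        if key not in self.flows:
            return None
        lengths, windows = self.flows.pop(key)
        pad = [0] * (self.n - len(lengths))
        self.done.add(key)
        return lengths + pad + windows + pad
```

The emitted vectors, paired with application labels, could then train a classifier such as scikit-learn's RandomForestClassifier through fit(X, y), and new flows could be labelled with predict. The abstract does not report the forest's hyperparameters. A long-running deployment would also need to evict idle entries from flows and done.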