| Webpage fingerprinting, an important technique in network security management, makes it possible to audit users' network access behavior without decrypting their encrypted data. Traditional webpage identification methods can often recognize only the homepages of different websites and are comparatively weak at fine-grained identification of different webpages within the same website. Fine-grained webpage identification is of greater significance because it allows further analysis of what a user intends to do on the target website. However, webpages from the same website have very similar traffic traces that existing solutions struggle to distinguish, so identification results are unsatisfactory. In addition, identifying single flows for webpages accessed through VPN proxies is challenging because all webpage flows are mapped to the same domain. To address these issues, this thesis carries out the following work: (1) For encrypted traffic generated by directly accessing webpages, this thesis proposes a webpage fingerprint extraction and identification method based on session flow segmentation. The method first defines the main flow and auxiliary flow of a webpage, then analyzes the interactive characteristics of the data flow, divides the main flow into multiple block intervals according to differences in the time sequence, and uses these blocks to represent the differences between webpages. Twenty-three features are designed for each block, and statistical features are designed for both the main and auxiliary flows. Finally, machine learning algorithms are used for classification and identification. Experimental results show that the method achieves a recognition accuracy of 93.9%, outperforming existing webpage fingerprint identification methods, and has high computational efficiency. (2) For encrypted traffic generated by accessing webpages through VPN proxies, this thesis proposes a webpage fingerprint extraction and identification method based on the deep learning DSFN (Deep Surage Fingerprinting Network) framework. The DSFN framework consists of three modules: a model input module, a feature extraction module, and a classification and identification module. In the model input module, the Surage concept is defined by analyzing TCP segmentation and TLS data encapsulation, and it is used to extract length and time input data. The feature extraction module comprises spatial and temporal feature extraction modules, and a fully connected network performs the final classification and identification. Two datasets were collected for independent experiments with this method. The results show that its accuracy reaches 94.2% and 98.3% on the two datasets, respectively, effectively addressing the reliance on a single feature type and the low accuracy of fine-grained webpage identification in existing webpage fingerprint identification methods. |
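
The block-segmentation idea in method (1) can be illustrated with a minimal sketch. The abstract does not give the splitting rule or list the 23 per-block features, so everything concrete here is an assumption made for illustration: the packet representation as `(timestamp, signed_length)` pairs, the convention that positive lengths are client-to-server, the inter-arrival gap threshold of 0.05 s, and the handful of features computed. The function names `split_into_blocks` and `block_features` are hypothetical and are not taken from the thesis.

```python
def split_into_blocks(packets, gap_threshold=0.05):
    """Split one flow into blocks wherever the inter-arrival gap is large.

    packets: list of (timestamp_seconds, signed_length) sorted by time.
    Assumption: positive length = client-to-server, negative = server-to-client.
    gap_threshold: assumed value in seconds, not taken from the thesis.
    """
    blocks = []
    current = []
    prev_ts = None
    for ts, length in packets:
        # A gap longer than the threshold closes the current block.
        if prev_ts is not None and current and ts - prev_ts > gap_threshold:
            blocks.append(current)
            current = []
        current.append((ts, length))
        prev_ts = ts
    if current:
        blocks.append(current)
    return blocks


def block_features(block):
    """Compute a few illustrative per-block statistics (not the thesis's 23)."""
    lengths = [length for _, length in block]
    bytes_up = sum(length for length in lengths if length > 0)
    bytes_down = sum(-length for length in lengths if length < 0)
    return {
        "packet_count": len(block),
        "duration": block[-1][0] - block[0][0],
        "bytes_up": bytes_up,
        "bytes_down": bytes_down,
        "mean_abs_length": sum(abs(length) for length in lengths) / len(lengths),
    }


if __name__ == "__main__":
    # Toy main flow: one request/response burst, a pause, then a second burst.
    flow = [(0.00, 300), (0.01, -1400), (0.02, -1400), (0.50, 200), (0.51, -900)]
    for block in split_into_blocks(flow):
        print(block_features(block))
```

In a pipeline of the kind the abstract describes, the per-block feature vectors would be combined with flow-level statistics for the main and auxiliary flows and passed to a conventional classifier; the choice of classifier is not specified in the abstract.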