Design and Implementation of HTTP Traffic System Based on Apache Spark

Posted on: 2021-01-05
Degree: Master
Type: Thesis
Country: China
Candidate: S Q Sun
Full Text: PDF
GTID: 2428330632462630
Subject: Computer technology
Abstract/Summary:
Intrusion over the Hypertext Transfer Protocol (HTTP) has long been a major issue in network security, and intrusion detection has evolved from rule-based approaches to approaches based on machine learning. Because rule-based detection is costly, it tends to be replaced by machine learning. In recent years, deep learning has become popular for detecting malicious network traffic and can detect intrusion attacks effectively. However, deep learning suffers when the class distribution of the data is seriously imbalanced. To address this problem, the thesis proposes a solution that combines feature extraction with a modified cost function. The main research contents are as follows.

First, the thesis proposes a character-level abstract traffic feature extraction approach based on the URL and POST fields of HTTP traffic. The approach first transforms HTTP traffic into character vectors using Spark, and then trains an autoencoder on unlabeled data to extract HTTP traffic features that have clearer decision boundaries. Experiments show that the method is efficient and that the extracted features are more effective for traffic detection.

Second, because the cross-entropy loss function tends to ignore the minority class in imbalanced traffic detection, the thesis proposes the HM-loss function. A weight coefficient is designed for the loss function: when the prediction is correct, the coefficient dynamically adjusts the contribution of benign samples to the loss; when the prediction is wrong, the contribution of the sample to the loss is kept unchanged. The experimental results show that this approach is more effective than other approaches.

Third, the thesis designs and implements an HTTP traffic detection system based on Spark. The system performs end-to-end automatic traffic detection without manual intervention and provides traffic collection, feature extraction, detection, and automatic storage. The system processes an average of 2.5T of data per day, and it applies the feature extraction method and loss function described above.
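The character-level transformation of the URL and POST fields can be illustrated with a minimal sketch. The abstract does not give the encoding details, so everything here is an assumption made for illustration: the function name `to_char_vector`, the fixed length `max_len`, the use of ASCII code points as character ids, the single bucket (128) for non-ASCII characters, and zero padding. In a Spark job, a function like this could be applied to each record with a map operation; plain Python is used here to keep the example self-contained.

```python
from typing import List
from urllib.parse import unquote

PAD_ID = 0      # assumed id used to pad short requests
OTHER_ID = 128  # assumed single bucket for NUL and non-ASCII characters


def to_char_vector(url: str, post_body: str = "", max_len: int = 64) -> List[int]:
    """Map URL and POST body to a fixed-length vector of character ids.

    Hypothetical encoding, not the scheme used in the thesis.
    """
    # Percent-decode both fields so that e.g. %27 becomes a quote character.
    text = unquote(url) + "\n" + unquote(post_body)
    # Truncate to max_len, then map each character to its ASCII code point.
    ids = [ord(c) if 0 < ord(c) < 128 else OTHER_ID for c in text[:max_len]]
    # Right-pad with PAD_ID so that every vector has the same length.
    return ids + [PAD_ID] * (max_len - len(ids))


vec = to_char_vector("/a?b=%27", "x=1", max_len=12)
print(vec)
```

Vectors of this kind have a fixed length, which is what lets them serve as input to an autoencoder trained on unlabeled traffic.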
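The behaviour described for the loss function (down-weighting correctly predicted benign samples while leaving wrongly predicted samples unchanged) can be sketched as follows. This is not the HM-loss formula, which the abstract does not state. The weight `(1 - p_true) ** gamma`, the parameter `gamma`, the 0.5 decision threshold, and the function name `weighted_ce` are all assumptions chosen only to show the idea on a binary benign/malicious task.

```python
import math


def weighted_ce(p_malicious: float, label: int,
                gamma: float = 2.0, threshold: float = 0.5) -> float:
    """Cross-entropy with an assumed dynamic weight on easy benign samples.

    label is 1 for malicious and 0 for benign. Illustrative only; the
    weighting rule is a stand-in, not the thesis's HM-loss.
    """
    # Probability that the model assigns to the true class.
    p_true = p_malicious if label == 1 else 1.0 - p_malicious
    p_true = min(max(p_true, 1e-12), 1.0)  # avoid log(0)
    ce = -math.log(p_true)

    correct = p_true > threshold
    if correct and label == 0:
        # Correct benign prediction: shrink its contribution; the more
        # confident the prediction, the smaller the weight.
        return (1.0 - p_true) ** gamma * ce
    # Wrong predictions (and malicious samples) keep the plain cross-entropy.
    return ce


print(weighted_ce(0.1, 0))  # correctly predicted benign sample, down-weighted
print(weighted_ce(0.9, 0))  # misclassified benign sample, unchanged
```

With this kind of weighting, the many easily classified benign samples contribute less to the total loss, so the minority malicious class has more influence on training.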
Keywords/Search Tags: Traffic detection, Deep learning, Feature extraction, Loss function, Imbalanced data set