| Network security has become an important research area due to the ever-growing number of network attacks. With the growth of online transactions and e-commerce powered by the Internet, attackers have increasingly targeted networks for malicious purposes. Because attackers keep evolving their approaches, the research community needs to keep proposing and developing new approaches suited to identifying and classifying network traffic for intrusion detection. To distinguish malicious traffic (malware) from normal (non-malicious) traffic, packet-based or flow-based approaches are usually deployed. Packet-based approaches usually inspect the payloads of all packets in the network data to identify and classify traffic for intrusion detection. In contrast, flow-based approaches summarize the network data statistics as flows to classify traffic for intrusion detection. The flow-based approach is more helpful for reducing computational complexity and achieving higher accuracy. Hence, this thesis is limited to the flow-based approach for classifying malicious traffic from non-malicious traffic. The network traffic features are optimized for machine learning methods. The proposed methods are tested on three real-world datasets, and the results are presented in terms of Accuracy, F1-Measure, Precision, Recall, and Confusion Matrix. Firstly, through experiments on the three real-world datasets, flow-based features were adopted for network traffic classification using the Random Forest, Decision Tree, and Naïve Bayes classifiers. Furthermore, a new metric, "additive flow size", was proposed and used for classification with a hybrid model incorporating Logistic Regression and Decision Tree to build a robust classifier. Secondly, the proposed new metric is used with a Convolutional Neural Network (CNN) to classify malicious traffic from non-malicious traffic for intrusion detection. The proposed models on 
the original network traffic flows and the proposed new metric all performed reasonably well, achieving an accuracy of over 92% in all cases. In summary, the major contributions of this thesis are as follows: 1. The thesis investigates the significant effect of parameter optimization in the context of network traffic classification using Machine Learning. Features were studied, and their relevance to the learning algorithms was determined using two combined wrapper feature reduction techniques (forward selection and backward elimination). Moreover, a new metric, "additive flow size", is proposed as a feature fed into a hybrid model incorporating Logistic Regression and Decision Tree for classifying normal and malicious Transmission Control Protocol (TCP)/User Datagram Protocol (UDP) traffic for intrusion detection. 2. CNN has attracted attention for network traffic classification owing to its remarkable performance in pattern recognition and image classification, and it is the most recently adopted technique for this task. Hence, the proposed additive flow size metric is used with CNN for classifying normal and malicious TCP/UDP traffic for intrusion detection. The thesis proposes a Flow-Based Additive Network Traffic Classification System (FANTCS) model that uses a three-tuple (packets, bytes, and class target) for effective and efficient network traffic classification. 3. To check the flexibility of the proposed methods in real-life applications, this thesis presents a prototype system design and implementation that uses Hypertext Markup Language (HTML) tags to create the user interface, the Flask Python framework to link the existing code with the interface, and JavaScript to output the accuracy and confusion matrix. This prototype system provides a basic demonstration of the work carried out with the proposed methods. |
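
The evaluation measures named above (Accuracy, Precision, Recall, F1-Measure, and the Confusion Matrix) can be illustrated for the two-class malicious/normal case with a minimal sketch. It is not the thesis code: the function names `confusion_matrix` and `scores` are hypothetical, and the labels and predictions are made up purely for illustration.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, positive="malicious"):
    """Return (tp, fp, fn, tn), treating `positive` as the attack class."""
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts["tp"], counts["fp"], counts["fn"], counts["tn"]


def scores(tp, fp, fn, tn):
    """Derive accuracy, precision, recall, and F1 from the four counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up ground truth and predictions for six flows (illustration only).
y_true = ["malicious", "malicious", "normal", "normal", "malicious", "normal"]
y_pred = ["malicious", "normal", "normal", "malicious", "malicious", "normal"]

tp, fp, fn, tn = confusion_matrix(y_true, y_pred)
print("confusion matrix (tp, fp, fn, tn):", (tp, fp, fn, tn))
print(scores(tp, fp, fn, tn))
```

Reporting precision and recall alongside accuracy matters for intrusion detection because malicious flows are often a small share of the traffic, so accuracy alone can look high even when many attacks are missed.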