Network Data Classification Technology Using An Improved Decision Tree

Posted on:2022-01-24

Degree:Master

Type:Thesis

Institution:University

Candidate:ADJEI AMMAH SERWAAH HANNAH

Full Text:PDF

GTID:2518306491992249

Subject:network information processing

Abstract/Summary:


Today's global internet workload is dominated by the Hypertext Transfer Protocol (HTTP), the application protocol used by World Wide Web clients and servers. The growth of this protocol, however, poses a major challenge for classification systems, which must work efficiently while using minimal bandwidth so that classification stays fast. Network traffic classification is applied in many areas of networking, such as intrusion detection, content filtering, search engine optimization, and network structure optimization. Efficient analysis and classification of network traffic has several advantages, the most common of which is speed.

To analyze, classify, and investigate the user behavior patterns in this traffic, a model was developed that assigns HTTP data to its respective categories with high accuracy and low computational time. Analysis of the data traffic yielded the statistics and distributions of higher-level quantities, such as the frequency of the domain names found, and the label assigning each domain name to its category. To reach these objectives, tokenization, preprocessing, cleaning, labeling, and vectorization were implemented in this research work. During tokenization, the data is split into smaller pieces of text so that preprocessing and cleaning can be performed. Once the data has been processed and cleaned, it is labeled so that supervised algorithms can be trained on it. The labeled data is then vectorized: because it is plain text, it must be converted into numbers before an algorithm can be applied to it.

After vectorization, three classifiers were trained and tested on the same data: multinomial naïve Bayes, a support vector machine, and a decision tree. Of the three, the decision tree had the highest accuracy (89%) and the lowest computational time (0.0889 seconds), with high precision and recall and an F1-score of 81%. The results therefore show that the classifier adopted in this thesis can be used effectively and efficiently for the accurate classification of HTTP data.
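The preprocessing steps named in the abstract (domain-name frequency counting, tokenization, cleaning, and vectorization) can be illustrated with a minimal sketch. This is not the thesis implementation: the sample URLs, the stop-token list, and the helper names `domain_of`, `tokenize`, `build_vocabulary`, and `vectorize` are all assumptions made for illustration, and a plain count-based bag of words stands in for whatever vectorizer the thesis actually used.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical sample requests; the thesis dataset is not reproduced here.
SAMPLE_URLS = [
    "http://news.example.com/world/2021/story.html",
    "http://shop.example.org/cart?item=42",
    "http://news.example.com/sports/football",
]

# Assumed stop tokens that carry little category information.
STOP_TOKENS = {"http", "https", "www", "com", "org", "html"}


def domain_of(url):
    """Return the lower-cased host name of a URL (empty string if absent)."""
    return (urlsplit(url).hostname or "").lower()


def tokenize(url):
    """Split a URL into lower-cased alphabetic tokens and drop stop tokens."""
    tokens = re.findall(r"[a-z]+", url.lower())
    return [t for t in tokens if t not in STOP_TOKENS]


def build_vocabulary(token_lists):
    """Map each distinct token to a column index, in sorted order."""
    vocab = sorted({t for tokens in token_lists for t in tokens})
    return {token: i for i, token in enumerate(vocab)}


def vectorize(tokens, vocabulary):
    """Turn a token list into a count vector ordered by the vocabulary."""
    counts = Counter(tokens)
    return [counts.get(token, 0) for token in vocabulary]


# Frequency of each domain name found in the traffic sample.
domain_counts = Counter(domain_of(u) for u in SAMPLE_URLS)

# Tokenize and clean every URL, then convert the text to numeric vectors.
token_lists = [tokenize(u) for u in SAMPLE_URLS]
vocabulary = build_vocabulary(token_lists)
vectors = [vectorize(tokens, vocabulary) for tokens in token_lists]
```

Count vectors of this kind, paired with category labels, are the sort of numeric input on which a multinomial naïve Bayes, support vector machine, or decision tree classifier could be trained; the classifiers themselves are omitted from this sketch.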

Keywords/Search Tags:

HTTP, URL, Network Traffic, Support Vector Machine, Multinomial Naïve Bayes, Decision Tree


Related items

1	Research On Network Traffic Classification Based On Machine Learning
2	Analysis And Application Of Telecommunications Data Based On Support Vector Machine And Decision Tree
3	Using K-Mean And SVM To Build Hybrid Methodology To Classify Diseases
4	Research On Key Identification Method Of P2P Traffic
5	Classification Based On Influence Functions
6	The Application Of Svm To Decision Tree Induction
7	The Application Of SVM To Decision Tree Induction
8	Multi-class Classification Algorithm Based On Decision Tree Twin Support Vector Machine
9	Pig Posture Classification And Abnormal Behavior Analysis Based On Decision Tree Support Vector Machine
10	Research On APT Attack Detection Technology Based On Traffic Analysis