Network Data Classification Technology Using An Improved Decision Tree

Posted on: 2022-01-24
Degree: Master
Type: Thesis
Country: China
Candidate: ADJEI AMMAH SERWAAH HANNAH
GTID: 2518306491992249
Subject: network information processing
Abstract/Summary:
Today's global internet workload is dominated by the application protocol used between World Wide Web clients and servers, the Hypertext Transfer Protocol (HTTP). However, the growth of this protocol has made it a major challenge to build classification systems that are efficient and that use minimal bandwidth so that classification is fast. Network traffic classification is necessary in many areas of networking, such as intrusion detection, content filtering, search engine optimization, and network structure optimization. Efficient analysis and classification of network traffic has several advantages, the most common being speed. To analyze, classify, and investigate the user behavior patterns in this traffic efficiently, a model was developed that analyzes HTTP data and assigns it to its respective categories with high accuracy and low computational time. Through the analysis of the data traffic, the statistics and distributions of higher-level quantities, such as the frequency of the domain names found, were determined, and each domain name was labeled with its respective category. To achieve these objectives, tokenization, preprocessing, cleaning, vectorization, and labeling were implemented in this research work. During tokenization, the data is split into smaller text units so that preprocessing and cleaning can be performed. After the data has been successfully processed and cleaned, it is labeled for training so that several supervised learning algorithms can be applied to it. Once labeling is done, the data is vectorized: because the data is plain text, vectorization is needed to convert it into numbers that the algorithms can operate on. After vectorization, multinomial naïve Bayes was used to train and test on the data, a support vector machine was trained on the same data, and finally a decision tree was trained. After training and testing with these three algorithms, the decision tree achieved the highest accuracy, 89%, and the lowest computational time, 0.0889 seconds, with high precision and recall and an F1-score of 81%. The results therefore show that the classifier adopted in this thesis can be used effectively and efficiently for accurate classification of HTTP data.
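The pipeline described in the abstract (tokenization and cleaning of HTTP/URL text, labeling, vectorization into numbers, then training a decision tree) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the thesis's implementation: the abstract does not describe its tokenizer, stop-word list, split criterion, or what makes its decision tree "improved", so the regular-expression tokenizer, the hypothetical `STOP_TOKENS` list, the token-presence splits, Gini impurity, the `max_depth` limit, and all function names below are illustrative choices. The sample URLs and the "news"/"shopping" labels are made up.

```python
import re
from collections import Counter

# Hypothetical cleaning list: URL fragments assumed to carry little category signal.
STOP_TOKENS = {"http", "https", "www", "com", "org", "net", "html", "php"}


def tokenize_url(url: str) -> list[str]:
    """Tokenize and clean a URL: split on non-alphanumerics, drop noise tokens."""
    raw = re.split(r"[^a-z0-9]+", url.lower())
    # Drop empty strings, single characters, pure numbers, and stop tokens.
    return [t for t in raw
            if len(t) > 1 and not t.isdigit() and t not in STOP_TOKENS]


def build_vocabulary(token_lists: list[list[str]]) -> dict[str, int]:
    """Map each distinct token to a column index (sorted for determinism)."""
    tokens = sorted({t for tl in token_lists for t in tl})
    return {tok: i for i, tok in enumerate(tokens)}


def vectorize(tokens: list[str], vocab: dict[str, int]) -> list[int]:
    """Bag-of-words count vector; tokens outside the vocabulary are ignored."""
    vec = [0] * len(vocab)
    for tok, count in Counter(tokens).items():
        if tok in vocab:
            vec[vocab[tok]] = count
    return vec


def gini(labels: list[str]) -> float:
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0


def build_tree(X, y, depth=0, max_depth=5):
    """Grow a tree whose internal nodes test 'is token j present?'.

    Internal nodes are tuples (feature_index, absent_subtree, present_subtree);
    leaves are class labels.
    """
    majority = Counter(y).most_common(1)[0][0]
    if len(set(y)) == 1 or depth >= max_depth:
        return majority
    best = None
    for j in range(len(X[0])):
        absent = [i for i, row in enumerate(X) if row[j] == 0]
        present = [i for i, row in enumerate(X) if row[j] > 0]
        if not absent or not present:
            continue  # this token does not split the rows
        # Weighted Gini impurity of the two children; lower is better.
        score = (len(absent) * gini([y[i] for i in absent])
                 + len(present) * gini([y[i] for i in present])) / len(y)
        if best is None or score < best[0]:
            best = (score, j, absent, present)
    if best is None:
        return majority  # no token separates these rows
    _, j, absent, present = best
    return (j,
            build_tree([X[i] for i in absent], [y[i] for i in absent], depth + 1, max_depth),
            build_tree([X[i] for i in present], [y[i] for i in present], depth + 1, max_depth))


def predict(tree, x):
    """Walk from the root to a leaf using the vector's token presence."""
    while isinstance(tree, tuple):
        j, absent, present = tree
        tree = present if x[j] > 0 else absent
    return tree


# Toy usage: made-up URLs and hypothetical category labels.
urls = ["http://www.news-site.com/sports/football",
        "https://shop.example.com/cart?item=42"]
labels = ["news", "shopping"]
token_lists = [tokenize_url(u) for u in urls]
vocab = build_vocabulary(token_lists)
X = [vectorize(tl, vocab) for tl in token_lists]
tree = build_tree(X, labels)
new_url = "http://www.sports-news.com/football"
prediction = predict(tree, vectorize(tokenize_url(new_url), vocab))
```

With these two toy URLs every token separates the classes, so the tree reduces to a single test on the first vocabulary column (`cart`), and `prediction` comes out as `"news"`. Splitting on token presence keeps the sketch simple for sparse bag-of-words vectors; a library implementation such as scikit-learn's `DecisionTreeClassifier` would instead choose numeric thresholds on each feature.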
Keywords/Search Tags: HTTP, URL, Network Traffic, Support Vector Machine, Multinomial Naïve Bayes, Decision Tree