Research On Online Recognition Method For Network Application Flow Based On Spark Streaming Processing

Posted on: 2018-07-16
Degree: Master
Type: Thesis
Country: China
Candidate: X H Huang
Full Text: PDF
GTID: 2428330569498669
Subject: Computer technology
Abstract/Summary:
With the arrival of the big data era, the steady expansion of network scale, and the growing variety of services carried over networks, network traffic has grown at an alarming rate. Identifying and classifying network application traffic quickly and accurately has therefore become a research problem that needs to be solved. Traditional offline analysis methods, however, usually run in a single-machine environment with limited computing resources, which leads to low analysis efficiency and makes real-time identification and classification of network application traffic impossible.

Building on Spark, a widely used big data analysis platform, this thesis combines Spark's key sub-framework Spark Streaming for data stream processing, the distributed and highly available Kafka messaging system, and the scalable MLlib machine learning library. On this basis it proposes a method for online identification of network application traffic under large-scale traffic conditions, and it designs and implements a real-time network application traffic identification system based on Spark stream processing. The system relies on the machine learning library to identify application traffic in the network.

Because application traffic in a network is naturally imbalanced, an imbalanced data classification problem arises during classification. This thesis addresses the problem at both the data level and the algorithm level.

At the data level, the thesis first analyzes the shortcomings of SMOTE: it considers neither the actual distribution of the minority class data nor the distribution of the surrounding majority class data, which lowers the quality of the newly generated data. The modified NF-SMOTE reduces this blindness by taking the distribution of the data fully into account. Experimental results show that NF-SMOTE generates minority samples more sensibly and improves classification accuracy.

At the algorithm level, the thesis proposes an application-oriented approach with ensemble learning at its core. Taking the application as the unit, the algorithm trains a separate classifier for each individual minority class and combines these classifiers with a traditional classification algorithm to build an ensemble classifier. Experimental results show that this approach significantly improves classification accuracy for minority applications.
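
To make the data-level discussion concrete, the following is a minimal sketch of the classic SMOTE interpolation step that the abstract identifies as the baseline. It is not NF-SMOTE, whose details the abstract does not give. The function name `smote` and the parameters `n_new`, `k`, and `seed` are illustrative assumptions, not names from the thesis.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples by classic SMOTE.

    Each synthetic point lies on the line segment between a randomly
    chosen minority sample and one of its k nearest minority neighbours.
    Majority-class samples are never consulted, which is the "blindness"
    the abstract attributes to plain SMOTE.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Rank the other minority samples by Euclidean distance to x.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(x, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, neighbour)])
    return synthetic
```

Calling `smote(samples, n_new=100)` on a list of minority feature vectors returns 100 new vectors. Because the neighbour search ignores where majority samples lie, synthetic points can fall inside majority regions, which is the kind of weakness a distribution-aware variant would aim to address.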
Keywords/Search Tags: Traffic Identification, Machine Learning, Imbalanced Classification Problems, Spark