Research On Key Technologies Of Machine Learning Based Traffic Identification

Posted on: 2016-09-17  Degree: Doctor  Type: Dissertation
Country: China  Candidate: L Z Peng
GTID: 1108330479978851  Subject: Computer Science and Technology
Abstract/Summary:
With the explosive development of the Internet over the past two decades, new technologies and applications have emerged continuously. The 21st century has witnessed the rise of new applications, such as P2P, that have improved the experience of Internet users. However, these applications have also caused problems, such as consuming large amounts of bandwidth and helping malicious attacks evade detection. This raises the challenge of accurately identifying Internet application traffic and managing it effectively to ensure network service quality and user security. Traffic identification has emerged to solve such problems.

As an important method of artificial intelligence, machine learning has been widely applied in traffic identification research. Owing to its adaptability, good generalization, and high efficiency, machine learning has become the main research direction in traffic identification. However, several key problems remain unresolved for machine learning based traffic identification.

(i) Traffic data are imbalanced. The various types of Internet traffic have highly imbalanced distributions, which challenge standard classification models: such models tend to identify a minority class instance as a majority class instance.

(ii) Collecting accurate traffic data with application information is difficult. Because traffic data collected from the Internet do not carry their original application information, such data cannot be used effectively to build identification models.

(iii) Problems of Internet traffic feature extraction and evaluation persist, including the search for the most effective packet number and the evaluation of the effectiveness of various features.

To address these problems, this study examines traffic identification using machine learning techniques and seeks a systematic solution that covers basic data collection, feature evaluation, and identification models.
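Problem (i) can be illustrated with a small sketch. It is not taken from the dissertation: the class names, the 95/5 split, and the always-majority classifier are assumptions chosen only to show why overall accuracy hides poor minority class identification.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of flows whose predicted class equals the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_recall(y_true, y_pred):
    """Recall of each class: correctly identified flows / flows of that class."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / totals[c] for c in totals}


# Hypothetical labels: 95 majority class flows and 5 minority class flows.
y_true = ["web"] * 95 + ["p2p"] * 5
# A degenerate classifier that always predicts the majority class.
y_pred = ["web"] * 100

print(accuracy(y_true, y_pred))          # overall accuracy looks high
print(per_class_recall(y_true, y_pred))  # minority class recall is zero
```

In this toy case the overall accuracy is 0.95 while the recall of the minority class is 0, which is why imbalanced identification is usually judged with per-class measures rather than accuracy alone.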
The primary contributions of this study are the following:

First, we propose a new imbalanced data gravitation based classification model (IDGC). The model strengthens the imbalanced classification ability of the basic data gravitation based classification model (DGC), which performs poorly on imbalanced classification tasks. Experimental results show that IDGC outperforms standard classification models and other imbalanced classification methods on numerous imbalanced tasks. An imbalanced traffic identification model is then designed using IDGC. This model performs well compared with a number of standard classification models and imbalanced classification methods.

Second, this study designs a highly efficient traffic identification model based on flexible neural trees (FNT). The model achieves high identification accuracy and also selects traffic features automatically, owing to the automatic variable selection ability of FNT.

Third, this study presents a new method of collecting Internet traffic data with accurate background application information. This method marks each outgoing IP packet from an Internet user host with the information of its originating application. Each traffic sample collected on the network therefore carries its application information as well, which effectively resolves the problem of collecting accurate traffic data.

Finally, this study evaluates the effectiveness of different early-stage packet numbers of Internet flows using information theory and experimental methods. It determines the most effective packet number both theoretically and experimentally, which is especially important given that most researchers choose the early-stage packet number empirically.
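The last contribution can be made concrete with a minimal sketch of one information-theoretic measure, the mutual information between the first k packets of a flow and its application label. The abstract does not say which measure the dissertation uses, so this choice, the binned packet sizes ("S", "M", "L"), and the four toy flows are all assumptions made for illustration.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information I(X; Y) in bits from paired samples."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x, y) / (p(x) * p(y)) simplifies to c * n / (count_x * count_y)
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def mi_of_first_k(flows, labels, k):
    """Treat the first k binned packet sizes of each flow as one joint feature."""
    return mutual_information([tuple(f[:k]) for f in flows], labels)


# Hypothetical flows: binned sizes of the first three packets (assumed data).
flows = [("S", "S", "L"), ("S", "S", "M"), ("S", "L", "L"), ("S", "L", "M")]
labels = ["web", "web", "p2p", "p2p"]

for k in (1, 2, 3):
    print(k, mi_of_first_k(flows, labels, k))
```

On this toy data the first packet carries no information about the label, the second packet raises the estimate to 1 bit, and the third adds nothing further, so k = 2 would be the smallest useful packet number here. On real traces the joint estimate is biased upward when samples are few relative to the number of distinct feature tuples, so a sketch like this would need a larger sample or a bias correction.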
Keywords/Search Tags:Traffic Identification, Machine Learning, Feature Evaluation, Imbalanced Classification, Flexible Neural Trees, Data Gravitation Based Classification