C4.5 Decision Tree Based On Hadoop And Its Application In Network Traffic

Posted on: 2017-02-04  Degree: Master  Type: Thesis
Country: China  Candidate: C Z Wang  Full Text: PDF
GTID: 2348330533450153  Subject: Computer Science and Technology
Abstract/Summary:
Network traffic classification is an effective way to keep an entire network operating efficiently, and it is applied in network monitoring, network planning, network security, and other areas. Existing research on network traffic classification has produced many results that are widely applied in various fields, promoting social development and making people's lives easier. Among mainstream approaches, classification based on machine learning is widely used. With the rise of the internet and the mobile internet, however, traditional machine learning algorithms cannot accommodate the exponential growth in the volume of network traffic data, and they face severe challenges in terms of storage and computation. The continuing development of cloud computing and big data technology makes it possible to parallelize traditional machine learning algorithms. Hadoop is the most commonly used parallel framework: HDFS is responsible for storage and MapReduce for computation.
The main work is as follows. A detailed study of the traditional C4.5 decision tree classification algorithm shows that computing the information gain ratio while building the tree relies heavily on logarithmic operations, which increases the time needed to build the tree. To address this, the Maclaurin formula is introduced to simplify the information gain ratio, reducing its calculation to basic arithmetic operations and improving computational efficiency.
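As a point of reference for the quantity being simplified, the following is a minimal sketch of the standard C4.5 information gain ratio for one discrete attribute. It shows the conventional logarithm-based form, not the thesis's simplified formula, which the abstract does not give. The function names `entropy` and `gain_ratio` are chosen here for illustration.

```python
from collections import Counter
from math import log2


def entropy(items):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(items)
    return -sum((n / total) * log2(n / total) for n in Counter(items).values())


def gain_ratio(values, labels):
    """Standard C4.5 gain ratio of one attribute.

    values: the attribute value of each sample
    labels: the class label of each sample (same order as values)
    """
    total = len(labels)

    # Group class labels by attribute value.
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)

    # Information gain = entropy before the split minus weighted entropy after.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy(labels) - conditional

    # Split information is the entropy of the attribute's own distribution.
    split_info = entropy(values)
    return gain / split_info if split_info > 0 else 0.0
```

Every call to `log2` in this form is one of the logarithmic operations that the simplification is meant to avoid.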
The raw data are preprocessed with the WEKA software, using discretization and the FCBF algorithm for feature selection. The internal mechanisms of HDFS and MapReduce, the key components of Hadoop, are studied, and the improved C4.5 decision tree algorithm is parallelized on top of the Hadoop platform to produce decision tree classification rules. The classification rules are stored in text form, and a further MapReduce program is written to apply them to the test data and verify the classification results. Comparison of the results shows that the improved C4.5 decision tree algorithm parallelized on Hadoop achieves significantly better time efficiency and classification accuracy.
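The rule-verification step described above can be sketched in plain Python as a map step and a reduce step. This is only an illustration under stated assumptions: the rule text format (`attr=value,attr=value -> label`), the function names, and the sample records are all hypothetical, and a real Hadoop job would run the same logic inside mapper and reducer tasks rather than in a single process.

```python
def parse_rule(line):
    """Parse a rule in the assumed format 'attr=value,attr=value -> label'."""
    condition_part, label = line.split("->")
    conditions = {}
    for item in condition_part.split(","):
        item = item.strip()
        if item:
            attr, value = item.split("=")
            conditions[attr.strip()] = value.strip()
    return conditions, label.strip()


def map_classify(record, rules, default="unknown"):
    """Map step: emit (predicted_label, 1) for one test record.

    The first rule whose conditions all match the record wins.
    """
    for conditions, label in rules:
        if all(record.get(attr) == value for attr, value in conditions.items()):
            return label, 1
    return default, 1


def reduce_counts(pairs):
    """Reduce step: sum the counts emitted for each predicted label."""
    totals = {}
    for key, count in pairs:
        totals[key] = totals.get(key, 0) + count
    return totals


# Hypothetical rules and test records, for illustration only.
rules = [parse_rule(line) for line in ["proto=tcp,port=80 -> web", "proto=udp -> dns"]]
records = [
    {"proto": "tcp", "port": "80"},
    {"proto": "udp", "port": "53"},
    {"proto": "tcp", "port": "22"},
]
summary = reduce_counts(map_classify(r, rules) for r in records)
```

Comparing each predicted label with the record's true label in the map step, and summing correct and incorrect counts in the reduce step, would give the classification accuracy in the same way.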
Keywords/Search Tags:network traffic, C4.5 decision tree, Hadoop, parallel