Design And Implementation Of Hadoop-Based Network Traffic Analysis System

Posted on: 2016-03-16
Degree: Master
Type: Thesis
Country: China
Candidate: H B Li
Full Text: PDF
GTID: 2298330467992421
Subject: Electronic and communication engineering
Abstract/Summary:
Half a century after the birth of the Internet, we have passed through the Web 1.0 and Web 2.0 eras and are moving toward Web 3.0, the era of personalized network services. The Internet keeps changing the way we live and keeps bringing both opportunities and challenges. In recent years, massive volumes of network data have been generated; they hold great value but also raise difficult problems. The growth of network data has far exceeded the capacity of a single server, and how to extract useful business value from massive data effectively has become a hot research topic at the major Internet companies. By analyzing network data we can learn users' terminal preferences, the behavioral characteristics of their network traffic, their browser preferences, and so on, and then use this knowledge to optimize network traffic and improve the user experience, which in turn increases user activity and retention and brings greater profit. With data at this scale, traditional network traffic analysis on a single server can no longer deliver the efficiency the business requires: as the amount of data grows, its efficiency drops quickly. In recent years Hadoop has come to the fore in offline processing of massive data and plays an increasingly important role; it has been well validated as an open distributed platform. A Hadoop cluster can handle a growing amount of data by adding low-cost nodes, while offering good stability and scalability. Building a distributed network traffic analysis system on Hadoop is therefore a reasonable choice that also meets the needs of the business. The Hadoop-based network traffic analysis system described in this thesis consists of five main modules: the network data collection module, the data storage module, the data preprocessing module, the statistical analysis module, and the result display module.
The main results of this work are as follows:
1) A Hadoop cluster of three nodes was set up, the network traffic analysis system was designed and implemented on this cluster, and chart display of the statistical results in the browser was completed.
2) The Hadoop distributed data processing platform was verified to be more efficient than a single server.
3) By testing performance optimizations of the network traffic analysis system and analyzing the experimental results, corresponding optimization strategies were obtained.
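The abstract does not give the thesis's actual code or log format, so the following is only a minimal sketch of how the data preprocessing and statistical analysis modules might be expressed in a MapReduce style. The tab-separated record layout (timestamp, user id, browser, bytes) and the function names `map_record`, `reduce_browser`, and `run_job` are assumptions made for illustration; the job runs in a single local process rather than on a Hadoop cluster.

```python
from itertools import groupby


def map_record(line):
    """Map step: emit (browser, bytes) for one well-formed record.

    Assumed (hypothetical) record layout, tab-separated:
    timestamp, user_id, browser, bytes
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 4 or not fields[3].isdigit():
        return []  # preprocessing: drop malformed records
    return [(fields[2], int(fields[3]))]


def reduce_browser(browser, values):
    """Reduce step: request count and total bytes for one browser."""
    values = list(values)
    return browser, len(values), sum(values)


def run_job(lines):
    """Local stand-in for a MapReduce job: map, shuffle (sort), reduce."""
    pairs = [kv for line in lines for kv in map_record(line)]
    pairs.sort(key=lambda kv: kv[0])  # shuffle: bring equal keys together
    return [
        reduce_browser(key, (value for _, value in group))
        for key, group in groupby(pairs, key=lambda kv: kv[0])
    ]
```

On a real cluster the map and reduce functions would run as separate tasks over HDFS blocks; the local sort here only imitates the shuffle phase that groups values by key.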
Keywords/Search Tags: network traffic analysis, Hadoop, HDFS, MapReduce, Hadoop performance tuning