
Research And Application Of Hadoop-Based Network Traffic Analysis System

Posted on: 2015-04-27
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Y Qiao
Full Text: PDF
GTID: 1228330467463642
Subject: Signal and Information Processing
Abstract/Summary:
The Mobile Internet era has made information exchange far more convenient and has profoundly changed people's social behavior. Analyzing mobile traffic data gives us the opportunity to understand mobile users, mobile traffic and the Mobile Internet in depth. With the development of networks and mobile devices, the traffic data generated by users is increasing rapidly. Traditional traffic analysis technology can no longer meet requirements in the era of big data, and collecting, storing and analyzing massive traffic data is a great challenge. Cloud computing technology can store and process massive data reliably and efficiently, and Hadoop is currently the most widely used cloud computing framework.

Against this background, this thesis uses massive traffic data from a real Mobile Internet network to design a Hadoop-based flow log analysis system, FLAS, that stores, processes and analyzes massive Mobile Internet flow logs efficiently. To keep the cloud computing environment stable and effective, we also design a Hadoop cluster monitoring system, ZooManager, that offers complete monitoring and alerting. To improve the performance of the Hadoop-based cloud computing platform, we propose a model that predicts the CPU utilization and running time of MapReduce jobs on the cloud platform based on resource consumption patterns. Based on FLAS and ZooManager, we study the characteristics of mobile traffic and mobile users. In addition, we study the characteristics of complex networks in the Mobile Internet. The main research contents and innovations are as follows:

(1) Design a Hadoop-based offline traffic analysis system to store and analyze massive traffic data reliably and efficiently. We design a Hadoop-based tool, the Flow Logs Analysis System (FLAS), for analyzing massive traffic data.
FLAS can store, process and compute on massive Mobile Internet flow records efficiently. FLAS has the following three features. Firstly, FLAS is well suited to analyzing and processing structured and semi-structured data such as flow records. Secondly, in the data uploading module, the self-developed Traffic Monitoring System (TMS) collects the mirrored packets and generates the flow records, and UpLoader uploads all the flow records to HDFS. Thirdly, in the data analysis module, we develop a high-level language for expressing distributed data analysis programs, which improves development efficiency and simplifies the code. Finally, we examine the running efficiency and fault tolerance of FLAS to demonstrate the usability of the system.

(2) Design a cloud computing monitoring system for the management, monitoring, alerting and optimization of FLAS, to keep the whole system stable and efficient. Operating and maintaining a Hadoop cluster has always been a great challenge for users. We design a Hadoop cluster monitoring system, ZooManager, to provide management, monitoring, alerting and optimization functions for the cloud computing environment. ZooManager collects the monitoring data and transforms the basic metrics into easily understood terms using a specific algorithm. All metrics are stored and analyzed to spot problems and abnormal data.
In addition, the monitoring system helps us understand the running status and historical statistics of the whole cloud computing platform from a resource point of view, and provides reasonable suggestions for optimizing the system.

(3) Based on cloud computing resource consumption patterns, we propose a model to predict the running time and CPU utilization of MapReduce jobs in a cloud computing environment. The model is intended to further optimize FLAS and the Hadoop cluster. It uses polynomial regression to predict the performance of MapReduce jobs under different Hadoop configurations. CPU-intensive Hadoop benchmarks are run with different MapReduce configuration parameters to verify the validity of the model. Finally, four evaluation measures (SSE, MAPE, RMSE and R²) are used to assess the accuracy of the model.

(4) Using real mobile traffic data, analyze traffic characteristics in depth and understand user behavior in the Mobile Internet from multiple dimensions. At present, very few studies use real Mobile Internet traffic data to analyze traffic characteristics. We collect real traffic data from a typical city in China for a week. The collected traffic data exceeds 10 TB, which makes the research results applicable to real network optimization and construction. We analyze the traffic characteristics from three dimensions: flow metrics, time and user preferences. Firstly, we use a Poisson regression model to fit the curve of the observed number of user arrivals. Then we study mobile user behavior from three aspects: data usage, user mobility and web service usage. For data usage, we focus on the behavior of heavy users. For user mobility, we focus on groups of users with different span ranges.
For web service usage, we divide the web services into 11 categories and analyze user behavior for each category. In addition, we analyze the relationship between data usage, user mobility and web service usage. In particular, we focus on data resource consumers and radio resource consumers, and find some interesting results. Finally, we study web service usage behavior in depth. We introduce the "Interest Cluster" and "Normalized Entropy" to further study the browsing interests of users.

(5) Construct user-server network graphs for the Mobile Internet from a complex network point of view, to study the characteristics of complex networks in the Mobile Internet. Constructing the physical structure of the mobile Internet is the key to modeling the network. Existing studies of the fixed Internet are not suitable for the mobile Internet, and a deep understanding of traffic graphs in the mobile Internet is required. We construct user-server network graphs from the complete flow records of the mobile Internet collected by our self-developed equipment deployed in southern China. We investigate the properties of Web, IM and overall traffic from a complex network point of view. Different types of traffic have different characteristics. In addition, the node degree and node strength distributions of servers follow a power law, whereas those of users do not. We further study the distributions of average strength per edge of a node, which follow a power law, and find a regular pattern in the exponents for different applications. In addition, the BA model is used to further study the behavior of different types of websites in the Mobile Internet.
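
The four accuracy measures named in item (3), SSE, MAPE, RMSE and R2, have standard textbook definitions, which the following minimal Python sketch illustrates. The function names and the sample running times are hypothetical and are not taken from the thesis; the thesis's own implementation and data may differ.

```python
import math


def sse(actual, predicted):
    """Sum of squared errors between measured and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted))


def rmse(actual, predicted):
    """Root mean squared error: square root of the mean squared error."""
    return math.sqrt(sse(actual, predicted) / len(actual))


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SSE / total sum of squares."""
    mean = sum(actual) / len(actual)
    sst = sum((a - mean) ** 2 for a in actual)
    return 1.0 - sse(actual, predicted) / sst


# Hypothetical MapReduce job running times in seconds (illustrative only).
measured = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]

print("SSE :", sse(measured, predicted))
print("RMSE:", rmse(measured, predicted))
print("MAPE:", mape(measured, predicted))
print("R2  :", r_squared(measured, predicted))
```

Lower SSE, RMSE and MAPE indicate a closer fit, while R2 approaches 1 as the predictions explain more of the variance in the measurements. Reporting all four together guards against a model that looks accurate under one measure only.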
Keywords/Search Tags: Cloud Computing, Hadoop, Mobile Internet, Network Traffic Characteristics