
Design And Implementation Of Network Traffic Analysis System Based On Distributed Architecture

Posted on: 2020-03-15   Degree: Master   Type: Thesis
Country: China   Candidate: Q Luo   Full Text: PDF
GTID: 2428330575971428   Subject: Engineering
Abstract/Summary:
With the rapid development of the Internet and the advent of the big data era, data is growing explosively, and its value is being continuously explored and exploited. Data determines the development direction of an enterprise to some extent, and the network, as the underlying infrastructure for data exchange and sharing, carries ever-increasing demands for data transmission; its performance determines the efficiency of data sharing and exchange. Faced with huge volumes of network data and high-speed network transmission, network traffic analysis must solve the problem of how to acquire, store, and analyze network data in real time. At present, the performance of a single server falls far short of the requirements of network data analysis, so distributed network data acquisition and analysis is both the development direction and a necessary means for this work. A distributed network traffic analysis system focuses on network data acquisition, data storage, data analysis, and visualization under ultra-high-speed conditions, and deploys its functional modules in a distributed, loosely coupled manner.

This thesis is based on the Institute of High Energy Physics of the Chinese Academy of Sciences. The Institute's daily data transmission exceeds 1 billion. Domestic inbound traffic peaks above 152 G per ten minutes, international outbound traffic peaks above 126 G per ten minutes, and traffic is still rising steadily. The Institute therefore needs a unified stream computing system that can withstand ever-increasing traffic, together with a complete, stable traffic statistics system with visualization capabilities.

To address these problems, this thesis designs a network traffic analysis system based on a distributed architecture. The system uses the Spark computing framework to analyze and process large-scale traffic in real time, and partitions the data by time. Pmacct receives the traffic, Kafka transports it and guarantees data integrity under exceptional circumstances, and the data is then stored in the non-relational database MongoDB, which serves as the primary database. MongoDB performance is monitored with Prometheus and Grafana to ensure a rapid response to problems. Finally, the popular combination of InfluxDB and Grafana displays the data on demand. The result is a simple data flow that makes real-time data easy to process and analyze, and the system is also highly scalable.

The thesis also proposes technical improvements for the problems encountered in the system. First, to address the data transmission speed between Kafka and MongoDB, three different transmission methods are proposed and the best one is selected. Second, to address Spark's data skew problem, a data skew indicator is defined; when it exceeds a threshold, the skew is reduced by re-hashing the tasks and using broadcast, which shortens the running time of the entire job. In the end, the requirements put forward by the Institute were met.
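The skew-handling idea described above (measure skew, compare it with a threshold, then re-hash) can be illustrated with a minimal sketch. The abstract does not give the thesis's actual indicator, threshold, or re-hashing scheme, so everything here is an assumption made for illustration: the indicator is taken to be the ratio of the largest partition to the mean partition size, the threshold of 2.0 is arbitrary, re-hashing is done by appending a salt to each key, and plain Python with `zlib.crc32` stands in for Spark's partitioner. The function names are hypothetical and the broadcast step is not covered.

```python
import zlib


def skew_indicator(partition_sizes):
    """Assumed indicator: largest partition divided by the mean partition size.

    1.0 means perfectly balanced; larger values mean more skew.
    """
    total = sum(partition_sizes)
    if total == 0:
        return 1.0
    mean = total / len(partition_sizes)
    return max(partition_sizes) / mean


def partition_sizes(keys, num_partitions, num_salts=1):
    """Count records per partition when hashing 'key#salt'.

    With num_salts == 1 every record of a key lands in the same partition.
    With num_salts > 1 the records of one key are spread over several
    salted keys, and therefore over several partitions.
    """
    sizes = [0] * num_partitions
    for i, key in enumerate(keys):
        salt = i % num_salts  # round-robin salt, chosen for determinism
        salted = f"{key}#{salt}".encode()
        sizes[zlib.crc32(salted) % num_partitions] += 1
    return sizes


def rebalance_if_skewed(keys, num_partitions, threshold=2.0, num_salts=16):
    """Re-hash with salted keys only when the indicator exceeds the threshold."""
    sizes = partition_sizes(keys, num_partitions)
    if skew_indicator(sizes) <= threshold:
        return sizes
    return partition_sizes(keys, num_partitions, num_salts)
```

For example, a key list in which one source address accounts for 90% of the records produces a high indicator under plain hashing, and `rebalance_if_skewed` then spreads that address over several partitions. In a real Spark job, salting a key before an aggregation also requires a second aggregation step that strips the salt and merges the partial results.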
Keywords/Search Tags: distributed, big data, Spark, real-time analysis, data skew, traffic analysis