
Network Traffic Analysis And Optimization Based On Script Language

Posted on: 2017-08-21; Degree: Master; Type: Thesis
Country: China; Candidate: X P Li
GTID: 2348330518495915; Subject: Information and Communication Engineering
Abstract/Summary:
Over the years, with the rapid growth of the Internet, monitoring and analyzing network traffic has become increasingly important, and storing, computing over, and analyzing massive volumes of data has become a major problem for network operators. Network traffic analysis has gradually moved from single machines to distributed Hadoop systems. To make data analysis easier, Hive and Pig were developed on top of the traditional MapReduce framework. As demand grew for fast, real-time analysis, Impala and Spark SQL emerged, and both depart from the traditional MapReduce framework. Different business scenarios call for these newer tools in network traffic analysis, yet little work has been done on optimizing the performance of these different types of big-data scripting languages, so the advantages of distributed systems are not fully exploited when analyzing network traffic. This thesis therefore optimizes and compares three types of scripting tools, namely Hive and Pig, Spark SQL, and Impala, to meet this growing demand.

The thesis first introduces the research background and the state of research in related fields. It then describes the current state of network traffic analysis, typical frameworks for network traffic analysis on distributed systems, and the reasons for analyzing such data on a distributed system. Next, the MapReduce framework and the architectures of Pig and Hive are examined at the source-code level, and Pig and Hive are optimized for network traffic data in several respects, such as combining small files, compressing intermediate output, and choosing join strategies. The architectures of Spark and Spark SQL are then analyzed, the advantages of Spark's computing model over MapReduce are compared, and Spark SQL is optimized from the perspective of memory management, including caching, the choice of StorageLevel, and data serialization.

From the perspective of file storage, the thesis compares the advantages and disadvantages of several common file formats (SequenceFile, RCFile, Parquet) and compression methods (Gzip, Bzip2, Snappy, LZO), along with the scenarios each is suited to, and analyzes the similarities and differences among these compression methods. Finally, it builds a distributed network traffic analysis system based on CDH5, selects seven common network traffic analysis requirements, and constructs a mathematical model. The three kinds of tools are then analyzed in terms of analysis tool, file format, and compression method.
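To make the compression trade-off above concrete, here is a minimal Python sketch. It is not the thesis's benchmark. It compresses a batch of synthetic flow-log records with the two codecs available in Python's standard library, gzip (DEFLATE) and bz2, and reports compression ratio and compression time for each. The record layout and the helpers `make_flow_records` and `benchmark` are hypothetical and exist only for illustration. Snappy and LZO need third-party bindings, so they are omitted.

```python
import bz2
import gzip
import random
import time


def make_flow_records(n, seed=0):
    """Generate n hypothetical NetFlow-like CSV lines (synthetic data, not from the thesis)."""
    rng = random.Random(seed)
    lines = []
    for i in range(n):
        src = f"10.0.{rng.randint(0, 255)}.{rng.randint(1, 254)}"
        dst = f"192.168.{rng.randint(0, 3)}.{rng.randint(1, 254)}"
        proto = rng.choice(["TCP", "UDP", "ICMP"])
        port = rng.choice([53, 80, 443, 8080])
        nbytes = rng.randint(40, 1500)
        # Columns: timestamp, source IP, destination IP, protocol, port, bytes
        lines.append(f"{1500000000 + i},{src},{dst},{proto},{port},{nbytes}")
    return ("\n".join(lines) + "\n").encode("ascii")


def benchmark(data):
    """Compress data with each stdlib codec and return {name: (ratio, seconds)}."""
    codecs = {
        "gzip": (gzip.compress, gzip.decompress),
        "bzip2": (bz2.compress, bz2.decompress),
    }
    results = {}
    for name, (compress, decompress) in codecs.items():
        start = time.perf_counter()
        packed = compress(data)
        elapsed = time.perf_counter() - start
        # Both codecs are lossless, so decompression must restore the input exactly.
        if decompress(packed) != data:
            raise RuntimeError(f"{name} round-trip failed")
        results[name] = (len(data) / len(packed), elapsed)
    return results


if __name__ == "__main__":
    data = make_flow_records(50_000)
    for name, (ratio, secs) in benchmark(data).items():
        print(f"{name:6s} ratio={ratio:5.2f}  compress_time={secs:.3f}s")
```

On typical text data, bzip2 usually compresses more tightly but more slowly than gzip. This is the same ratio-versus-speed axis on which Snappy and LZO sit at the fast, lower-ratio end. Actual numbers depend on the data and hardware.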
Keywords: network traffic analysis, performance optimization, Hadoop, Spark SQL, Impala, file format, compression