
The Research And Implementation On Processing Technology Of Massive Network Traffic Log Based On Hadoop

Posted on: 2015-08-08
Degree: Master
Type: Thesis
Country: China
Candidate: W L Qu
GTID: 2298330467963033
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of the Internet and the arrival of the big data era, network-based technologies for processing large volumes of data have emerged. To support effective, in-depth analysis of large-scale network traffic and to enable traffic supervision, network traffic logs must be collected effectively and analyzed efficiently. Multi-dimensional statistical analysis of these logs reveals how the network is running and being used, so that operators can adjust their strategy and improve network quality. Deep analysis of the logs helps discover users' preferences, understand their needs, and improve customer satisfaction. This thesis therefore studies processing technologies for massive network traffic logs and designs a Hadoop-based analysis system for massive logs (HAMANT).

The thesis first introduces the background, significance, and research status of the topic, along with the key technologies involved, including Hadoop, HBase, big data, DPI, and data mining. It then presents the requirements analysis and overall framework, derived from the project and its application scenario; the framework comprises log collection, preprocessing, storage, statistical analysis, and mining analysis modules. Finally, performance tests were carried out on large volumes of traffic from a university's backbone network. The results show that the system processes large amounts of network traffic logs effectively and has a degree of extensibility.

The log collection module introduces a DPI protocol recognition engine to enrich the network traffic logs. The log storage and processing module adopts parallel processing with automatic backup and fault tolerance, overcoming the limitations of traditional single-machine log processing. A clustering algorithm from data mining is improved and applied to deep analysis of the massive network traffic logs, in order to uncover the preferences hidden in the behavior of large numbers of online users. Finally, performance test results and practical application results are presented.
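As a rough illustration of the kind of statistical analysis described above, the following sketch totals traffic volume per protocol using a map step and a reduce step, which is the pattern Hadoop parallelizes across a cluster. It is not taken from the thesis: the comma-separated log format, the field order, and the function names `map_record`, `reduce_by_key`, and `traffic_by_protocol` are all assumptions made for this example, and it runs in a single process rather than on Hadoop.

```python
from collections import defaultdict

# Hypothetical log format (an assumption, not from the thesis):
# "timestamp,src_ip,dst_ip,protocol,bytes"
SAMPLE_LOG = [
    "1438992000,10.0.0.1,93.184.216.34,HTTP,1200",
    "1438992001,10.0.0.2,93.184.216.34,HTTP,800",
    "1438992002,10.0.0.1,8.8.8.8,DNS,90",
    "malformed line",
]


def map_record(line):
    """Map step: emit one (protocol, bytes) pair per well-formed line."""
    fields = line.strip().split(",")
    if len(fields) != 5:
        return []  # skip lines that do not match the assumed format
    try:
        return [(fields[3], int(fields[4]))]
    except ValueError:
        return []  # skip lines whose byte count is not an integer


def reduce_by_key(pairs):
    """Reduce step: sum the byte counts for each protocol."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def traffic_by_protocol(lines):
    """Run the map step over every line, then reduce the emitted pairs."""
    pairs = [pair for line in lines for pair in map_record(line)]
    return reduce_by_key(pairs)


if __name__ == "__main__":
    print(traffic_by_protocol(SAMPLE_LOG))
```

The protocol field stands in for the label a DPI engine might attach during collection. Grouping by a different field, such as source address or an hour bucket derived from the timestamp, would give another dimension of the same analysis.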
Keywords/Search Tags: Hadoop, HBase, Network traffic classification, Data mining