
Massive Network Traffic Data Mining Based On Large Scale Graph Analysis

Posted on: 2017-08-23  Degree: Doctor  Type: Dissertation
Country: China  Candidate: C Fang  Full Text: PDF
GTID: 1318330518994736  Subject: Signal and Information Processing
Abstract/Summary:
With the development and spread of 3G/4G wireless communication technology, the growing processing capacity of personal mobile terminals, and the enrichment of personalized web applications, the mobile Internet has become an important part of people's daily lives and the main channel for accessing and sharing information. Consequently, the proportion of mobile Internet traffic in wireless communication network traffic has increased rapidly. The voice business is gradually becoming saturated, and mobile network operators and service providers face pressure on profit growth. To improve the ARPU (average revenue per user) and achieve sustained revenue growth, mobile network operators and service providers need fine-grained network traffic management. However, the numbers of users and data records are massive and easily reach the scale of billions. Furthermore, modern Internet services are richer than traditional services such as voice and text messaging. Today's mobile networks generate massive traffic data at all times, including mobile web page data, user interaction data, activity data generated by devices, DNS query data, etc. Compared with traditionally processed data, the massive dataset differs significantly in three dimensions: volume, variety, and velocity. Faced with such data, traditional traffic analysis technology cannot meet the needs of network operators, who need parallel algorithms for the massive dataset.

Against this background, this thesis proposes parallel computing approaches for handling massive network traffic datasets. The approaches mainly use the prevalent Hadoop and Spark frameworks. Hadoop opened up a new era of massive data processing. Spark is a later framework that builds on this line of work; by computing in memory, it processes massive datasets faster.
This thesis uses different technical frameworks according to the application scenario and problem requirements.

Meanwhile, due to the explosive growth of Internet applications, network traffic has become very complex. Statistical analysis alone cannot reveal the inherent characteristics of network traffic. To analyze the traffic and reveal the complex relationships between the various functional entities, this thesis models the network as a graph and uses graph analysis methods to solve network traffic problems. Furthermore, the analysis results are visualized graphically.

The main research contents and innovations are as follows:

(1) Based on users' web page browsing behavior and the page loading process, we propose a graph model to describe the entities in web pages. The graph model is analyzed to understand the relationships between Internet entities. It has three features. First, the graph model represents entities in the real network, revealing the structure and relationships between them. Second, many applications can be built on the graph model, for example recognizing user click requests. We design and implement a parallel algorithm that can accurately recognize user click requests in a massive dataset. Third, the experimental data is a massive real mobile network traffic dataset collected by TMS (Traffic Monitoring), a hardware system developed in-house by our lab. We design a self-learning approach for choosing the experimental parameters. The experiments demonstrate the feasibility and accuracy of the model.

(2) The whole Internet entity graph is large, sparse, and complex. To reveal and visualize the structure between entities, we propose a web entity analysis approach based on a dependency graph. To build the graph model, we use a massive traffic dataset captured from a real network environment.
The resulting graph is enormous, which makes it unsuitable for direct observation and analysis. To this end, we propose a graph analysis approach based on the dependency graph model. The approach decomposes the large graph into small graphs that are densely connected and easy to observe.

(3) A graph model is a mathematical abstraction of physical entities. Analyzing it requires a large number of mathematical calculations and graph algorithms. To this end, we design an algorithm library for processing massive data on the Spark framework. This library can serve as a basis for other traffic analysis algorithms. Spark offers richer computing capabilities than Hadoop. We design and implement a number of basic algorithms on Spark, including matrix multiplication and matrix inversion.

(4) DNS query data is one of the important data sources for network traffic analysis. We model the query records and the returned results as a graph and apply the graph attribute information to malicious domain name recognition. Network operators are very concerned about malicious domain names, but they are difficult to identify: recognition needs to integrate a number of characteristics and use an effective classification approach. To this end, we use several values from the DNS graph model, such as outdegree, indegree, and centrality, as attributes to classify domain names and identify malicious ones in DNS query data.

(5) For high-speed streaming data in the mobile network environment, we perform fine-grained analysis of network traffic with a parallel streaming algorithm. The network operator's traffic analysis tasks can be divided into two categories: 1. batch analysis of static network traffic; 2. online analysis of high-speed streaming data.
In recent years, with the development of network technology, operators have deployed a large number of 100 Gbps ports in the backbone network, which brings new technical challenges to real-time analysis of network traffic data. To this end, we design a streaming analysis algorithm to analyze high-speed streaming data and perform a fine-grained analysis of mobile web traffic.
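
As an illustration of contribution (4), the sketch below shows how indegree, outdegree, and a simple degree centrality could be derived for each domain from DNS query records. It is a minimal single-machine sketch, not the thesis's Spark implementation. The record layout `(client, domain, answers)`, the function names `build_dns_graph` and `domain_features`, and the choice of normalized degree as the centrality measure are all assumptions made for this example; the abstract does not specify them.

```python
from collections import defaultdict


def build_dns_graph(records):
    """Build a directed graph from DNS records (hypothetical layout).

    Each record is (client, domain, answers). Edges run
    client -> domain (query) and domain -> ip (returned result).
    Nodes are tagged with their kind so that a client address and an
    answer address with the same text stay distinct.
    """
    successors = defaultdict(set)
    predecessors = defaultdict(set)
    nodes = set()
    for client, domain, answers in records:
        c, d = ("client", client), ("domain", domain)
        nodes.update((c, d))
        successors[c].add(d)
        predecessors[d].add(c)
        for ip in answers:
            a = ("ip", ip)
            nodes.add(a)
            successors[d].add(a)
            predecessors[a].add(d)
    return nodes, successors, predecessors


def domain_features(records):
    """Return per-domain graph attributes usable as classifier inputs."""
    nodes, successors, predecessors = build_dns_graph(records)
    # Assumed centrality: total degree divided by (number of nodes - 1).
    scale = max(len(nodes) - 1, 1)
    features = {}
    for kind, name in nodes:
        if kind != "domain":
            continue
        node = (kind, name)
        indegree = len(predecessors.get(node, ()))   # distinct querying clients
        outdegree = len(successors.get(node, ()))    # distinct returned addresses
        features[name] = {
            "indegree": indegree,
            "outdegree": outdegree,
            "degree_centrality": (indegree + outdegree) / scale,
        }
    return features


if __name__ == "__main__":
    # Illustrative records using documentation address ranges.
    sample = [
        ("10.0.0.1", "example.com", ["192.0.2.10"]),
        ("10.0.0.2", "example.com", ["192.0.2.10"]),
        ("10.0.0.1", "flux.example.net", ["198.51.100.1", "198.51.100.2"]),
        ("10.0.0.1", "flux.example.net", ["198.51.100.3"]),
    ]
    for name, attrs in sorted(domain_features(sample).items()):
        print(name, attrs)
```

The resulting attribute dictionaries could then be fed to any classifier as feature vectors. In this toy input, a domain that resolves to many distinct addresses gets a higher outdegree than one that always resolves to the same address, which is the kind of structural signal the graph attributes are meant to expose.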
Keywords/Search Tags: Parallel Computing, Hadoop, Mobile network, Network Traffic