
A Fast Multi-patterns Parallel Matching Algorithm for Massive HTTP Data Processing

Posted on: 2019-02-08
Degree: Master
Type: Thesis
Country: China
Candidate: C Qin
GTID: 2348330542498258
Subject: Information and Communication Engineering
Abstract/Summary:
The development of data services in mobile networks has led to tremendous growth in the number of network users, and the volume of user behavior data is growing rapidly as a result. This gives researchers a great opportunity to analyze user behavior from large-scale network traffic, which not only helps ISPs (Internet Service Providers) optimize resource allocation but also enables more customized services for users. User behavior analysis is based on extracting user characteristics, and multi-pattern URL (Uniform Resource Locator) matching is the foundation of that extraction. However, extracting user behavior efficiently from massive network traffic data remains a major challenge. This thesis focuses on the efficiency of extracting user characteristics and proposes a novel algorithm, Multi-Patterns Parallel Matching on HTTP Traffic (MPPM), which takes advantage of hash maps for data searching. MPPM extracts user behavior from massive HTTP traffic faster and more efficiently than conventional methods while maintaining the same accuracy.

The thesis first describes the current state of massive HTTP traffic data and its related problems and challenges, and introduces distributed processing frameworks based on Hadoop and Spark. Second, it analyzes the architecture of Spark and the factors that may affect the performance of URL matching. Next, it presents the design and implementation of the MPPM algorithm in detail. Then, it reports experiments that run MPPM and other known methods on real-world HTTP traffic data collected from ISP networks. Finally, it applies MPPM to analyze user behavior from massive network traffic and to solve the problem of massive URL matching in actual projects. The proposed algorithm and implementation provide a solid basis for building a high-performance user behavior analysis engine for massive HTTP data processing.
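The abstract states that MPPM uses a hash map for searching but does not describe its data structures. As a hypothetical illustration of the general idea only (not the thesis's actual MPPM), the sketch below indexes URL rules by host suffix in a Python dict, so matching one URL costs one dictionary lookup per domain label instead of a scan over every pattern. The rule format, `RULES`, and the function names `build_index`, `host_suffixes`, and `match_url` are assumptions made for this example.

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Hypothetical rule format: (host_suffix, path_prefix, label).
# An empty path_prefix matches any path under that host suffix.
RULES = [
    ("video.example.com", "/watch", "video"),
    ("example.com", "", "portal"),
    ("shop.example.org", "/cart", "shopping"),
]


def build_index(rules):
    """Group rules by host suffix so each candidate lookup is one dict access."""
    index = defaultdict(list)
    for host_suffix, path_prefix, label in rules:
        index[host_suffix.lower()].append((path_prefix, label))
    # Within one host key, try longer (more specific) path prefixes first.
    for entries in index.values():
        entries.sort(key=lambda e: len(e[0]), reverse=True)
    return dict(index)


def host_suffixes(host):
    """Yield 'a.b.c', 'b.c', 'c' for host 'a.b.c', most specific first."""
    labels = host.split(".")
    for i in range(len(labels)):
        yield ".".join(labels[i:])


def match_url(url, index):
    """Return the label of the most specific matching rule, or None."""
    # HTTP logs often omit the scheme; add one so urlsplit finds the host.
    parts = urlsplit(url if "://" in url else "http://" + url)
    host = (parts.hostname or "").lower()
    path = parts.path or "/"
    for suffix in host_suffixes(host):
        for path_prefix, label in index.get(suffix, ()):
            if path.startswith(path_prefix):
                return label
    return None


if __name__ == "__main__":
    idx = build_index(RULES)
    print(match_url("http://video.example.com/watch?v=1", idx))  # video
    print(match_url("www.example.com/index.html", idx))  # portal
```

Because splitting happens on whole domain labels, `notexample.com` does not falsely match the `example.com` rule. In a Spark job, an index like this could be shipped to executors once as a broadcast variable (`SparkContext.broadcast`) and applied per record with `map`; the abstract does not say whether MPPM distributes its hash map this way.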
Keywords/Search Tags: HTTP traffic, URL matching, multi-patterns matching, user behavior, Spark