
Research And Implementation Of Personas Based On Network Traffic Analysis

Posted on: 2018-02-18
Degree: Master
Type: Thesis
Country: China
Candidate: J C Wang
Full Text: PDF
GTID: 2428330596990043
Subject: Software engineering
Abstract/Summary:
With the development of the Internet and the arrival of big data, big data mining and analysis can help identify business-related sensitive data more accurately and rapidly. Personas built on big data abstract user characteristics more effectively and thereby support the business. The quality of the data source determines the quality of the analysis results, and network traffic data offers better integrity and higher credibility than other data sources. Using network data stored in Pcap files as the data source, we study and implement the key technologies needed to turn raw network traffic data into final personas. The main work of this thesis is as follows.

First, we study and analyze the Pcap file format and compare the existing Libnids-based and Hadoop-based solutions, evaluating them in terms of cross-file TCP processing, scalability, and other aspects. Based on an analysis of the advantages and disadvantages of the existing solutions, we design and implement a TCP reassembly system on the Hadoop framework.

Building on TCP reassembly, we study HTTP and analyze its critical fields in the application scenario. We then parse the HTTP data and design and implement an HBase-based HTTP data storage component.

To address the problem of text extraction from complicated web pages, we study related solutions in order to extract the meaningful text. Based on research into the text classification process, we implement the preprocessing steps for a Chinese corpus, including Chinese word segmentation and text representation. We classify the HTTP text data on the Spark framework using an improved feature extraction algorithm.

Finally, based on the classification results of the HTTP text data and the HTTP access records, we study the process and techniques of persona construction, and we design and implement a tag labeling algorithm and persona functions using a big data framework with practical data.
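To make the first step more concrete, the following is a minimal sketch of reading packet records from a classic Pcap file, which is the raw input to any TCP reassembly stage. It is not the thesis's Hadoop implementation. The function name `read_pcap_records` is hypothetical, and the sketch assumes the classic libpcap layout (a 24-byte global header followed by 16-byte record headers), not pcapng.

```python
import struct

PCAP_GLOBAL_HEADER_LEN = 24   # magic, version, timezone, sigfigs, snaplen, link type
PCAP_RECORD_HEADER_LEN = 16   # ts_sec, ts_frac, incl_len, orig_len


def read_pcap_records(data: bytes):
    """Yield (ts_sec, ts_frac, payload) for each record in classic pcap bytes.

    Hypothetical helper for illustration only; not taken from the thesis.
    """
    if len(data) < PCAP_GLOBAL_HEADER_LEN:
        raise ValueError("too short for a pcap global header")

    # The magic number reveals the byte order used by the writer.
    # Both microsecond and nanosecond timestamp variants are accepted.
    magic = data[:4]
    if magic in (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1"):
        endian = "<"
    elif magic in (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d"):
        endian = ">"
    else:
        raise ValueError("not a classic pcap file")

    offset = PCAP_GLOBAL_HEADER_LEN
    while offset + PCAP_RECORD_HEADER_LEN <= len(data):
        ts_sec, ts_frac, incl_len, _orig_len = struct.unpack_from(
            endian + "IIII", data, offset
        )
        offset += PCAP_RECORD_HEADER_LEN
        payload = data[offset:offset + incl_len]
        if len(payload) < incl_len:
            break  # truncated final record; stop instead of yielding partial data
        offset += incl_len
        yield ts_sec, ts_frac, payload
```

Because each record carries only a captured frame, the segments of one TCP stream can be spread across several Pcap files, which is presumably why cross-file TCP processing appears among the evaluation criteria above.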
Keywords/Search Tags: big data, TCP reassembly, network analysis, text classification, personas