
Design And Implementation Of User Behavior Analysis System Based On The Hadoop

Posted on: 2015-02-08  Degree: Master  Type: Thesis
Country: China  Candidate: Z Y Hao  Full Text: PDF
GTID: 2268330425988778  Subject: Communication and Information System
ABSTRACT: With the rapid development of information technology, the Internet is now widely used in traditional industries. In recent years, the extensive application of the Internet, the growth of social networks and the increasing number of users have led to explosive growth in data. "Big data" has become a key technology for analyzing network data and mining its underlying patterns and application value. Network data is produced by users' online behavior. Extracting specific information about that behavior accurately and promptly from the huge volume of data is therefore essential for policy control, intelligent network services, and the future development of a controllable network.

Against this background, analyzing network users' behavior comprehensively and accurately is a major problem. This research therefore focuses on the design and development of a user behavior analysis system based on Hadoop. The key technologies are the network security development kit Libnids and the distributed platform Hadoop. The system implements packet capture and distributed storage, TCP stream reassembly, and application-layer analysis of HTTP behavior, among other functions. It provides effective technical support both for better recommendation services based on user behavior characteristics and for reasonable control of the network.

The thesis adopts a user behavior analysis method based on Hadoop. First, the high-speed packet capture tool PF_RING collects data from the network as the data source, and the captured data is stored in a distributed manner. Then Libnids, the network security development kit, is used to reassemble the packets and TCP/IP streams and to restore the HTTP application layer, after which the Hadoop cluster is invoked. Finally, distributed MapReduce programs analyze users' network behavior at the application layer, so that the analysis covers all network layers.
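The TCP reassembly step can be illustrated with a minimal sketch. The thesis relies on Libnids for this; the Python below does not use or imitate the Libnids API. It only shows the general idea of ordering segments by sequence number and discarding retransmitted bytes. The function name `reassemble` and the `(seq, payload)` tuple layout are hypothetical, and sequence-number wraparound is ignored.

```python
def reassemble(segments, initial_seq):
    """Rebuild a byte stream from (seq, payload) TCP segments.

    Illustrative only: tolerates out-of-order arrival and overlapping
    retransmissions, ignores sequence-number wraparound, and stops at
    the first gap (a segment that has not arrived yet).
    """
    stream = bytearray()
    next_seq = initial_seq  # sequence number of the next byte we expect
    for seq, payload in sorted(segments, key=lambda s: s[0]):
        if seq > next_seq:
            break  # gap: bytes between next_seq and seq are missing
        overlap = next_seq - seq  # bytes of this segment already seen
        if overlap >= len(payload):
            continue  # pure retransmission, nothing new
        stream += payload[overlap:]
        next_seq = seq + len(payload)
    return bytes(stream)


# Three segments of one HTTP request line, captured out of order,
# with the second segment partly repeating bytes from the first.
captured = [(1007, b"dex"), (1000, b"GET /"), (1003, b" /in")]
request_start = reassemble(captured, initial_seq=1000)
```

A real reassembler also has to track connection state (SYN/FIN, both directions) and buffer segments that arrive after a gap, which is the work a library such as Libnids takes over.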
The system identifies users' behavior by analyzing their search terms, shopping trends, website messages and regularly visited sites. Understanding users' behavior and demands promptly helps to control user network behavior, improve network services, and gradually make the network more intelligent.

Building on mature user behavior analysis techniques and the existing network's big data processing platform, the thesis designs a Hadoop-based system for user behavior analysis. The main research contents are as follows.
(1) High-speed link packet capture in a big data environment, based on PF_RING.
(2) Massive data storage, used to store the output files of the packet capture system.
(3) Restoration of the HTTP protocol based on distributed MapReduce programs.
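The application-layer analysis of search terms can be sketched as a map step and a reduce step. This is a minimal single-process illustration in Python, not the thesis's Hadoop job: the input record format (`user<TAB>url`), the function names, and the query parameter names in `SEARCH_PARAMS` are all assumptions made for the example.

```python
from collections import defaultdict
from urllib.parse import urlsplit, parse_qs

# Assumed names of query parameters that carry a search term.
SEARCH_PARAMS = ("q", "wd", "keyword")


def map_record(line):
    """Map: emit ((user, term), 1) for each search term in one record.

    Each record is assumed to be 'user<TAB>url', as might be produced
    by an earlier HTTP restoration stage.
    """
    user, _, url = line.rstrip("\n").partition("\t")
    query = parse_qs(urlsplit(url).query)
    for name in SEARCH_PARAMS:
        for term in query.get(name, []):
            yield (user, term.lower()), 1


def reduce_counts(pairs):
    """Reduce: sum the counts for each (user, term) key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def run(lines):
    """Run map then reduce in-process, standing in for the cluster."""
    pairs = (pair for line in lines for pair in map_record(line))
    return reduce_counts(pairs)


records = [
    "10.0.0.1\thttp://example.com/search?q=Hadoop",
    "10.0.0.1\thttp://example.com/search?q=hadoop&page=2",
    "10.0.0.2\thttp://example.com/s?wd=pf_ring",
    "10.0.0.2\thttp://example.com/index.html",
]
term_counts = run(records)
```

On a Hadoop cluster the framework would perform the grouping between the two steps and distribute the records across nodes; here a dictionary plays that role so the logic can be read in one place.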
Keywords/Search Tags: User behavior analysis, MapReduce, HTTP protocol reduction, TCP restructuring