Hadoop-based IP User Access Behavior Motivation Analysis

Posted on: 2018-08-05
Degree: Master
Type: Thesis
Country: China
Candidate: Y H Zhang
GTID: 2358330536458554
Subject: Network information retrieval and content understanding
Abstract/Summary:
With the rapid development of Internet technology, people's lifestyles have gradually shifted from real-world social networks to virtual networks on the Internet. Because the Internet is a borderless virtual communication network with free flow of information, freedom of speech and freedom of use, it highlights the personal characteristics of network IP users more vividly. At the same time, because an IP address is unique, it acts much like a person's identity card in the Internet "society". Mining IP users' online behavior from network logs to discover their search intentions, interest preferences and online motivations is therefore of great significance. This paper focuses on the hidden information contained in network logs. Building on traditional log-based user behavior analysis methods, it mines this hidden information in depth to uncover the relationship between Internet users' behavior and their psychology, providing new ideas and directions for research on IP users' online behavior. To enrich the behavioral information available in the logs, this paper constructs an external auxiliary resource library and, on the basis of that library, analyzes the online behavior of IP users and the motivations behind it. The specific research work covers the following aspects:

(1) Network log data collection and processing, and the IP auxiliary knowledge base. This paper uses the Huawei LogCenter log management tool to collect and analyze the session logs of a NAT device. The raw data are preprocessed to address incomplete IP addresses, conflicting records and noisy data, achieving log data cleaning, sorting and storage. A global IP domain knowledge base is also constructed, containing 9 million IP addresses and 150,000 foreign IP network segments. This feature library enables extraction of global IP domain features, improves the quality of IP geolocation queries, offers a feasible solution for accurately extracting IP domain features, and provides data support for the subsequent extraction of IP users' online behavior features.

(2) Online user behavior analysis and abnormal traffic detection. A network anomaly detection method based on a sliding time window is proposed. By analyzing the network logs, users' online behavior is characterized from three aspects: the geographical distribution of IP users, their active-time distribution and the distribution of the content they access. Network traffic within given time periods is then examined with the sliding time window technique, enabling supervision of and attention to abnormal IPs. The results show that the proposed sliding-time-window method is effective and feasible.

(3) Topic discovery in IP users' accessed content and user clustering. This paper proposes a topic discovery method based on the LDA (Latent Dirichlet Allocation) model. The method addresses shortcomings of the original LDA model as well as the incomplete vocabulary of the topic knowledge base. In addition, by analyzing the URLs accessed by IP users, content information such as the keywords, title and description of each visited page is extracted, and IP users with similar accessed content are grouped with the K-means clustering algorithm.

(4) Analysis of Internet users' behavioral motivation. The content a web user visits reflects that user's motivation for going online, so discovering online motivation can be recast as classifying the visited content. This paper proposes LLA (Libsvm and Liblinear Algorithm), a model that combines LIBSVM and LIBLINEAR, drawing on the high classification accuracy of LIBSVM and the suitability of LIBLINEAR for large-scale data. The two models are combined by weighting, with the weights determined experimentally, to obtain a classification model better suited to large-scale data processing. The experimental results show that the LLA model achieves high accuracy in classifying IP users' behavioral motivations.
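Item (1) describes cleaning NAT session logs that contain incomplete IP addresses and noisy records. The abstract does not give the log format or the cleaning rules, so the following is only a minimal sketch under an assumed, hypothetical record layout `timestamp,src_ip,dst_ip,dst_port`: records with malformed addresses or fields are discarded, exact duplicates are dropped, and the rest are sorted by time. It uses Python's standard `ipaddress` module for address validation.

```python
import ipaddress
from typing import Iterable, List, NamedTuple, Optional


class Session(NamedTuple):
    """One cleaned session record (hypothetical field layout)."""
    timestamp: int
    src_ip: str
    dst_ip: str
    dst_port: int


def parse_line(line: str) -> Optional[Session]:
    """Parse 'timestamp,src_ip,dst_ip,dst_port'; return None for noisy lines."""
    parts = [p.strip() for p in line.split(",")]
    if len(parts) != 4:
        return None  # incomplete record
    ts, src, dst, port = parts
    try:
        # ip_address() raises ValueError for truncated addresses such as "10.0.1"
        src = str(ipaddress.ip_address(src))
        dst = str(ipaddress.ip_address(dst))
        ts_val, port_val = int(ts), int(port)
    except ValueError:
        return None
    if not 0 <= port_val <= 65535:
        return None  # out-of-range port treated as noise
    return Session(ts_val, src, dst, port_val)


def clean(lines: Iterable[str]) -> List[Session]:
    """Drop unparseable records and exact duplicates, then sort by timestamp."""
    seen = set()
    out = []
    for line in lines:
        rec = parse_line(line)
        if rec is None or rec in seen:
            continue
        seen.add(rec)
        out.append(rec)
    return sorted(out, key=lambda r: r.timestamp)
```

In a Hadoop setting, `parse_line` is the kind of per-record logic that would run in a map step, while duplicate removal and sorting map naturally onto the shuffle and reduce phases; that split is an assumption, not something the abstract specifies.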
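The IP knowledge base in item (1) answers geolocation queries over IP network segments. One common way to serve such queries, shown here as an illustrative sketch rather than the thesis's own design, is to store non-overlapping IPv4 segments sorted by start address and binary-search them with Python's `bisect` module. The class name `IPSegmentIndex` is hypothetical, and the sample segments in the usage below are documentation ranges with placeholder labels, not data from the thesis.

```python
import bisect
import ipaddress
from typing import List, Optional, Tuple


class IPSegmentIndex:
    """Sorted, non-overlapping IPv4 segments, each mapped to a location label."""

    def __init__(self, segments: List[Tuple[str, str]]):
        # segments: (CIDR string, label), converted to integer [start, end] ranges
        ranges = []
        for cidr, label in segments:
            net = ipaddress.ip_network(cidr)
            if net.version != 4:
                raise ValueError(f"IPv4 segments only: {cidr}")
            ranges.append((int(net.network_address), int(net.broadcast_address), label))
        ranges.sort()
        self._starts = [r[0] for r in ranges]
        self._ranges = ranges

    def lookup(self, ip: str) -> Optional[str]:
        """Return the label of the segment containing ip, or None if not covered."""
        addr = ipaddress.ip_address(ip)
        if addr.version != 4:
            return None
        value = int(addr)
        # index of the rightmost segment whose start address is <= value
        i = bisect.bisect_right(self._starts, value) - 1
        if i >= 0:
            start, end, label = self._ranges[i]
            if start <= value <= end:
                return label
        return None
```

Each lookup costs O(log n) comparisons, which matters at the scale of 150,000 segments; overlapping segments would need to be split or resolved by longest-prefix match before building the index.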
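Item (2) detects abnormal traffic with a sliding time window, but the abstract does not state the decision rule. As an illustrative assumption, the sketch below keeps, for each source IP, the timestamps that fall inside the trailing `window` seconds and flags the IP whenever that count exceeds a fixed `threshold`. The function name `detect_bursts` and both parameters are hypothetical; a statistical baseline (for example, a per-IP mean and deviation) could replace the fixed threshold.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable, List, Tuple


def detect_bursts(events: Iterable[Tuple[int, str]],
                  window: int, threshold: int) -> List[Tuple[int, str, int]]:
    """Return (time, ip, count) each time an IP has more than `threshold`
    events in the window (time - window, time]. Events must be time-ordered."""
    recent: Dict[str, Deque[int]] = defaultdict(deque)
    alerts = []
    for ts, ip in events:
        q = recent[ip]
        q.append(ts)
        # slide the window forward: discard timestamps at or before ts - window
        while q and q[0] <= ts - window:
            q.popleft()
        if len(q) > threshold:
            alerts.append((ts, ip, len(q)))
    return alerts
```

Because each timestamp enters and leaves its deque once, the scan is linear in the number of events, so it can process long session logs in a single pass.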
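Item (3) builds on LDA, but the abstract does not describe how the thesis modifies the original model. For orientation only, the following sketch fits standard LDA with collapsed Gibbs sampling; it is not the improved method. Documents are assumed to be lists of integer word ids (for instance, built from the extracted keywords, titles and page descriptions), and the function name `lda_gibbs` and the defaults for `alpha`, `beta` and `iters` are illustrative choices.

```python
import random
from typing import List, Tuple


def lda_gibbs(docs: List[List[int]], vocab_size: int, k: int,
              alpha: float = 0.1, beta: float = 0.01,
              iters: int = 200, seed: int = 0) -> Tuple[List[List[float]], List[List[float]]]:
    """Standard LDA via collapsed Gibbs sampling.
    Returns (theta, phi): document-topic and topic-word distributions."""
    rng = random.Random(seed)
    ndk = [[0] * k for _ in docs]               # topic counts per document
    nkw = [[0] * vocab_size for _ in range(k)]  # word counts per topic
    nk = [0] * k                                # total words per topic
    z = []                                      # topic assignment of every token
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            t = rng.randrange(k)  # random initial topic
            zd.append(t)
            ndk[d][t] += 1
            nkw[t][w] += 1
            nk[t] += 1
        z.append(zd)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = z[d][i]
                # remove the current token from all counts
                ndk[d][t] -= 1
                nkw[t][w] -= 1
                nk[t] -= 1
                # full conditional p(z = j | rest), up to a constant
                weights = [(ndk[d][j] + alpha) * (nkw[j][w] + beta)
                           / (nk[j] + vocab_size * beta) for j in range(k)]
                t = rng.choices(range(k), weights=weights)[0]
                z[d][i] = t
                ndk[d][t] += 1
                nkw[t][w] += 1
                nk[t] += 1
    theta = [[(ndk[d][j] + alpha) / (len(doc) + k * alpha) for j in range(k)]
             for d, doc in enumerate(docs)]
    phi = [[(nkw[j][w] + beta) / (nk[j] + vocab_size * beta) for w in range(vocab_size)]
           for j in range(k)]
    return theta, phi
```

Each row of `theta` is a topic-proportion vector for one document, which is a convenient numeric representation of a user's accessed content for later clustering.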
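Item (3) then groups IP users with similar accessed content using K-means. A minimal, dependency-free sketch of Lloyd's K-means iteration is given below, assuming each user has already been turned into a numeric feature vector (for example, topic proportions from a topic model or keyword weights). The function name `kmeans`, the initialization from the first `k` points and the iteration cap are illustrative choices, not the thesis's settings; random or k-means++ initialization is more common in practice.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def _sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int,
           iters: int = 100) -> Tuple[List[int], List[List[float]]]:
    """Lloyd's algorithm; centroids start at the first k points (deterministic)."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    centroids = [list(p) for p in points[:k]]
    labels = [-1] * len(points)
    for _ in range(iters):
        # assignment step: each point joins its nearest centroid
        new_labels = [min(range(k), key=lambda c: _sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # converged: assignments stopped changing
        labels = new_labels
        # update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids
```

Users sharing a label form one interest group; the centroid of each group summarizes its typical content profile.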
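Item (4) combines LIBSVM and LIBLINEAR by weighting, with the weights found experimentally. The abstract does not say what is weighted; one plausible reading, assumed here, is a convex combination of the two models' per-class scores, with the weight chosen by a grid search on labeled validation data. To stay independent of either library's API, the sketch takes the per-class scores as plain lists; in practice they might come from LIBSVM's probability estimates (enabled with `-b 1`) and, for LIBLINEAR, from its logistic-regression solvers, which are the ones that output probabilities. The function names `fuse`, `accuracy` and `choose_weight` are hypothetical.

```python
from typing import List, Sequence, Tuple

Scores = Sequence[Sequence[float]]  # one row of per-class scores per sample


def fuse(svm: Scores, linear: Scores, w: float) -> List[int]:
    """Predict the argmax of w * svm + (1 - w) * linear for every sample."""
    preds = []
    for s_row, l_row in zip(svm, linear):
        combined = [w * s + (1 - w) * l for s, l in zip(s_row, l_row)]
        preds.append(max(range(len(combined)), key=combined.__getitem__))
    return preds


def accuracy(preds: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def choose_weight(svm: Scores, linear: Scores, labels: Sequence[int],
                  steps: int = 10) -> Tuple[float, float]:
    """Grid-search w in {0, 1/steps, ..., 1}; return (best_w, best_accuracy)."""
    best_w, best_acc = 0.0, -1.0
    for i in range(steps + 1):
        w = i / steps
        acc = accuracy(fuse(svm, linear, w), labels)
        if acc > best_acc:  # strict '>' keeps the smallest w on ties
            best_w, best_acc = w, acc
    return best_w, best_acc
```

With `w = 1` the fused model reduces to the LIBSVM scores alone and with `w = 0` to the LIBLINEAR scores alone, so the search can only match or improve on the better single model on the validation set.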
Keywords/Search Tags: Hadoop, IP user, Internet behavior, Behavioral motivation, LLA model