The Correlation And User Behavior Prediction Analysis Based On Large Scale Network Access Data

Posted on: 2018-01-10
Degree: Master
Type: Thesis
Country: China
Candidate: S T Li
GTID: 2348330518495578
Subject: Electronics and Communications Engineering
Abstract/Summary:
With the spread of the Internet and the continued growth of e-commerce, people's lifestyles have changed profoundly. Internet services such as web browsing, search engines, online shopping, and social networking sites bring convenience to users while recording ever more data about their behavior. These large-scale network access data contain implicit relationships, and extracting valuable information from them has become an active research topic. Based on large-scale DPI (deep packet inspection) data from a network operator's users, combined with classification tag data collected by a crawler, this thesis adopts a distributed approach to the statistical and correlation analysis of the data and, based on users' historical behavior, uses a Markov prediction model to predict their future behavior. The thesis works mainly with large-scale network access data. A crawler written in Python collects site URLs and their corresponding classification tags, and HDFS, provided by Hadoop, stores the data in distributed form. The network data are processed reliably and efficiently with the MapReduce framework, and statistics are computed for the automotive industry: page views, number of unique users, average visit duration, and the distribution of peak access times. The automotive URL classification data obtained by the crawler are used to identify automotive users in the DPI data, and the static and dynamic behavior characteristics of these industry users are then extracted. Association rule mining and behavior prediction are carried out on users' dynamic behavior sequences, which include browsing and searching behavior. The FP-Growth algorithm is implemented in distributed form and applied to the large-scale DPI data.
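The per-site statistics named above (page views, unique users, average visit duration, access-time distribution) fit a map, group, reduce pattern. The following is a minimal single-process sketch of that pattern in plain Python, not the thesis's Hadoop job. The record layout, the sample URLs, and every function name are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical access records: (user_id, url, visit_duration_seconds, hour_of_day).
# The real DPI record format is not given in the abstract.
RECORDS = [
    ("u1", "auto.example/news", 30, 9),
    ("u1", "auto.example/price", 90, 9),
    ("u2", "auto.example/news", 60, 21),
    ("u3", "auto.example/news", 30, 21),
]


def map_record(record):
    """Map step: key each access record by its URL."""
    user_id, url, duration, hour = record
    return url, (user_id, duration, hour)


def shuffle(pairs):
    """Group mapped values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_url(values):
    """Reduce step: aggregate all accesses to one URL into summary statistics."""
    hourly = defaultdict(int)
    for _, _, hour in values:
        hourly[hour] += 1
    return {
        "page_views": len(values),
        "unique_users": len({user for user, _, _ in values}),
        "avg_duration": sum(d for _, d, _ in values) / len(values),
        "hourly": dict(hourly),
    }


def site_statistics(records):
    """Run map, shuffle, and reduce over all records in one process."""
    grouped = shuffle(map_record(r) for r in records)
    return {url: reduce_url(values) for url, values in grouped.items()}


if __name__ == "__main__":
    for url, stats in site_statistics(RECORDS).items():
        print(url, stats)
```

In a real MapReduce job the grouping is done by the framework and the reducers run in parallel across nodes; the sketch only shows how the four statistics fall out of one pass keyed by URL.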
Association rule mining is then performed to find correlations among users' dynamic behaviors. Finally, a Markov prediction model is built from users' short-term accumulated dynamic behavior history. Before the model is built, the forecast period is divided into finer-grained time intervals. The model predicts a user's behavior at the next moment from that user's behavior in the preceding few moments; the predictions are then compared with the users' actual behavior to calculate the prediction accuracy. The results show that prediction accuracy improves markedly as the length of the user access behavior sequence increases. All output is then sorted and visualized as charts to facilitate analysis.
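The prediction step described above can be sketched as a simple order-k Markov predictor: count which behavior follows each run of k consecutive behaviors, predict the most frequent successor, and score predictions against what actually happened. This is an illustrative sketch under assumptions, not the thesis's model. The behavior labels, the `order` parameter, and all function names are hypothetical, and the toy sequences say nothing about the thesis's reported results.

```python
from collections import Counter, defaultdict


def train(sequences, order):
    """Count which behavior follows each run of `order` consecutive behaviors."""
    model = defaultdict(Counter)
    for seq in sequences:
        for i in range(len(seq) - order):
            state = tuple(seq[i:i + order])
            model[state][seq[i + order]] += 1
    return model


def predict(model, recent):
    """Return the most frequent successor of `recent`, or None if the state is unseen."""
    counts = model.get(tuple(recent))
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def accuracy(model, sequences, order):
    """Fraction of next-step predictions that match the observed behavior."""
    hits = total = 0
    for seq in sequences:
        for i in range(len(seq) - order):
            total += 1
            if predict(model, seq[i:i + order]) == seq[i + order]:
                hits += 1
    return hits / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical behavior sequences, one list per user session.
    history = [
        ["search", "browse", "compare"],
        ["search", "browse", "compare"],
        ["news", "browse", "buy"],
    ]
    observed = [["news", "browse", "buy"]]
    for order in (1, 2):
        model = train(history, order)
        print(order, accuracy(model, observed, order))
```

Here `order` plays the role of "the preceding few moments": a larger value conditions the prediction on a longer behavior history, at the cost of more unseen states, for which `predict` returns None.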
Keywords/Search Tags: Distributed technology, statistical analysis, association analysis, Markov prediction model