
Research And Implementation Of User Search Behavior Analysis System Based On Hadoop

Posted on: 2020-12-04
Degree: Master
Type: Thesis
Country: China
Candidate: Q Zhang
GTID: 2428330578950890
Subject: Computer technology
Abstract/Summary:
We are now in the DT (data technology) era of technological innovation. The number of network users has soared, and users accessing the Internet generate a huge amount of information every day, most of it while searching. According to statistics, the mobile Internet gains 217 new users per minute, and Google receives 2.4 million new search requests every minute. Many companies and businesses want to know how to process these massive search logs and how to mine commercially valuable information from the user behavior data they contain. User search behavior analysis currently faces two challenges. The first is how to process and store massive log data quickly and efficiently. The second is how to provide a suitable platform through which merchants and enterprises can grasp users' psychological characteristics and interests and develop more precise marketing strategies. Based on this analysis, this thesis designs a user search behavior analysis platform. The platform uses the distributed system infrastructure Hadoop and the parallel computing model MapReduce, combined with clustering algorithms, to mine behavioral patterns from massive log data in depth. The main work of this thesis is as follows:

(1) After comparing the processing of massive log data on Hadoop with traditional methods, this thesis adopts Hadoop as the platform and MapReduce as the computing framework, and uses the HDFS distributed file system to store the massive log data, which solves the problem of massive data storage.
(2) Based on the system's business requirements, the system is designed following the Web text mining process and is divided into four modules: log data preprocessing, log data storage, log data analysis, and log data visualization. The log data analysis module is the core of the system.
(3) Statistical analysis of the data covers several dimensions, including search keyword ranking, the relationship between URLs and user clicks, and URL ranking. Canopy coarse clustering is first performed on the search keywords, and its result is then used for K-means clustering, with cosine similarity as the similarity measure, so that users can be grouped and user profiles constructed. Finally, ECharts, Java, HTML5, and JavaScript are used to visualize the analysis results.
(4) The Hadoop development environment is installed and deployed, and functional and performance tests are carried out on the implemented system.
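Keyword ranking, the first statistic listed in item (3), follows the classic MapReduce counting pattern. The Python sketch below imitates the map, shuffle, and reduce phases in a single process so the data flow is easy to follow. It is not the thesis's Hadoop job. The tab-separated record layout (user ID, query, clicked URL) and all function names are hypothetical, and the malformed-line filter only stands in for the preprocessing module. On a real cluster, the mapper and reducer would run as separate tasks over HDFS blocks, for example through Hadoop Streaming or the Java MapReduce API.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical log layout (an assumption, not the thesis's schema):
#   user_id \t query \t clicked_url


def map_phase(lines: Iterable[str]) -> Iterator[tuple[str, int]]:
    """Emit (keyword, 1) for every well-formed log line, like a Mapper."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 3:
            continue  # stand-in for preprocessing: drop malformed records
        query = fields[1].strip().lower()  # normalize case so "Hadoop" == "hadoop"
        if query:
            yield query, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group values by key, as the framework does between map and reduce."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: dict[str, list[int]]) -> dict[str, int]:
    """Sum the counts for each keyword, like a Reducer."""
    return {key: sum(values) for key, values in groups.items()}


def top_keywords(lines: Iterable[str], n: int) -> list[tuple[str, int]]:
    """Return the n most frequent queries, highest count first."""
    counts = reduce_phase(shuffle(map_phase(lines)))
    # Sort by descending count, then alphabetically for a deterministic ranking.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


if __name__ == "__main__":
    logs = [
        "u1\thadoop\thttp://example.com/a",
        "u2\tHadoop\thttp://example.com/b",
        "u1\tkmeans\thttp://example.com/c",
        "broken line",
        "u3\thadoop\thttp://example.com/a",
    ]
    print(top_keywords(logs, 2))  # → [('hadoop', 3), ('kmeans', 1)]
```

Because the reducer's summation is associative and commutative, the same function could also act as a combiner on each mapper's output to reduce shuffle traffic, which matters at the log volumes described above.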
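The Canopy-then-K-means step in item (3) can be sketched as follows. This sketch assumes that each user is represented by a vector of per-keyword search counts and that distance is 1 minus cosine similarity. The feature layout, the thresholds `t1` and `t2`, and the choice to seed K-means with the canopy centers (which also fixes k as the number of canopies) are illustrative assumptions, not details given in the abstract. Only the standard library is used, and the function names are hypothetical.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between a and b; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cosine_distance(a: Vector, b: Vector) -> float:
    return 1.0 - cosine_similarity(a, b)


def canopy(points: list[Vector], t1: float, t2: float) -> list[tuple[Vector, list[int]]]:
    """Canopy coarse clustering with loose threshold t1 and tight threshold t2."""
    if not 0 < t2 < t1:
        raise ValueError("Canopy requires 0 < t2 < t1")
    remaining = list(range(len(points)))
    canopies = []
    while remaining:
        center_idx = remaining[0]
        center = points[center_idx]
        dist = {i: cosine_distance(points[i], center) for i in remaining}
        # Points within the loose threshold join this canopy.
        members = [i for i in remaining if dist[i] < t1]
        canopies.append((center, members))
        # Points within the tight threshold (and the center itself) cannot start new canopies.
        remaining = [i for i in remaining if i != center_idx and dist[i] >= t2]
    return canopies


def kmeans_cosine(points: list[Vector], centroids: list[Vector],
                  max_iter: int = 100) -> tuple[list[int], list[list[float]]]:
    """K-means that assigns each point to the centroid with the highest cosine similarity."""
    centroids = [list(c) for c in centroids]
    if not points:
        return [], centroids
    dim = len(points[0])
    assignment: list[int] = []
    for _ in range(max_iter):
        new_assignment = [
            max(range(len(centroids)), key=lambda k: cosine_similarity(p, centroids[k]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        for k in range(len(centroids)):
            cluster = [points[i] for i, label in enumerate(assignment) if label == k]
            if cluster:  # keep the previous centroid if a cluster empties out
                centroids[k] = [sum(q[d] for q in cluster) / len(cluster) for d in range(dim)]
    return assignment, centroids


if __name__ == "__main__":
    # Hypothetical data: rows are users, columns are search counts for
    # the keywords [hadoop, hdfs, recipe, cooking].
    users = [[3, 2, 0, 0], [4, 1, 0, 0], [0, 0, 2, 3], [0, 1, 3, 2]]
    canopies = canopy(users, t1=0.6, t2=0.3)
    labels, _ = kmeans_cosine(users, [center for center, _ in canopies])
    print(len(canopies), labels)  # → 2 [0, 0, 1, 1]
```

Seeding from the canopy centers removes the need to choose k by hand. Randomly initialized K-means, by contrast, usually needs several restarts to avoid poor local optima. Cosine similarity ignores vector length, so a heavy searcher and a light searcher with the same topic mix land in the same group.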
Keywords/Search Tags: Hadoop, Text Mining, Log Processing, Search Behavior Analysis