
The System Design And Implementation Of User Behavior Analysis Based On Hadoop Platform And Query Log

Posted on: 2017-05-22
Degree: Master
Type: Thesis
Country: China
Candidate: C J Yan
GTID: 2308330485483398
Subject: Control Science and Engineering
Abstract/Summary:
With the rapid development of the Internet, the volume of query log data generated when users search for information through search engines keeps growing. Internet companies urgently need a way to extract user behavior information from these logs effectively, mine users' search needs, and analyze the patterns in their behavior, so as to improve the quality of the search service. Doing user behavior analysis well requires solving two problems. The first is how to store and preprocess massive amounts of log data. The second is how to design a user behavior analysis model that captures behavioral features, and how to select an appropriate platform on which to analyze behavior from the available user data.

Based on this analysis, this thesis proposes applying the Hadoop platform to a user behavior analysis system. MapReduce serves as the computing framework, HBase as the data store, and the Sogou user query log as the data set, combined with the relevant algorithms to analyze patterns in users' search behavior. The main work is as follows.

1. The features of traditional relational databases are compared with those of the column-oriented NoSQL database HBase, and HBase is chosen to store the log data. The user's cookie and the search date together form the HBase row key, which achieves the expected result that records of the same user are stored together and queries become more convenient.

2. Following the functional requirements, a detailed implementation method is proposed. It includes applying Chinese word segmentation to the search keywords and extracting a text feature vector set model from the segmentation results. An improved MapReduce value-sorting algorithm is used to count users' search keywords, analyze the relationship between URL rank and user clicks, and tally users' web page visit times. The K-means clustering algorithm is restructured as a distributed algorithm on the MapReduce programming model in order to cluster users' query topics. Finally, users' behavioral characteristics are modeled and a user profile is constructed.

3. A Hadoop cluster development environment is deployed and the user query log data are tested on it. Detailed analysis of the results yields the corresponding conclusions.

Finally, system testing with the MapReduce computing model confirms the correctness of the improved algorithms and shows that cloud storage technology performs well in the user behavior analysis system.
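The row key design described in the abstract (user cookie plus search date) can be illustrated with a small sketch. HBase keeps rows sorted lexicographically by row key, so keys that share a cookie prefix end up adjacent. The code below does not talk to HBase; it only simulates that ordering with a sorted list. The delimiter, the `yyyymmdd` date format, the helper names `make_row_key` and `prefix_scan`, and the sample records are all assumptions made for illustration, not details taken from the thesis.

```python
from datetime import date

DELIM = "|"  # assumed delimiter; it must not occur inside a cookie ID


def make_row_key(cookie_id: str, search_date: date) -> bytes:
    """Compose a row key as <cookie><DELIM><yyyymmdd> (hypothetical layout)."""
    return f"{cookie_id}{DELIM}{search_date:%Y%m%d}".encode("utf-8")


def prefix_scan(sorted_keys: list, cookie_id: str) -> list:
    """Return every key for one user, mimicking a prefix scan over sorted rows."""
    prefix = f"{cookie_id}{DELIM}".encode("utf-8")
    return [key for key in sorted_keys if key.startswith(prefix)]


# Made-up sample records: (cookie ID, search date).
records = [
    ("u42", date(2008, 6, 3)),
    ("u07", date(2008, 6, 1)),
    ("u42", date(2008, 6, 1)),
    ("u07", date(2008, 6, 2)),
]

# Sorting the byte strings stands in for HBase's lexicographic row ordering.
sorted_keys = sorted(make_row_key(cookie, day) for cookie, day in records)

for key in sorted_keys:
    print(key.decode("utf-8"))

print(prefix_scan(sorted_keys, "u42"))
```

Because the date is written as fixed-width digits, each user's rows also fall in chronological order within that user's block, so a per-user or per-user-per-day lookup reduces to a contiguous range read.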
Keywords/Search Tags: Query Log, User Behavior Characteristics, Hadoop, Distributed Processing