The Design And Implementation Of The User Behavior Analysis System Based On Query Logs Of Big Data

Posted on: 2016-06-01
Degree: Master
Type: Thesis
Country: China
Candidate: X N Bu
Full Text: PDF
GTID: 2308330470455865
Subject: Computer technology
Abstract/Summary:
With the popularization and development of the Internet, people exchange information over the network more and more frequently. Retrieving information effectively has therefore become one of the problems Internet users face. Search engines organize disordered information by building ordered document indexes, which makes effective information retrieval far more convenient.

As users interact with a search engine, they produce a large number of query logs. These logs contain a great deal of user-related information, from which users' explicit needs can be captured directly and their implicit needs discovered, so research on user logs is attracting growing attention. Major Internet companies, especially those focused on search, pay close attention to user query logs. They expect timely and accurate log analysis and mining to reveal users' behavioral characteristics, so that they can improve user satisfaction and thereby strengthen their market competitiveness. At the same time, as the number of logs grows exponentially, processing them quickly and effectively has become a challenge for traditional storage models and for the computing performance of database servers. Hadoop is a software framework for the distributed processing of large amounts of data. Its distributed storage and computing technology makes research on massive query logs more convenient.

Based on this situation, and after reviewing the relevant literature and analyzing how search logs are produced, this paper designs an analysis platform for processing massive search engine logs. The platform is based on Hadoop: it uses the HDFS distributed file system to store the logs and the MapReduce computing model to process them.
The platform includes four modules: log collection, log storage, log analysis, and data visualization. The log analysis module is the focus of the whole system. It analyzes search logs along six dimensions (keyword ranking, URL ranking, host ranking, user search statistics, time statistics, and daily search statistics) and mines the user query logs following the Web text mining process. Finally, the paper tests the platform in an experimental environment and analyzes the operational efficiency of the distributed platform. It also optimizes the platform's performance and compares system runtime before and after optimization. The experimental data show that the user behavior analysis system based on query logs has good stability and effectiveness.
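To make the keyword-ranking dimension concrete, the following is a minimal sketch of how such a count could be expressed in the map, shuffle, and reduce style that MapReduce uses. It runs in plain Python on a single machine and is not the thesis's implementation. The tab-separated log layout (timestamp, user ID, keyword, clicked URL), the sample records, and all function names are assumptions made for illustration only.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical log layout (an assumption, not taken from the thesis):
# timestamp \t user_id \t keyword \t clicked_url
SAMPLE_LOG = [
    "2016-01-01 08:00:01\tu1\thadoop\thttp://a.example/1",
    "2016-01-01 08:00:05\tu2\tmapreduce\thttp://b.example/2",
    "2016-01-01 09:10:00\tu1\thadoop\thttp://a.example/3",
    "malformed line",
]


def map_keyword(line):
    """Map step: emit (keyword, 1) for each well-formed record."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 4:
        return []  # skip records that do not match the assumed layout
    return [(fields[2], 1)]


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_count(key, values):
    """Reduce step: sum the counts collected for one keyword."""
    return key, sum(values)


def keyword_ranking(lines, top_n=10):
    """Run map, shuffle, and reduce, then sort by descending count."""
    mapped = chain.from_iterable(map_keyword(line) for line in lines)
    counts = [reduce_count(k, v) for k, v in shuffle(mapped).items()]
    # Ties are broken alphabetically so the ranking is deterministic.
    counts.sort(key=lambda kv: (-kv[1], kv[0]))
    return counts[:top_n]


ranking = keyword_ranking(SAMPLE_LOG)
print(ranking)
```

Under the same assumed layout, the URL and host rankings would follow the same pattern with a different field emitted as the key in the map step.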
Keywords/Search Tags: Big data, Hadoop, Query log analysis, User behavior