
The Design and Implementation of a Log Analysis System for User-Oriented Personalized Recommendation

Posted on: 2014-07-02
Degree: Master
Type: Thesis
Country: China
Candidate: Y X Zhang
GTID: 2308330482483359
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid growth in the number of Internet users and the volume of information, obtaining information quickly from vast amounts of data has become one of the most important problems for users, and solving it is key for web sites that want to attract them. Online video is currently the hottest Internet application, and the number of videos and video web sites is growing rapidly with it. Under these conditions, keyword-based video search can no longer meet users' needs. Recommendation engines, which actively push information to users based on their historical behavior, have appeared on the Internet in response.

The sharp increase in the number of users and videos brings new challenges for recommendation systems. First, storing massive user logs requires the storage module of the recommendation system to provide a scalable and reliable service. Second, processing massive data requires high-performance computation. Finally, information that better meets users' needs attracts more users, so the recommendation system must be designed to achieve high accuracy and validity in its results.

To address these challenges, this paper proposes a solution based on Hadoop and its subprojects: a log analysis system for user-oriented personalized recommendation. The system uses Hive, which relies on the scalability and reliability of the Hadoop Distributed File System (HDFS), as the storage platform for massive data, ensuring reliable and scalable storage of user log information. The system also exploits the high performance of Hadoop's parallel computing model (Map/Reduce). Hive, which converts SQL statements into Map/Reduce tasks, serves as the analysis platform for massive log information, and Mahout, a scalable machine learning library that provides a distributed collaborative filtering (CF) algorithm based on Map/Reduce, serves as the recommendation tool. Together they ensure data processing performance. In addition, an optimized modification of the Mahout source code is used to improve the accuracy and validity of the recommendation results.

To validate the system, we design a detailed testing scheme. First, we demonstrate the availability of the system and the reliability and scalability of the storage model. Then, we verify the improvement in data processing performance and in the accuracy and validity of the recommendation results. Finally, we demonstrate how the system works in practice by building a real experimental environment.
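
As a rough illustration of the pipeline the abstract describes, the sketch below aggregates raw viewing logs (the kind of step a Hive GROUP BY query might perform) and then ranks unseen videos with a simple item co-occurrence score, a basic form of item-based collaborative filtering. It is a single-machine toy in plain Python, not the thesis's Hive or Mahout code. The log layout, the function names, and the sample data are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def aggregate_views(log_rows):
    """Count views per (user, video), similar to a GROUP BY over raw log rows."""
    counts = defaultdict(int)
    for user, video in log_rows:
        counts[(user, video)] += 1
    return dict(counts)


def cooccurrence(counts):
    """For each ordered pair of videos, count the users who watched both."""
    by_user = defaultdict(set)
    for user, video in counts:
        by_user[user].add(video)
    co = defaultdict(int)
    for videos in by_user.values():
        for a, b in combinations(sorted(videos), 2):
            co[(a, b)] += 1
            co[(b, a)] += 1
    return dict(by_user), dict(co)


def recommend(user, by_user, co, top_n=3):
    """Rank videos the user has not seen by summed co-occurrence with seen ones."""
    seen = by_user.get(user, set())
    scores = defaultdict(int)
    for (a, b), n in co.items():
        if a in seen and b not in seen:
            scores[b] += n
    # Sort by descending score, then by video id for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [video for video, _ in ranked[:top_n]]


# Hypothetical sample log: (user_id, video_id) pairs, one per view event.
LOGS = [
    ("u1", "v1"), ("u1", "v2"), ("u1", "v1"),
    ("u2", "v1"), ("u2", "v2"), ("u2", "v3"),
    ("u3", "v2"), ("u3", "v3"),
]

if __name__ == "__main__":
    view_counts = aggregate_views(LOGS)
    users, pairs = cooccurrence(view_counts)
    print(recommend("u1", users, pairs))
```

In a distributed setting, both the aggregation and the pair counting map naturally onto Map/Reduce jobs, which is the property the abstract relies on for performance. This sketch ignores view counts when scoring, which is one of many simplifications.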
Keywords/Search Tags: Internet, Recommendation engine, Log analysis, Hive, Mahout