The Design And Implementation Of Massive Advertising Log Analysis System Based On Hadoop

Posted on: 2014-07-02
Degree: Master
Type: Thesis
Country: China
Candidate: W X Zhang
GTID: 2268330422951998
Subject: Software engineering
Abstract/Summary:
Baidu FengChao is a newly promoted advertisement auction system that exploits billions of daily web searches and brings substantial income to both business customers and Baidu. As of 2010, income from FengChao accounts for more than 20% of Baidu’s total income. However, online operation and customer feedback show that FengChao still faces many problems in advertisement quality measurement, presence, and optimization. These problems cause economic losses for customers and harm FengChao. To address them, this thesis designs and implements a massive advertising log analysis system based on Hadoop. The system mines abnormal data from massive advertisement logs and provides visual statistics on that data from different views, helping FengChao find potential problems and, after a thorough analysis of the causes of the abnormal data, propose effective solutions.

First, this thesis determines the requirements of the log analysis system from FengChao’s business functions and then designs its functional structure, which is divided into three modules: a log parsing module, a log analysis and mining module, and a web presentation module. The log parsing module preprocesses the original log data. The log analysis and mining module is the key part of the system: it builds computation models for the different business monitoring tasks, mines abnormal data in each business, and then computes multi-view statistics on the abnormal data. This module covers three business themes: advertisement quality, advertisement census, and advertisement optimization. The web presentation module shows the statistical results on a web page with dynamic trend graphs and tables.

In the implementation, the log parsing and log mining modules take full advantage of Hadoop’s strengths in processing big data. The massive original log data and the analysis results are both stored in HDFS (Hadoop Distributed File System), and a separate set of MapReduce programs carries out the data processing on Hadoop MapReduce. The web module adopts LAMP (Linux + Apache + MySQL + PHP) and CakePHP, a popular open source web application framework. Finally, the function and performance of the log analysis system are tested and verified.

In terms of commercial effect, the log analysis system can help FengChao find potential problems, effectively reduce FengChao’s online error rate, and provide an effective basis for decision making.
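
The parse, mine, and multi-view statistics flow described above can be illustrated with a small in-memory sketch of the MapReduce pattern. It is not the thesis's implementation. The tab-separated log format, the field names (`ad_id`, `region`, `impressions`, `clicks`), and the abnormality rule (more clicks than impressions) are all assumptions made for illustration, and the code runs in plain Python rather than on Hadoop.

```python
from collections import defaultdict

# Hypothetical log line layout (an assumption, not the FengChao format):
#   ad_id <TAB> region <TAB> impressions <TAB> clicks


def parse_line(line):
    """Log parsing step: turn one raw line into a record, or None if malformed."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) != 4:
        return None
    ad_id, region, impressions, clicks = parts
    try:
        return {
            "ad_id": ad_id,
            "region": region,
            "impressions": int(impressions),
            "clicks": int(clicks),
        }
    except ValueError:
        return None


def is_abnormal(record):
    """Illustrative rule only: an ad cannot receive more clicks than impressions."""
    return record["clicks"] > record["impressions"]


def map_phase(lines, view):
    """Map step: emit (view_key, 1) for every abnormal record."""
    for line in lines:
        record = parse_line(line)
        if record is not None and is_abnormal(record):
            yield record[view], 1


def reduce_phase(pairs):
    """Reduce step: sum the emitted counts per key."""
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)


def count_abnormal(lines, view):
    """Count abnormal records grouped by one view (for example 'region' or 'ad_id')."""
    return reduce_phase(map_phase(lines, view))
```

Calling `count_abnormal(lines, "region")` and `count_abnormal(lines, "ad_id")` on the same input gives two views of the same abnormal data. In a real Hadoop job, the framework would handle the grouping between the map and reduce steps, and the input and output would live in HDFS.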
Keywords/Search Tags: log analysis, massive data, Hadoop, MapReduce