
The Design And Implementation Of "Product Ads Realtime View Analyst" Based On Big Data

Posted on: 2016-11-30
Degree: Master
Type: Thesis
Country: China
Candidate: J Dai
Full Text: PDF
GTID: 2428330461460105
Subject: Engineering
Abstract/Summary:
Data is the basis on which a company or team designs products, makes and adjusts strategies, and reflects on its own work. With the rapid development of big data, large amounts of heterogeneous and complex information have brought major challenges to data analysis. The Product Ads team, which belongs to the "FengChao" architecture at Baidu, aims to provide clients with high-quality, efficient solutions for batch advertising, and user behavior logs play an important role in building such solutions. Efficient and comprehensive analysis of these logs creates a view between the team and the data, so that the data can reveal its value in a timely manner if we keep an eye on it, allowing us to come up with better solutions.

User behavior logs are a form of big data that is heterogeneous and asynchronous, which makes them hard to process with traditional sequential methods. Even if we read the file splits concurrently, we cannot avoid relying heavily on program state both inside and outside the program, which results in many side effects. An acceptable practice is to use the MapReduce programming framework, which borrows ideas from functional programming, to analyze data daily, usually the previous day's data. However, non-real-time analysis of large data introduces a long delay, which leads to slow feedback in the production environment and untimely tracking of new product tests or other issues, and as a result can cause huge losses. In practice, the results of log processing are filtered into structured data stored in relational databases. Good data services such as the Baidu ReportEngine system use multi-level caching and distributed concurrent transactions to provide fast data queries on a variety of topics. Nevertheless, this means that changing topics is very expensive: the more complex the system is, the harder it is to upgrade its topics.

To solve the problems mentioned above, the author designed and implemented a system named "Product Ads Realtime View Analyst" (hereinafter referred to as PARVA). It schedules data checking, data processing, and database loading tasks at a finer, real-time granularity, so that the processing interval is consistent with the interval at which upstream business systems produce logs. PARVA is driven by configuration and uses Hadoop to perform a basic analysis of the logs as soon as they are ready. This processing generates local unstructured files. PARVA then extracts data from these files and loads it into MySQL, a relational database that stores structured data, so that we have data at granularities from minutes to days. PARVA uses both the local unstructured files and the structured database data at different granularities to perform higher-level data monitoring and analysis. PARVA also presents these data to users in rich forms through report emails, a PHP web site, and Chrome extensions. In this paper, we discuss why PARVA was created and how the whole system is designed, and we describe the implementation of its important modules.
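As a rough illustration of the idea of keeping data at granularities from minutes to days, the sketch below counts log events per minute in a map/reduce style and then rolls the counts up to hours and days. It is not PARVA's actual code: the tab-separated log format, the sample lines, and the function names `map_line`, `reduce_counts`, and `roll_up` are all assumptions made for this example, and plain Python stands in for Hadoop and MySQL.

```python
from collections import Counter
from datetime import datetime

# Hypothetical log format (an assumption): "<ISO timestamp>\t<ad_id>\t<event>"
SAMPLE_LOGS = [
    "2016-11-30T10:00:12\tad1\tclick",
    "2016-11-30T10:00:47\tad1\tclick",
    "2016-11-30T10:01:05\tad2\tview",
    "2016-11-30T11:30:00\tad1\tclick",
]


def map_line(line):
    """Map step: emit ((minute, ad_id, event), 1) for one log line."""
    ts, ad_id, event = line.split("\t")
    minute = datetime.fromisoformat(ts).strftime("%Y-%m-%d %H:%M")
    return (minute, ad_id, event), 1


def reduce_counts(pairs):
    """Reduce step: sum the counts emitted for each key."""
    totals = Counter()
    for key, count in pairs:
        totals[key] += count
    return totals


def roll_up(minute_counts, width):
    """Truncate the minute key to `width` characters and re-sum.

    With the "%Y-%m-%d %H:%M" key format, width 13 gives hourly
    buckets and width 10 gives daily buckets.
    """
    totals = Counter()
    for (minute, ad_id, event), count in minute_counts.items():
        totals[(minute[:width], ad_id, event)] += count
    return totals


per_minute = reduce_counts(map_line(line) for line in SAMPLE_LOGS)
per_hour = roll_up(per_minute, 13)
per_day = roll_up(per_minute, 10)
```

Because the map step keeps no state between lines and the reduce step only sums, the same per-minute counts can be recomputed or re-aggregated at any coarser granularity without side effects, which is the property the abstract attributes to the MapReduce approach.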
Keywords/Search Tags:Big data, Log analysis, Hadoop, Realtime, Chrome extension