Web Log Analysis System Based On Hadoop Platform

Posted on: 2014-02-06
Degree: Master
Type: Thesis
Country: China
Candidate: R R Li
Full Text: PDF
GTID: 2308330464957931
Subject: Computer technology
Abstract/Summary:
With the rapid development and widespread adoption of the Internet, the amount of information on the web is growing at an alarming rate. The World Wide Web has penetrated every corner of human society and become a distributed information space with a hundred million workstations and billions of pages, containing a vast body of knowledge. E-commerce sites attract an unprecedented number of visits, and large-scale online games keep setting new records for peak concurrent users; at the same time, these large systems record massive volumes of user logs. On e-commerce sites, access logs provide decision support for site managers and guide site operators, for example in improving the site structure to enhance the user experience and strengthen the site's core competitiveness.

Hadoop is an open-source distributed computing framework from Apache that provides a simple programming model for the distributed processing of large amounts of data. Hadoop generally runs on a cluster composed of a large number of commodity machines. When e-commerce access logs need to be preprocessed and analyzed, a cluster can process and analyze them in parallel, supporting rapid and timely decision-making.

The large-site analysis engine project provides enterprises with data analysis based on six objects: traffic, source path, visitors, content, goods, and orders. It presents core data such as bounce rate, conversion rate, repurchase rate, and sales concentration to e-commerce managers through graphical reports, meeting enterprise needs for both web analysis and business analysis.

Finally, we design and implement a log analysis system based on a distributed computing platform and use it to analyze website traffic, traffic sources, visitors, and orders. The functional modules of the system are described in detail, and experiments are carried out for comparative analysis.

Experiments show that the analysis system is more efficient than a single-machine centralized environment, and that the execution time of a task depends not only on the number of nodes but also on the complexity of the task's processing logic.
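As a rough illustration of the Map/Reduce style of log processing the abstract refers to, the sketch below counts page views per requested path. It is not the thesis's implementation: it runs in a single Python process rather than on Hadoop, it assumes access logs in the Common Log Format, and the names `map_line`, `reduce_counts`, and `run_job` are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def map_line(line):
    """Map step: emit (requested_path, 1) for one access-log line.

    Assumes Common Log Format, where the request is the first quoted field,
    e.g. "GET /index.html HTTP/1.1". Malformed lines emit nothing.
    """
    parts = line.split('"')
    if len(parts) < 3:
        return
    request = parts[1].split()
    if len(request) >= 2:
        yield request[1], 1


def reduce_counts(path, counts):
    """Reduce step: sum all counts emitted for one path."""
    return path, sum(counts)


def run_job(lines):
    """Run map, shuffle (sort and group by key), and reduce in one process."""
    pairs = [pair for line in lines for pair in map_line(line)]
    pairs.sort(key=itemgetter(0))  # stands in for Hadoop's shuffle phase
    return dict(
        reduce_counts(path, (count for _, count in group))
        for path, group in groupby(pairs, key=itemgetter(0))
    )


# Hypothetical sample data, invented for this sketch.
sample_log = [
    '10.0.0.1 - - [06/Feb/2014:10:00:00 +0800] "GET /index.html HTTP/1.1" 200 1024',
    '10.0.0.2 - - [06/Feb/2014:10:00:01 +0800] "GET /cart HTTP/1.1" 200 512',
    '10.0.0.1 - - [06/Feb/2014:10:00:02 +0800] "GET /index.html HTTP/1.1" 200 1024',
    'not a valid log line',
]
page_views = run_job(sample_log)
```

On a real cluster the map and reduce functions would run on different nodes over separate splits of the log files, which is where the parallel speed-up described above would come from.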
Keywords/Search Tags:E-commerce, Map/Reduce, Hadoop, Log Processing