
Website Concurrent Performance Analysis Based On Massive Log Files In Hadoop

Posted on: 2015-02-22    Degree: Master    Type: Thesis
Country: China    Candidate: Y Y Zhao    Full Text: PDF
GTID: 2298330467464803    Subject: Computer application technology
Abstract/Summary:
The three categories of massive data (business data, scientific data and web data) share three features: heterogeneity, dynamism and expansiveness. These features place enormous pressure on traditional data processing. Developers therefore try to move beyond the conventional stand-alone processing mode and use large-scale distributed computer clusters to improve data processing performance. The mass data processing technology represented by Hadoop stands out among a number of competitors and greatly reduces this pressure. As is evident to all, thanks to its scalability, robustness and irreplaceable advantages in computational performance and cost, it has gradually become the preferred solution for massive data processing, and IT companies are paying more and more attention to Hadoop.

We first study the application status of technologies related to massive data and the architecture of the Hadoop platform. We then select massive log files as the object to process and propose a Hadoop-based massive log file processing model. According to the characteristics of the log files, we first analyze the log file format, and then, following the MapReduce programming framework, use a two-stage structure to design the summary module and the sorting module separately, so as to count the log file concurrency and the average response time efficiently in a distributed environment. Based on the experimental data, we explore how site response time changes under high concurrency and thereby evaluate site performance effectively. This method simplifies the design of distributed programs and handles the underlying task assignment, parallel processing, fault tolerance and other details all at once; it also improves operational efficiency. Finally, by building a real Hadoop platform to run the program, we compare this method with the traditional processing method and analyze their respective strengths and weaknesses. The experiments verify that the model achieves its design objectives: it handles massive log files efficiently in a distributed environment while retaining robustness and scalability.
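To illustrate the summary stage described above, the following is a minimal MapReduce sketch in Java that counts per-second request concurrency and average response time. The log field layout, class names and one-second bucketing are assumptions made for illustration; they are not taken from the thesis's actual implementation.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class LogSummary {

    // Map: emit (time bucket, response time) for each request line.
    public static class ParseMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Assumed line format: "<timestamp-to-second> <url> <responseTimeMillis>"
            String[] fields = value.toString().trim().split("\\s+");
            if (fields.length < 3) {
                return; // skip malformed lines
            }
            try {
                long millis = Long.parseLong(fields[2]);
                context.write(new Text(fields[0]), new LongWritable(millis));
            } catch (NumberFormatException e) {
                // ignore lines whose response time cannot be parsed
            }
        }
    }

    // Reduce: for each one-second bucket, output the request count
    // (concurrency) and the average response time.
    public static class SummaryReducer extends Reducer<Text, LongWritable, Text, Text> {
        @Override
        protected void reduce(Text key, Iterable<LongWritable> values, Context context)
                throws IOException, InterruptedException {
            long count = 0;
            long total = 0;
            for (LongWritable v : values) {
                count++;
                total += v.get();
            }
            double avg = (double) total / count;
            context.write(key, new Text(count + "\t" + avg));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "log summary");
        job.setJarByClass(LogSummary.class);
        job.setMapperClass(ParseMapper.class);
        job.setReducerClass(SummaryReducer.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

The sorting stage mentioned in the abstract would then run as a second job over this output, ordering the per-second buckets by concurrency or response time before analysis.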
Keywords/Search Tags: Hadoop, Massive Data, Distributed Systems, Log Processing, Cluster Computing