
Website Concurrent Performance Analysis Based On Massive Log Files In Hadoop

Posted on: 2015-02-22    Degree: Master    Type: Thesis
Country: China    Candidate: Y Y Zhao    Full Text: PDF
GTID: 2298330467464803    Subject: Computer application technology
Abstract/Summary:
The three categories of massive data (business data, scientific data and web data) share three features: heterogeneity, dynamism and expansiveness. These features place enormous pressure on traditional data processing. Developers therefore try to move beyond the conventional stand-alone processing mode and use large-scale distributed computer clusters to improve data processing performance. The mass data processing technology represented by Hadoop stands out among a number of competitors and greatly reduces this pressure. As is evident to all, thanks to its scalability, robustness and irreplaceable advantages in computational performance and cost, it has gradually become the preferred solution for massive data processing, and IT companies are paying more and more attention to Hadoop.

We first study the application status of technologies related to massive data and the architecture of the Hadoop platform. We then select massive log files as the object to process and propose a Hadoop-based massive log file processing model. According to the characteristics of the log files, we first analyze the log file format, and then, following the MapReduce programming framework, use a two-stage structure to design the summary module and the sorting module separately, so as to count the log file concurrency and the average response time efficiently in a distributed environment. Based on the experimental data, we explore how site response time changes under high concurrency and thereby evaluate site performance effectively. This method simplifies the design of distributed programs and handles the underlying task assignment, parallel processing, fault tolerance and other details all at once; it also improves operational efficiency. Finally, by building a real Hadoop platform to run the program, we compare this method with the traditional processing method and analyze their respective strengths and weaknesses. The experiments verify that the model achieves its design objectives: it handles massive log files efficiently in a distributed environment while retaining robustness and scalability.
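To illustrate the summary stage described above, the following is a minimal MapReduce sketch in Java that counts per-second request concurrency and average response time. The log field layout, class names and one-second bucketing are assumptions made for illustration; they are not taken from the thesis's actual implementation.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class LogSummary {

    // Map: emit (time bucket, response time) for each request line.
    public static class ParseMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Assumed line format: "<timestamp-to-second> <url> <responseTimeMillis>"
            String[] fields = value.toString().trim().split("\\s+");
            if (fields.length < 3) {
                return; // skip malformed lines
            }
            try {
                long millis = Long.parseLong(fields[2]);
                context.write(new Text(fields[0]), new LongWritable(millis));
            } catch (NumberFormatException e) {
                // ignore lines whose response time cannot be parsed
            }
        }
    }

    // Reduce: for each one-second bucket, output the request count
    // (concurrency) and the average response time.
    public static class SummaryReducer extends Reducer<Text, LongWritable, Text, Text> {
        @Override
        protected void reduce(Text key, Iterable<LongWritable> values, Context context)
                throws IOException, InterruptedException {
            long count = 0;
            long total = 0;
            for (LongWritable v : values) {
                count++;
                total += v.get();
            }
            double avg = (double) total / count;
            context.write(key, new Text(count + "\t" + avg));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "log summary");
        job.setJarByClass(LogSummary.class);
        job.setMapperClass(ParseMapper.class);
        job.setReducerClass(SummaryReducer.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

The sorting stage mentioned in the abstract would then run as a second job over this output, ordering the per-second buckets by concurrency or response time before analysis.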
Keywords/Search Tags: Hadoop, Massive Data, Distributed Systems, Log Processing, Cluster Computing