
Design And Implementation Of A Platform For Massive Log Data Analysis Based On Distributed Computation

Posted on: 2016-06-18
Degree: Master
Type: Thesis
Country: China
Candidate: C Chen
Full Text: PDF
GTID: 2308330482460413
Subject: Software engineering
Abstract/Summary:
Since the turn of the century, technology-driven Internet businesses have grown rapidly on both PC and mobile platforms. This growth inevitably increases data volume, and log data is among the most important of that data. A great deal of valuable information is embedded in these complex and disordered logs. This is now widely recognized in industry, and many companies run dedicated log analysis systems. However, as Internet businesses have grown, log volume has grown rapidly as well. For example, Baidu produces about 200 GB of logs per day, and the volume may reach the TB level during holidays. Traditional centralized log analysis struggles with data sets of this size. As big data has become widespread and distributed computing technology has matured, applying distributed technology to enterprise log analysis has become a research hotspot.
The Apache Foundation's Hadoop is currently a very popular distributed computing technology. It has two main functional components: HDFS and Map/Reduce. HDFS, the Hadoop Distributed File System, provides efficient and reliable distributed storage suited to massive data. Map/Reduce provides a distributed computing model in which users only need to write the processing logic, while the model handles the rest. These two components make Hadoop well suited to massive log data analysis, and many companies have in fact begun to study Hadoop-based methods of analyzing logs.
This thesis summarizes and analyzes the problems of current log analysis methods, proposes a log processing scheme based on distributed computing, and designs and implements a massive log data analysis system based on distributed computation.
Users interact with the system through a browser. A user configures a statistics task on the client. The system processes the task and submits it to a back-end Hadoop cluster for computation, and the results are stored in a database. The client then reads the data from the database and displays it. Specifically, the system includes the following functions: data preparation, data storage, data computation, data presentation, permission management, and alarm monitoring.
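The Map/Reduce model described above can be illustrated with a small single-process sketch. This is not the thesis's implementation and does not use Hadoop; it only mimics the map, shuffle, and reduce steps in plain Python to show that the user supplies just the per-record logic. The log format, the sample lines, and all function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical log lines in a simplified access-log format (not from the thesis):
# date, time, HTTP method, path, status code.
SAMPLE_LOGS = [
    "2016-06-18 10:00:01 GET /index.html 200",
    "2016-06-18 10:00:02 GET /missing 404",
    "2016-06-18 10:00:03 POST /login 200",
    "malformed line",
]


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """User-written map logic: emit (status_code, 1) for each well-formed line."""
    fields = line.split()
    if len(fields) == 5 and fields[4].isdigit():
        yield fields[4], 1
    # Malformed lines are skipped silently in this sketch.


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group values by key; in a real cluster the framework performs this step."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """User-written reduce logic: sum the counts collected for one key."""
    return key, sum(values)


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence inside a single process."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(run_job(SAMPLE_LOGS))
```

In a real deployment, the map and reduce functions would run in parallel across cluster nodes over data stored in HDFS, and the grouping step would be handled by the framework rather than by user code.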
Keywords/Search Tags:Massive Log, Hadoop, Distributed Computing, Statistics