
The Design And Implementation Of Log Analysis System Based On Spark

Posted on: 2015-06-26
Degree: Master
Type: Thesis
Country: China
Candidate: J. H. Liu
GTID: 2308330461455044
Subject: Software engineering

Abstract:
Currently, Internet applications have penetrated enterprise office systems, and enterprise business increasingly depends on the Internet. Transmitting information over the network reduces costs and improves office efficiency. However, along with this convenience, enterprise employees visit work-unrelated websites during working hours, which harms both the business and the network environment. Consequently, enterprises need an audit system that records users' network access behavior; the records the system produces are stored regularly and accurately in the form of text logs.

With the growth of Internet enterprises and the expansion of application scale, a log analysis system running on a single machine can no longer meet current demands. As a result, a massive-data processing cluster becomes the ideal platform for log analysis. The original massive-data processing framework was proposed by Google in 2003-2006; afterwards a similar framework, Hadoop, was born as a distributed computing framework, and its massive-data processing performance excelled in the Internet industry. Nevertheless, the Hadoop framework alone is not enough to support real-time analysis and iterative computing scenarios. Therefore, after 2009 many enterprises proposed improved computation frameworks in succession, such as Dremel and Spark.

Based on the above situation, an extensive literature review, and enterprises' common demand to observe user behavior, this paper designs a massive log data analysis platform based on Spark. The platform is designed as four modules: log collection, logic processing, webpage display, and task management; the access.log of the Squid server is used as the platform's data source.
The four modules implement, respectively, the collection and import of data, the analysis and processing of data, a client display for user operation and result presentation, and the monitoring and management of the cluster. Compared with Hadoop, Spark brings a substantial performance improvement through in-memory computing.
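The abstract does not include code, but the kind of aggregation the logic-processing module performs over Squid's access.log can be illustrated with a minimal sketch. The plain-Python example below (the function names and sample log lines are hypothetical; the field layout assumes Squid's native access.log format: timestamp, duration, client IP, result code/status, bytes, method, URL, ...) counts visits per client and site — in the actual system such an aggregation would run distributed on Spark over the full log.

```python
from collections import Counter
from urllib.parse import urlsplit

def parse_squid_line(line):
    """Parse one line of Squid's native access.log format.

    Native fields: timestamp, duration(ms), client IP, result/status,
    bytes, method, URL, ident, hierarchy/peer, content type.
    """
    fields = line.split()
    if len(fields) < 7:
        return None  # skip malformed lines
    return {
        "timestamp": float(fields[0]),
        "client": fields[2],
        "status": fields[3],
        "url": fields[6],
    }

def top_sites_per_client(lines):
    """Count visits per (client IP, site host) pair."""
    counts = Counter()
    for line in lines:
        rec = parse_squid_line(line)
        if rec is None:
            continue
        host = urlsplit(rec["url"]).netloc or rec["url"]
        counts[(rec["client"], host)] += 1
    return counts

# Hypothetical sample lines in Squid's native access.log layout.
sample = [
    "1420070400.123 200 10.0.0.5 TCP_MISS/200 1024 GET http://news.example.com/a - DIRECT/93.184.216.34 text/html",
    "1420070401.456 150 10.0.0.5 TCP_HIT/200 512 GET http://news.example.com/b - NONE/- text/html",
    "1420070402.789 300 10.0.0.7 TCP_MISS/200 2048 GET http://work.example.org/ - DIRECT/203.0.113.9 text/html",
]
print(top_sites_per_client(sample))
```

The same per-key counting maps directly onto Spark's model (e.g. a map to `((client, host), 1)` pairs followed by a reduce by key), which is where the cluster and in-memory computing pay off on massive logs.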
Keywords: Spark, Shark, Resilient Distributed Datasets, log analysis