
A Spark-Based Log Collection And Data Service Integration Framework

Posted on: 2018-12-29
Degree: Master
Type: Thesis
Country: China
Candidate: X Zhang
Full Text: PDF
GTID: 2428330512483561
Subject: Computer application technology
Abstract/Summary:
Logs are an important and efficient aid to troubleshooting large-scale application software. However, for long-running software, log analysis faces several challenges. First, because log data is voluminous and scattered across many locations, it is difficult for users to collect. Second, the variety of log data formats makes the data hard to read, so much of the valuable information in the logs is overlooked and cannot be fully used. Finally, storing log data without a standardized scheme consumes a large amount of storage resources. To address these problems, this thesis proposes LCSF, a Spark-based log collection and data service integration framework, which is divided into four modules: log data collection, preprocessing, storage, and query.

(1) In the log data collection module, an extensible distributed log collection strategy is proposed. It collects real-time data streams and integrates the original log data, which is both scattered and massive. The strategy scales to meet the ever-increasing volume of log data and number of log data sources.
(2) In the log data preprocessing module, the log data is processed in batches on the Spark platform and then filtered, deduplicated, and fragmented, so that the log data users need is extracted in real time from massive and diverse log data.
(3) In the log data storage module, a multi-level data storage model based on access frequency is proposed. The model combines a relational database with a non-relational database and stores log data according to its access frequency. Different types of log data are stored in different server clusters. This addresses the diversity and heterogeneity of massive log data, allows the various types of log data to be managed effectively, avoids the waste of resources caused by unstandardized data storage, and greatly improves log data storage performance.
(4) In the log data query module, based on high-speed retrieval with Solr, the information contained in the log data is presented quickly to users through a service interface. This improves the efficiency of subsequent analysis work that uses the log data and meets users' diverse needs.

Based on the proposed framework, a Spark-based log collection and data service integration system is designed and implemented. The system uses the extensible distributed collection strategy and the multi-level, access-frequency-based log data storage to achieve safe and reliable collection of and access to log data, uses Spark Streaming to achieve real-time and efficient log processing, and uses Solr to provide a dynamically configurable log data query service.
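
The filter, deduplicate, and fragment steps named in item (2) can be illustrated with a minimal sketch. The abstract does not give the thesis's actual implementation, so everything here is an assumption made for illustration: the log line layout (`<timestamp> <LEVEL> <source> <message>`), the function names `parse` and `preprocess`, the choice to keep only WARN and ERROR records, and the reading of "fragment" as grouping records by source. The sketch uses plain Python on a single batch rather than Spark, only to show the order of the steps.

```python
import re
from collections import defaultdict

# Assumed (hypothetical) line layout: "<timestamp> <LEVEL> <source> <message>"
LINE_RE = re.compile(r"^(\S+)\s+(DEBUG|INFO|WARN|ERROR)\s+(\S+)\s+(.*)$")


def parse(line):
    """Turn one raw log line into a dict, or return None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    ts, level, source, message = match.groups()
    return {"ts": ts, "level": level, "source": source, "message": message}


def preprocess(lines, keep_levels=("WARN", "ERROR")):
    """Filter, deduplicate, and fragment one batch of raw log lines.

    Returns a dict mapping each source to its list of kept records.
    """
    seen = set()                   # keys of records already emitted
    fragments = defaultdict(list)  # source -> records (the "fragments")
    for line in lines:
        record = parse(line)
        # Filter: drop unparseable lines and levels the user did not ask for.
        if record is None or record["level"] not in keep_levels:
            continue
        # Deduplicate: identical timestamp, source, and message count as one record.
        key = (record["ts"], record["source"], record["message"])
        if key in seen:
            continue
        seen.add(key)
        # Fragment: group records by the source that produced them (an assumption).
        fragments[record["source"]].append(record)
    return dict(fragments)


if __name__ == "__main__":
    batch = [
        "2018-01-01T00:00:00 ERROR app1 disk full",
        "2018-01-01T00:00:00 ERROR app1 disk full",
        "2018-01-01T00:00:01 INFO app1 started",
        "2018-01-01T00:00:02 WARN app2 slow response",
    ]
    for source, records in preprocess(batch).items():
        print(source, len(records))
```

In a Spark Streaming setting the same three steps would run on each micro-batch. Note that deduplication across batches would need shared state, which this single-batch sketch does not model.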
Keywords/Search Tags: log, collection, preprocessing, storage, data service integration