A Spark-Based Log Collection And Data Service Integration Framework

Posted on:2018-12-29

Degree:Master

Type:Thesis

Country:China

Candidate:X Zhang

Full Text:PDF

GTID:2428330512483561

Subject:Computer application technology

Abstract/Summary:


Logs provide an important and efficient means of troubleshooting large-scale application software. However, analyzing the logs of long-running software faces several challenges. First, because log data is huge in volume and scattered across many locations, it is difficult for users to collect. Second, log formats are diverse, which makes log data hard for users to read, so much of the valuable information it contains is overlooked and never fully used. Finally, non-standardized storage of log data consumes a large amount of storage resources.

To address these problems, this thesis proposes LCSF, a Spark-based log collection and data service integration framework divided into four modules: log data collection, preprocessing, storage, and query.

(1) Log data collection module: an extensible distributed log collection strategy is proposed that collects real-time data streams and integrates raw log data that is both massive and scattered. The strategy scales to meet the ever-growing volume of log data and number of log data sources.

(2) Log data preprocessing module: built on the Spark platform, this module processes log data in batches and then filters, deduplicates, and fragments it, so that the log data a user needs can be extracted in real time from massive and diverse log data.

(3) Log data storage module: a multi-level data storage model based on access frequency is proposed. The model combines relational and non-relational databases, storing log data according to how frequently it is accessed, and different types of log data are stored on different server clusters. This addresses the diversity and heterogeneity of large volumes of log data, allows the various types of log data to be managed effectively, avoids the waste of resources caused by unplanned data storage, and greatly improves log data storage performance.

(4) Log data query module: based on high-speed retrieval with Solr, this module quickly presents the information contained in the log data to users through a service interface, which makes subsequent analysis of the log data more efficient and meets users' diverse needs.

Based on the proposed framework, a Spark-based log collection and data service integration system is designed and implemented. The system uses the extensible distributed collection strategy and the access-frequency-based multi-level log data storage to collect and access log data safely and reliably, uses Spark Streaming to process logs efficiently in real time, and finally uses Solr to provide a dynamically configurable log data query service.
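The abstract describes the collection strategy of module (1) only as extensible and distributed; it does not say how log sources are assigned to collectors. As one illustration of how such a strategy can scale out, the sketch below swaps in consistent hashing, a standard technique that is not confirmed as the thesis's method. The collector names, the replica count, and the `CollectorRing` class are all hypothetical.

```python
import bisect
import hashlib
from typing import Dict, List


def _ring_hash(key: str) -> int:
    # Map a string to a large integer position on the ring.
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)


class CollectorRing:
    """Consistent-hash ring assigning log sources to collector nodes.

    Illustrative only: node names, the replica count, and the use of
    consistent hashing are assumptions, not details from the thesis.
    """

    def __init__(self, nodes: List[str], replicas: int = 50) -> None:
        self.replicas = replicas            # virtual points per collector
        self._points: List[int] = []        # sorted ring positions
        self._owners: Dict[int, str] = {}   # ring position -> collector
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Scaling out: only sources whose next point now belongs to the
        # new collector change owner; all other sources stay in place.
        for i in range(self.replicas):
            point = _ring_hash(f"{node}#{i}")
            bisect.insort(self._points, point)
            self._owners[point] = node

    def node_for(self, source: str) -> str:
        # Walk clockwise to the first point after the source's hash,
        # wrapping around to the start of the ring if necessary.
        if not self._points:
            raise LookupError("no collector nodes registered")
        idx = bisect.bisect(self._points, _ring_hash(source)) % len(self._points)
        return self._owners[self._points[idx]]


ring = CollectorRing(["collector-a", "collector-b", "collector-c"])
sources = [f"host-{i}" for i in range(200)]
before = {s: ring.node_for(s) for s in sources}
ring.add_node("collector-d")
after = {s: ring.node_for(s) for s in sources}
moved = [s for s in sources if before[s] != after[s]]
print(f"{len(moved)} of {len(sources)} sources moved to collector-d")
```

The design choice here is that adding a collector reassigns only the sources that now fall to the new node (on average about a quarter when going from three collectors to four), rather than reshuffling every source as a plain `hash(source) % n` scheme would.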
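Module (2) names three preprocessing steps (filter, deduplicate, fragment) without giving their details. The plain-Python sketch below shows one reading of those steps on a small batch; the `"<source> <LEVEL> <message>"` line format, the `preprocess` function, and the interpretation of "fragment" as grouping records by source are assumptions. In Spark, the same steps would roughly correspond to RDD operations such as `filter`, `distinct`, and `groupByKey` or `partitionBy`, applied to each batch.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical record shape: (source, level, message).
Record = Tuple[str, str, str]


def preprocess(lines: Iterable[str], keep_levels: Set[str]) -> Dict[str, List[Record]]:
    """Filter, deduplicate, and fragment raw log lines.

    Assumes each line looks like "<source> <LEVEL> <message>".
    """
    seen: Set[Record] = set()
    fragments: Dict[str, List[Record]] = defaultdict(list)
    for line in lines:
        parts = line.strip().split(" ", 2)
        if len(parts) != 3:
            continue                      # filter: drop malformed lines
        source, level, message = parts
        if level not in keep_levels:
            continue                      # filter: keep requested levels only
        record: Record = (source, level, message)
        if record in seen:
            continue                      # deduplicate exact repeats
        seen.add(record)
        fragments[source].append(record)  # fragment: group records by source
    return dict(fragments)


raw = [
    "web01 ERROR disk full",
    "web01 ERROR disk full",
    "db02 INFO started",
    "db02 WARN slow query",
    "garbage",
]
result = preprocess(raw, {"ERROR", "WARN"})
print(result)
# {'web01': [('web01', 'ERROR', 'disk full')], 'db02': [('db02', 'WARN', 'slow query')]}
```

Exact-match deduplication is the simplest choice; a real pipeline might instead deduplicate on a normalized key (for example, ignoring timestamps), which this sketch does not attempt.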
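Module (3) stores log data at different levels according to access frequency, but the abstract gives no thresholds and does not say which database backs which level. The sketch below is a hypothetical `TierRouter` that counts accesses per log category and maps each category to a "hot", "warm", or "cold" level; the level names, the thresholds, and any mapping of levels onto relational or non-relational stores are assumptions for illustration.

```python
from collections import Counter
from typing import Dict


class TierRouter:
    """Choose a storage level for each log category from its access count.

    Level names and thresholds are hypothetical; which relational or
    non-relational store backs each level is left open, as in the abstract.
    """

    def __init__(self, hot_threshold: int, warm_threshold: int) -> None:
        if warm_threshold > hot_threshold:
            raise ValueError("warm_threshold must not exceed hot_threshold")
        self.hot_threshold = hot_threshold
        self.warm_threshold = warm_threshold
        self.accesses: Counter = Counter()  # category -> access count

    def record_access(self, category: str) -> None:
        self.accesses[category] += 1

    def tier_for(self, category: str) -> str:
        # Counter returns 0 for unseen categories without inserting them.
        count = self.accesses[category]
        if count >= self.hot_threshold:
            return "hot"    # most frequently accessed data
        if count >= self.warm_threshold:
            return "warm"   # occasionally accessed data
        return "cold"       # rarely or never accessed data

    def plan(self) -> Dict[str, str]:
        # Current level assignment for every category seen so far.
        return {category: self.tier_for(category) for category in self.accesses}


router = TierRouter(hot_threshold=3, warm_threshold=1)
for _ in range(3):
    router.record_access("auth")
router.record_access("billing")
print(router.plan())
```

In practice the counts would be collected over a sliding time window and the plan applied periodically, so that categories can migrate between levels as access patterns change.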
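Module (4) exposes Solr retrieval through a service interface and, per the system description, supports dynamically configurable queries. One possible shape for that interface, sketched below, turns user-supplied filters into a request URL for Solr's standard `select` handler using its common `q`, `fq`, `rows`, `start`, and `wt` parameters. The base URL, the core name `logs`, field names such as `level`, and the `build_log_query` helper are hypothetical; the sketch only builds the URL and does not contact a server.

```python
from typing import Dict, List, Optional, Tuple
from urllib.parse import urlencode


def _quote(value: str) -> str:
    # Wrap the value in double quotes so spaces and colons are treated
    # literally; escape backslashes and embedded quotes (Lucene syntax).
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_log_query(base_url: str, core: str, text: str,
                    filters: Optional[Dict[str, str]] = None,
                    rows: int = 20, start: int = 0) -> str:
    """Build a Solr select URL from a main query plus field filters."""
    params: List[Tuple[str, str]] = [
        ("q", text),           # main query
        ("rows", str(rows)),   # page size
        ("start", str(start)), # offset for paging
        ("wt", "json"),        # response format
    ]
    # Each configured filter becomes its own fq (filter query) parameter;
    # sorting keeps the URL deterministic.
    for field, value in sorted((filters or {}).items()):
        params.append(("fq", f"{field}:{_quote(value)}"))
    return f"{base_url.rstrip('/')}/{core}/select?{urlencode(params)}"


url = build_log_query("http://localhost:8983/solr", "logs",
                      "message:timeout", {"level": "ERROR"})
print(url)
```

Sending filters as separate `fq` parameters rather than folding them into `q` lets Solr cache each filter independently, which suits a service where users toggle filters on and off.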

Keywords/Search Tags:

log, collection, preprocessing, storage, data service integration


Related items

1	Research On The Key Technology Of Fault Tolerance Based On Fault Data Preprocessing For Supercomputing Systems
2	Data Collection And Preprocessing For Multi-Website Web Log Mining
3	Research On Automobile Maintenance Data Collection And Integration Technologies For Industry Chain Collaboration
4	Research On The Method Of Social Network Data Collecting And Analyzing
5	Design And Archievement Of Liaoning Netcom Online Collection Complex Preprocessing System
6	Research And Development For RDWIS Based On Data Integration
7	Research And Evaluation System Of Data Preprocessing System Design And Implementation,
8	Design And Implementation Of A Stream Data Preprocessing And Service System
9	Research And Implement Of Data Integration Platform Based On SDO
10	Research Of Kernel Data Management Technology For Information Integration System