The Design And Implementation Of Business Service System For Log Analysis Based On Big Data Technology

Posted on:2019-07-23

Degree:Master

Type:Thesis

Country:China

Candidate:J W Gu

Full Text:PDF

GTID:2428330566986574

Subject:Computer Science and Technology

Abstract/Summary:

In the era of big data, many business services rely on logs to monitor system health or to mine deeper value. Faced with massive log data that is generated continuously and grows exponentially, traditional data processing and analysis technologies struggle to meet the performance requirements of computing and query services. Distributed, parallel big data technology can make full use of multi-machine, multi-core hardware resources, and has gradually gained favor in both academia and industry for log service analysis.

First, log data usually has time-series and streaming characteristics, and its attributes carry specific meanings. Second, when business processes are built, the phases and dependencies of business processing reflect the underlying correspondence between task flow and data flow. In addition, the agile development and production deployment of big data projects has always been one of the chief concerns of organizations and enterprises. To process and manage massive log data and to develop specific business applications efficiently, with both performance and generality in mind, this thesis designs and implements a business service system for big data analysis based on the distributed computing framework Spark. The major contributions are as follows:

(1) Based on the characteristics of log generation, access and processing, this thesis proposes a hierarchical system architecture and designs three loosely coupled functional modules, DSService, SparkServer and MonitorServer, that support distributed services. The communication and invocation methods between the architecture layers, functional modules and services are designed to support task flow (workflow) management and scheduling, and to provide fault tolerance, high efficiency and scalability for each service.

(2) Using the Spark Dataset, this thesis unifies the application pattern of big data batch processing and stream processing, builds a business workflow system that matches data flows with task flows, and ultimately realizes a unified development and deployment mode that supports data pipeline modeling.

(3) An integrated SDK is provided to hide the complex underlying operations and to support service registration and discovery, disaster tolerance and system monitoring. Combined with the management platform, this thesis provides users with an integrated business application design flow covering data access, development, deployment and visualization, thereby facilitating the rapid integration and implementation of business service applications.

Finally, this thesis conducts three benchmark tests: data access, task computation and data query. The results show that the basic big data services provided by the system have excellent performance and scalability. Two specific business applications then verify the versatility and practicality of the system for analyzing business services from big data logs.
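
The idea in contribution (2), one pipeline definition serving both batch and streaming input, can be illustrated with a minimal sketch. This is plain Python rather than Spark, and it is not the thesis's implementation: the log format, the `Pipeline` class, and the stage functions `parse` and `only_errors` are all hypothetical names chosen for illustration.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Record = Dict[str, object]
# A stage consumes an iterable of items and lazily yields transformed items.
Stage = Callable[[Iterable], Iterator]


def parse(lines: Iterable[str]) -> Iterator[Record]:
    """Parse 'timestamp LEVEL message' lines (an assumed log format)."""
    for line in lines:
        ts, level, msg = line.split(" ", 2)
        yield {"ts": int(ts), "level": level, "msg": msg}


def only_errors(records: Iterable[Record]) -> Iterator[Record]:
    """Keep only records whose level is ERROR."""
    return (r for r in records if r["level"] == "ERROR")


class Pipeline:
    """An ordered chain of stages, independent of how the source is delivered."""

    def __init__(self, *stages: Stage) -> None:
        self.stages = stages

    def run(self, source: Iterable) -> Iterator:
        data: Iterable = source
        for stage in self.stages:
            data = stage(data)  # stages are chained lazily
        return iter(data)


SAMPLE_LINES: List[str] = [
    "1 INFO started",
    "2 ERROR disk full",
    "3 INFO heartbeat ok",
    "4 ERROR timeout",
]

# The pipeline is defined once and reused for both modes.
error_pipeline = Pipeline(parse, only_errors)

# Batch mode: the whole input is available, so the result is materialized.
batch_result = list(error_pipeline.run(SAMPLE_LINES))


def live_source() -> Iterator[str]:
    """Stand-in for an unbounded stream; yields one line at a time."""
    for line in SAMPLE_LINES:
        yield line


# Streaming mode: records are handled one by one as the source produces them.
stream_result = []
for record in error_pipeline.run(live_source()):
    stream_result.append(record)
```

Because every stage is lazy, the same `error_pipeline` object processes a finite list and an incremental source without modification, which is the property a unified batch and streaming model aims for. In Spark itself, the analogous effect comes from applying the same Dataset transformations to a batch or a streaming source.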

Keywords/Search Tags:

Spark, distributed services, data pipeline modeling, Integrated SDK

Related items

1	Research On GIS Services Chain Integrated Model And Realization Based On Workflow
2	Design And Implementation Of A Distributed ETL Tool Using Spark
3	Research On Big Data Distributed Storage Technology Based On Spark
4	Research And Application Of Distributed ETL Based On Spark
5	Design And Implementation Of Distributed Data Mining Algorithms Based On Spark
6	A System For Distributed MD Data Analysis Based On Spark
7	Research On MUSER Data Processing Pipeline Based On Flow Calculation
8	Design And Implementation Of A Distributed And Real Time Video Stream Data Processing Platform Based On Spark
9	Design And Implementation Of A Distributed Hybrid Index Structure Based On Spark
10	The Research And Implementation Of Mining Large Data Based On Spark