Design And Implementation Of Multi-source Big Data Processing And Analysis Platform

Posted on:2021-04-01

Degree:Master

Type:Thesis

Country:China

Candidate:C J Huang

Full Text:PDF

GTID:2428330614472646

Subject:Software engineering

Abstract/Summary:

With the rapid development of the Internet, industries of all kinds have grown quickly, which has accelerated the arrival of the big data era. Regardless of its size, every enterprise faces the same challenge in using data: the volume of enterprise data keeps increasing. Data therefore has to be managed so that high-quality data is easy to use and enterprises can extract useful information faster. At present, the data files collected by many companies are relatively raw and poorly managed. They are often stored in different storage systems with differing structures, and they are large in volume, numerous, complex in format, and disorganized in content. As a result, it is difficult to evaluate the value of the data, to obtain useful information from it quickly, to build effective business applications on it, or to sort out the business logic related to it. There is an urgent need to process this raw data, increase its application value, resolve problems such as data silos, and provide a solid data foundation for subsequent business applications.

To address these problems, this thesis presents a multi-source big data processing and analysis platform designed to help companies organize chaotic, scattered data into clear, traceable, high-quality data, sort out the relationships within their data, and mine information from it. The platform is based on a microservice architecture, and the backend is developed with the Spring Cloud framework. Each functional module of the platform is a relatively independent microservice, which keeps each service pluggable and supports the robustness and scalability of the whole system. A Zuul gateway handles authentication and authorization to secure service invocation. For data processing, a Spark cluster is mainly used for fast processing and analysis. During iterative development, GitLab and Jenkins are combined for continuous integration and continuous deployment so that the system can be integrated and redeployed quickly. The main functions of the system are data standard management, data cleaning, data integration, data quality audit, and metadata management. This thesis details the design and implementation of each module of the platform in terms of requirements analysis, system design, system implementation, and testing.

Over the course of the project, I took part in the platform's early requirements analysis and system design, wrote backend Java code for the platform's functional modules, was responsible for the platform's continuous integration and continuous deployment, and later took part in platform testing and production deployment. The platform described in this thesis is already online and in beta testing, and has provided data management services to some financial companies. At present, the system provides data management services for enterprises normally and meets the expected requirements for security and robustness.
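The abstract does not show how the data cleaning and data quality audit functions work, so the following is only a minimal sketch of the general idea, not the platform's implementation. It uses plain Java collections instead of Spark so that it stays self-contained, and every name in it (`CleaningSketch`, `Row`, `normalize`, `clean`, `emailCompleteness`) and the sample customer fields are hypothetical assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only: all names and rules here are assumptions,
// not taken from the thesis.
public class CleaningSketch {

    /** Hypothetical row merged from several sources. */
    static final class Row {
        final String id;
        final String name;
        final String email;

        Row(String id, String name, String email) {
            this.id = id;
            this.name = name;
            this.email = email;
        }
    }

    /** Cleaning rule: trim whitespace and treat blank strings as missing. */
    static String normalize(String value) {
        if (value == null) {
            return null;
        }
        String trimmed = value.trim();
        return trimmed.isEmpty() ? null : trimmed;
    }

    /** Normalize every field, drop rows without an id, keep the first row per id. */
    static List<Row> clean(List<Row> raw) {
        Set<String> seenIds = new HashSet<>();
        List<Row> cleaned = new ArrayList<>();
        for (Row r : raw) {
            String id = normalize(r.id);
            if (id == null || !seenIds.add(id)) {
                continue; // missing or duplicate key
            }
            cleaned.add(new Row(id, normalize(r.name), normalize(r.email)));
        }
        return cleaned;
    }

    /** Audit metric: fraction of rows whose email field is present. */
    static double emailCompleteness(List<Row> rows) {
        if (rows.isEmpty()) {
            return 1.0;
        }
        long filled = rows.stream().filter(r -> r.email != null).count();
        return (double) filled / rows.size();
    }

    public static void main(String[] args) {
        List<Row> raw = List.of(
                new Row(" 1 ", "Alice ", "alice@example.com"),
                new Row("1", "Alice", "alice@example.com"), // duplicate id
                new Row("2", "Bob", "  "),                  // blank email
                new Row("", "Nobody", "x@example.com"));    // missing id
        List<Row> cleaned = clean(raw);
        System.out.println(cleaned.size() + " rows, email completeness = "
                + emailCompleteness(cleaned));
    }
}
```

On a Spark cluster the same steps would typically be written as distributed transformations over a dataset rather than as a loop over an in-memory list; the rules themselves (normalize, deduplicate, measure completeness) carry over unchanged.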

Keywords/Search Tags:

Spring Cloud, Big Data, Spark, Continuous Integration

Related items

1	The Design And Implementation Of Data Integration And Analysis Model Of Yitian Data Management System
2	Research And Application Of Continuous Integration In The Development Of Cloud Platform Of A Company
3	Application Migration And Big Data System Deployment On Cloud
4	Research Of Continuous Integration Based On Cloud Computing
5	The Research And Implementation Of Spark And NoSQL Databases Integration
6	The Research And Practise Of Software Continuous Delivery Platform
7	Design And Implement Of Container Cloud Platform Based On Kubernetes
8	Research On Spark Data Skewing Improvement And Decision Tree Parallelization Application Under Cloud Edge Collaboration
9	Research On Cluster Analysis Of Biomedical Patent Data In Yunnan Province Based On Spark Cloud Computing Architecture
10	Research On Optimization Mechanism Of Containerized Spark Resource Scheduling In Cloud Environment