
General Cloud-native Big Data Architecture With Kubernetes

Posted on: 2022-12-14    Degree: Master    Type: Thesis
Country: China    Candidate: S Du    Full Text: PDF
GTID: 2518306773997719    Subject: Automation Technology
Abstract/Summary:
With the continuous expansion of data volume, the performance of traditional data processing technology can no longer meet demand. To address the challenges brought by data growth, a variety of data processing technologies have been developed. These have improved data processing capability to a certain extent, but they have also introduced technical silos, scattered data, complex architectures, and difficult maintenance, resulting in ever-rising data costs. With the vigorous development of cloud computing, "everything goes to the cloud" has become the norm of the era: not only can ordinary systems use the cloud to improve their overall capability, but data processing technology can also rely on the cloud to improve performance and reduce cost. Some new data processing technologies take cloud native as their cornerstone, combine various technical means, integrate transaction processing with analytical computation, and provide a unified access interface; however, because they are still at an early stage of development and are constrained by factors such as infrastructure and technological maturity, they cannot completely replace traditional data processing technologies in the short term. Combining traditional data processing technology with the cloud can therefore improve performance and reduce cost, and it is one of the main ways to address data inflation, lower system complexity, and save on data cost, yet there is little related research on this topic. Integrating data processing technology with the cloud usually requires technology-specific transformation that cannot be generalized, and the limited performance of cloud storage, the inflexibility of traditional storage, and the high overall complexity make it difficult to exploit the advantages of cloud technology.

To solve these problems, this paper proposes a general cloud-native big data architecture based on Kubernetes:

1. Based on mature cloud container orchestration technology, a modular, loosely coupled cloud-native operating environment is designed, which improves the efficiency of running data processing technologies on the cloud.
2. Based on cloud storage, data acceleration middleware, the Container Storage Interface (CSI), and related technologies, a cloud storage acceleration strategy is designed that realizes the separation of storage and computation. This gives the data system on the cloud strong scalability and flexibility, increases the achievable data storage scale, and reduces data cost.
3. By introducing a new parallel computing strategy, the defects of traditional massively parallel computing are avoided, overall query performance is improved, and TPS response is more stable.
4. By introducing mature logging, monitoring, and tracing technologies, a standard observability strategy for large-scale cloud resources is designed, which improves system robustness and reduces maintenance complexity.

Finally, the proposed architecture is verified on ClickHouse, an analytical data processing technology. A large number of experiments covering query performance, TPS response, and other aspects show that, compared with the original architecture, query performance improves by 18%-60% and data cost is reduced by 50%-90%. The proposed architecture is superior to the original in data scale, consistency, reliability, and system scalability, and is easier to operate and maintain.
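The second design point above, cloud storage acceleration with separation of storage and computation, can be sketched concretely. The fragment below uses the official Kubernetes Python client to register a StorageClass backed by a caching CSI driver over cloud object storage, plus a shared volume claim that an analytical engine such as ClickHouse could mount. It is a minimal illustration only: the provisioner name, bucket, cache size, and namespace are assumptions made for the sketch, not components named in the thesis.

    # Sketch of storage-computation separation on Kubernetes (Python client).
    # Assumed names: provisioner "csi.cache-accel.example.com", bucket
    # "analytics-data", namespace "big-data" -- all illustrative.
    from kubernetes import client, config

    config.load_kube_config()  # use config.load_incluster_config() inside a pod

    # StorageClass backed by a hypothetical data-acceleration CSI driver that
    # fronts cloud object storage with a local cache layer.
    accelerated_sc = client.V1StorageClass(
        metadata=client.V1ObjectMeta(name="accelerated-object-store"),
        provisioner="csi.cache-accel.example.com",
        parameters={"bucket": "analytics-data", "cacheSize": "100Gi"},
        reclaim_policy="Retain",
        volume_binding_mode="WaitForFirstConsumer",
    )
    client.StorageV1Api().create_storage_class(body=accelerated_sc)

    # Shared claim that compute pods (e.g. ClickHouse replicas) can mount;
    # scaling compute does not require copying or re-sharding the data.
    pvc = client.V1PersistentVolumeClaim(
        metadata=client.V1ObjectMeta(name="clickhouse-data"),
        spec=client.V1PersistentVolumeClaimSpec(
            access_modes=["ReadWriteMany"],
            storage_class_name="accelerated-object-store",
            resources=client.V1ResourceRequirements(requests={"storage": "500Gi"}),
        ),
    )
    client.CoreV1Api().create_namespaced_persistent_volume_claim(
        namespace="big-data", body=pvc
    )

Because the data lives behind the claim rather than on node-local disks, compute replicas can be added or removed without moving data, which is the scalability property the abstract attributes to the separation of storage and computation.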
Keywords/Search Tags: Big Data, Cloud Computing, Integration of Big Data and Cloud, Separation of Storage and Computation, Massively Parallel Computing, Observability