
Research And Implementation Of Batch Stream Fusion Data Processing Support Environment Based On Kubernetes

Posted on: 2022-03-14; Degree: Master; Type: Thesis
Country: China; Candidate: Z Zhang
GTID: 2518306494471424; Subject: Software engineering
Abstract/Summary:
At present, batch-stream fusion data processing is a hot topic in big data research and application. It covers batch big data processing, streaming big data processing, and mixed batch-stream processing, all of which place higher and more complex demands on the operation support environment. Traditional data support environments, however, integrate data processing engines mainly on physical servers and cloud servers. This approach incurs high resource costs and falls short in deployment agility, resource utilization efficiency, and real-time visibility of service state. Batch-stream fusion workloads aggravate these problems, and the traditional support environment for big data processing can no longer meet their requirements effectively.

Containerization, a lightweight virtualization technology that has emerged in recent years, offers researchers and developers a new way to build distributed applications, and its rapid development also provides good support for batch and stream data processing services. Containerized big data services nevertheless still have shortcomings in cross-host container communication, container scheduling, and related areas, so building a data processing support environment on containers remains challenging. In this thesis, containers are used as the platform for deploying the services of the batch-stream data processing support environment, and Kubernetes is used as the underlying tool to make container management and orchestration more effective. Developers can deploy, operate, and maintain the data processing engines they need through a front-end visual interface. Deploying the services of the batch-stream fusion support environment as containers can substantially improve the resource utilization of service nodes and the R&D efficiency of experimenters and researchers, and allows container services to be scheduled reasonably.

To address the problems above, the main work of this thesis is as follows:

(1) An architecture for a containerized batch-stream fusion data processing support environment is designed. The thesis first analyzes the requirements that batch-stream fusion applications place on the support environment in terms of configuration management, execution-environment adjustment, and resource load, as well as the communication and monitoring requirements of mainstream batch and stream processing frameworks once they are containerized. It then proposes a layered architecture for the containerized support environment together with the corresponding container services, and designs the environment's cross-host communication mechanism and container state monitoring mechanism.

(2) To meet the dynamic resource demands that arise while batch-stream fusion services process data, a custom container resource scheduling algorithm for the support environment is proposed. Based on actual operating conditions, a constraint model for large-scale container scheduling and a fitness model for evaluating the effectiveness of the scheduling algorithm are constructed. The CloudSim simulation platform is used to verify that the algorithm reduces CPU, memory, and other resource load on server nodes, so that containerized services deployed by the support system can be placed quickly on suitable server nodes.

(3) The containerized batch-stream fusion data processing support environment is implemented on Kubernetes. The implementation of its core modules, such as communication, monitoring, and scheduling, is described in detail. In addition, the author built and deployed a physical server cluster and tested the environment through several projects involving laboratory services, Internet big data, and power grid big data, providing a big data processing support environment for project research, development, and testing. Application results show that the support environment performs well in rapid deployment, service state monitoring, and controlling CPU, memory, and other resource load on server nodes.
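The abstract names a constraint model and a fitness model for container scheduling but does not give their form. The Python sketch below shows one way such a pair could fit together, under stated assumptions: node capacity limits act as the hard constraint, and an assumed fitness score (lower is better) combines weighted post-placement CPU and memory utilization with a CPU/memory imbalance penalty. The greedy, largest-first placement loop is a simple stand-in, not the thesis's algorithm, and every name (`Node`, `ContainerRequest`, `fits`, `fitness`, `schedule`), the weights, and the sample nodes are hypothetical. It is plain Python, not a CloudSim experiment.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    """A server node with capacity and currently allocated resources (hypothetical model)."""
    name: str
    cpu_capacity: float  # CPU cores
    mem_capacity: float  # memory in GiB
    cpu_used: float = 0.0
    mem_used: float = 0.0


@dataclass
class ContainerRequest:
    """A container's resource request (hypothetical model)."""
    name: str
    cpu: float
    mem: float


def fits(node: Node, req: ContainerRequest) -> bool:
    """Hard constraint: the request must fit in the node's remaining capacity."""
    return (node.cpu_used + req.cpu <= node.cpu_capacity
            and node.mem_used + req.mem <= node.mem_capacity)


def fitness(node: Node, req: ContainerRequest,
            w_cpu: float = 0.5, w_mem: float = 0.5) -> float:
    """Assumed fitness (lower is better): weighted post-placement utilization
    plus a penalty for CPU/memory imbalance on the node."""
    cpu_ratio = (node.cpu_used + req.cpu) / node.cpu_capacity
    mem_ratio = (node.mem_used + req.mem) / node.mem_capacity
    return w_cpu * cpu_ratio + w_mem * mem_ratio + abs(cpu_ratio - mem_ratio)


def schedule(requests: List[ContainerRequest],
             nodes: List[Node]) -> Dict[str, Optional[str]]:
    """Greedy placement: largest requests first, each to the feasible node
    with the lowest fitness score. Unplaceable requests map to None."""
    placement: Dict[str, Optional[str]] = {}
    for req in sorted(requests, key=lambda r: (r.cpu, r.mem), reverse=True):
        feasible = [n for n in nodes if fits(n, req)]
        if not feasible:
            placement[req.name] = None
            continue
        best = min(feasible, key=lambda n: fitness(n, req))
        best.cpu_used += req.cpu  # commit the allocation on the chosen node
        best.mem_used += req.mem
        placement[req.name] = best.name
    return placement


if __name__ == "__main__":
    nodes = [Node("n1", 4, 8), Node("n2", 8, 16)]
    reqs = [ContainerRequest("flink-jm", 1, 2),
            ContainerRequest("spark-exec", 2, 4),
            ContainerRequest("kafka", 1, 1)]
    print(schedule(reqs, nodes))
```

With the sample data, `spark-exec` and `kafka` land on `n2` and `flink-jm` on `n1`. In a Kubernetes deployment, a custom scheduler of this kind would typically run as a separate scheduler that Pods select through their `spec.schedulerName` field.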
Keywords/Search Tags: support environment, batch-stream fusion, containerization, container scheduling