With the rise of cloud computing, migrating services from faulty nodes to available nodes quickly, flexibly, and efficiently has become an important problem in cloud computing cluster fault tolerance. As more and more services are deployed to cloud computing clusters, fault-tolerance mechanisms based on process migration and virtual machine migration can hardly meet the fault-tolerance requirements of these clusters. Container migration incurs less system overhead than virtual machine migration and is more flexible and efficient than process migration. Accordingly, a container-based resource allocation model and a migration mechanism for user-level fault tolerance are proposed. First, the fault tolerance of virtualized cloud computing clusters is analyzed. Based on container virtualization, the physical machines reserved for fault tolerance are virtualized into a container fault-tolerant pool, and a resource allocation process for this pool under task migration requests is given. The complex fault-tolerant resource allocation process in the cloud computing cluster is decomposed into three sub-models: a fault-tolerant resource assignment sub-model, a fault-tolerant pool management sub-model, and a container provisioning sub-model. Described as Markov processes, the three sub-models are integrated into a container fault-tolerant resource allocation integration model, and the migration rejection rate and average recovery delay of migration requests are analyzed in the corresponding sub-models. Second, based on the integration model, its sub-models, and traditional fault-tolerant migration mechanisms, this paper proposes a container migration mechanism consisting of a container fault-tolerant pool, a container checkpoint-restart method, and a container checkpoint file transmission method. The container fault-tolerant pool performs container migration at the user level, increasing both the utilization of fault-tolerant resources and the flexibility of cluster fault tolerance. The container checkpoint-restart method preserves the complete container-process hierarchy and reduces the migration rejection rate. Transferring checkpoint files with remote memory access technology reduces checkpoint file transmission delay under heavy I/O load. Finally, the feasibility and effectiveness of the proposed model and methods are verified by experiments in a laboratory environment.
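To make the two reported metrics concrete, the following is a minimal illustrative sketch of how a migration rejection rate and an average waiting delay can be derived from a Markov model. It does not reproduce the sub-models of this paper. It assumes a generic M/M/c/K birth-death queue in which migration requests arrive at rate lam, each of c pool containers serves a request at rate mu, and a request is rejected when K requests are already in the system. The function name and all parameters are hypothetical.

```python
from math import factorial


def mmck_metrics(lam, mu, c, K):
    """Return (rejection probability, mean queueing delay) of an M/M/c/K queue.

    Hypothetical stand-in for a fault-tolerant pool: lam is the arrival rate
    of migration requests, mu the per-container service rate, c the number of
    containers, and K the maximum number of requests in the system.
    """
    if not (lam > 0 and mu > 0 and 1 <= c <= K):
        raise ValueError("require lam > 0, mu > 0 and 1 <= c <= K")

    a = lam / mu  # offered load
    # Unnormalized steady-state weights from the birth-death balance equations.
    weights = []
    for n in range(K + 1):
        if n <= c:
            weights.append(a ** n / factorial(n))
        else:
            weights.append(a ** n / (factorial(c) * c ** (n - c)))
    total = sum(weights)
    probs = [w / total for w in weights]

    # A request arriving when the system is full (state K) is rejected.
    reject = probs[K]

    # Mean number of waiting requests, then Little's law with the accepted rate.
    mean_queue = sum((n - c) * probs[n] for n in range(c + 1, K + 1))
    lam_eff = lam * (1.0 - reject)
    mean_wait = mean_queue / lam_eff
    return reject, mean_wait


if __name__ == "__main__":
    reject, wait = mmck_metrics(lam=1.0, mu=1.0, c=1, K=2)
    print(f"rejection rate = {reject:.4f}, mean waiting delay = {wait:.4f}")
```

In this simplified setting, enlarging the pool (larger c) or allowing more queued requests (larger K) lowers the rejection rate, while a longer queue raises the waiting delay. The integrated model in the paper analyzes the same kind of trade-off across its three sub-models.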