| With the development of cloud computing and its growing adoption in production environments, more and more enterprises are choosing to build on cloud technology. Among the available cloud platforms, OpenStack is widely recognized for being open source, flexible, and stable, and enterprises use it extensively to build cloud platforms. However, OpenStack has limited availability and lacks automated fault detection and handling. When a server or service component fails, the entire service can become unavailable, causing significant losses to the enterprise. This thesis implements an OpenStack high availability cluster that addresses these availability limitations and provides automated fault detection and handling. The cluster is divided by function into control, compute, and storage clusters, which makes it easier to manage and scale. First, to address the availability limitations of OpenStack, Pacemaker and Corosync are used for cluster resource management. To reduce false failure detections from Corosync heartbeat messages, heartbeat detection is performed on both the management network and the storage network, which also enables link failure detection and fault reporting. Resource agents are designed and implemented for the services in the control cluster, providing service status detection, fault reporting, and automatic recovery. Second, because a compute node failure in OpenStack makes the virtual machine instances on that node unavailable, a monitoring module and a fault migration module are implemented to automatically migrate virtual machines from the failed node to healthy nodes. Third, to ensure reliable data storage and to address the scattered storage and management difficulties of the OpenStack storage components, unified cluster storage for virtual machine images and disks is implemented, and virtual machines are run on Ceph RBD, allowing the compute cluster to focus on providing service resources. Finally, a dynamic virtual machine scheduling strategy is designed and implemented to regulate the workload on compute nodes, avoiding both performance degradation under high load and resource waste under low load. Testing shows that the OpenStack high availability cluster designed and implemented in this thesis automatically detects and handles node-level and service-level faults, providing stable and reliable services as well as secure and dependable data storage. |
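
The dual-network heartbeat check described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation. It assumes one plausible decision rule: a node is treated as failed only when its heartbeat is missing on both the management and the storage network, while a miss on a single network is reported as a link failure. The names `NodeState` and `classify_node` are hypothetical and are not part of Corosync or Pacemaker.

```python
from enum import Enum


class NodeState(Enum):
    """Hypothetical outcomes of a dual-network heartbeat check."""
    HEALTHY = "healthy"
    MANAGEMENT_LINK_DOWN = "management_link_down"
    STORAGE_LINK_DOWN = "storage_link_down"
    NODE_DOWN = "node_down"


def classify_node(mgmt_heartbeat_ok: bool, storage_heartbeat_ok: bool) -> NodeState:
    """Combine the heartbeat results from both networks into one verdict.

    Assumed rule (not taken from the thesis): a heartbeat missing on only
    one network indicates a link fault, not a node fault.
    """
    if mgmt_heartbeat_ok and storage_heartbeat_ok:
        return NodeState.HEALTHY
    if storage_heartbeat_ok:
        # Only the management network lost the heartbeat.
        return NodeState.MANAGEMENT_LINK_DOWN
    if mgmt_heartbeat_ok:
        # Only the storage network lost the heartbeat.
        return NodeState.STORAGE_LINK_DOWN
    # Heartbeat lost on both networks: treat the node itself as failed.
    return NodeState.NODE_DOWN
```

Under this assumed rule, a node that is silent on the management network but still reachable over the storage network is reported as a link failure and is not treated as a failed node, which is one way a second heartbeat path can reduce false failure detections.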