Research And Implementation Of High Performance And High Availability Data Center Synchronization Software

Posted on: 2018-08-29
Degree: Master
Type: Thesis
Country: China
Candidate: G B Chen
Full Text: PDF
GTID: 2348330521450921
Subject: Computer software and theory
Abstract/Summary:
The rapid development of computer networks has made data transmission easier and faster, which creates good conditions for distributed data backup systems. However, if a data backup system suffers a power outage, natural disaster, or other sudden failure, recovery is difficult, and the company or organization may suffer heavy losses. The data centers discussed in this thesis are part of the international GNSS Monitoring & Assessment System (iGMAS) and use off-site storage. Each data center receives data from multiple tracking stations and from product synthesis and service centers. Because of network problems, or because some sites send data to only one data center, the files stored at the three data centers are not consistent. In its first phase, the data center used the Rsync service for synchronization. Rsync solves the synchronization problem, but its efficiency does not meet the data centers' needs. To ensure the reliability of the distributed data backup system, it is therefore important to study how to perform remote data synchronization efficiently.

This thesis redesigns the data center synchronization software based on the characteristics of the data center files and of the network connections. The resulting software is significantly more efficient than the Rsync service. In designing it, we first carried out a complete requirements analysis, then studied in depth the characteristics of the stored files, how the files are stored, the timing and frequency of synchronization, and the network connections between the data centers. The core algorithm was designed from this analysis. It must synchronize the three data centers efficiently and reliably in the shortest possible time. It takes the available transmission bandwidth into account according to the network connection characteristics of the data centers, and it avoids transmitting the same data repeatedly. The software must also be highly available, so that synchronization can resume after a data center goes down or a thread is killed, and its synchronization efficiency must be significantly higher than that of the Rsync service.

We improve the efficiency of the synchronization software in three respects. First, based on the characteristics of the data center files (files are never modified, and each file name contains data information and is unique), unnecessary MD5 checks are eliminated. The data comes from the tracking stations or from the product synthesis and service centers, and its accuracy has already been verified; the data center only provides storage and backup and does not modify files, so a file can be identified by its name alone. Second, based on the network connection characteristics, we optimize the data transmission routes. Every pair of the three data centers is connected, so a data center that needs some data can obtain it over two lines, and it can likewise send data over two lines to the other two data centers. Third, the software imitates the FTP transfer mode, sending data and transfer control information separately to improve transmission efficiency.

The synchronization software was implemented according to this algorithm, taking into account the multi-core, high-performance servers on which it runs. Multithreaded programming is used to increase transmission speed, a blocking queue reduces the coupling between the producer and consumer threads, and the Paxos algorithm is used to improve the availability of the software.

Finally, the synchronization algorithm is validated by testing. The software provides the data synchronization function and shows a significant performance improvement over the Rsync service. The test case is a 1.1 GB folder containing files ranging in size from 4 KB to several megabytes. Under normal network conditions, synchronization time is reduced by 25%. When one of the communication lines is busy or rate-limited, synchronization time is reduced by 40%. The synchronization software is currently deployed and running online in the three data centers.
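
The abstract gives no code, so the following is only a minimal illustrative sketch of three of the ideas it describes: deciding what to transfer by file name alone (no MD5 comparison), spreading pulls across the two lines that connect a data center to its two peers, and decoupling planning from transfer with a blocking queue and worker threads. Python is used purely for illustration. The function names `plan_pulls` and `run_transfers`, the `fetch` callback, and the "shortest list first" balancing rule are all assumptions made for this example, not the thesis's actual algorithm or implementation.

```python
import queue
import threading


def plan_pulls(local, peers):
    """Decide, by file name only, which peer should supply each missing file.

    local: set of file names already stored at this data center.
    peers: dict mapping a (hypothetical) peer name to the set of file names it holds.
    Returns a dict mapping peer name -> list of file names to pull from it.
    """
    plan = {peer: [] for peer in peers}
    # Files are immutable and uniquely named, so a set difference on names
    # replaces any per-file checksum comparison.
    missing = set().union(*peers.values()) - local
    for name in sorted(missing):
        holders = [p for p in peers if name in peers[p]]
        # Assumed heuristic: pull from the holder with the fewest assigned
        # files so that both lines carry traffic.
        target = min(holders, key=lambda p: len(plan[p]))
        plan[target].append(name)
    return plan


def run_transfers(plan, fetch, workers=4):
    """Producer/consumer transfer loop built on a bounded blocking queue.

    fetch(peer, name) is a caller-supplied (hypothetical) function that
    performs the actual transfer. Returns (received, failed) as sorted lists.
    """
    tasks = queue.Queue(maxsize=16)  # put() blocks when consumers fall behind
    received, failed = [], []
    lock = threading.Lock()

    def consumer():
        while True:
            item = tasks.get()  # blocks until a task or sentinel arrives
            if item is None:    # sentinel: no more work for this thread
                break
            peer, name = item
            try:
                fetch(peer, name)
                outcome = received
            except Exception:
                outcome = failed
            with lock:
                outcome.append(name)

    threads = [threading.Thread(target=consumer) for _ in range(workers)]
    for t in threads:
        t.start()
    # Producer: enqueue every planned transfer, then one sentinel per worker.
    for peer, names in plan.items():
        for name in names:
            tasks.put((peer, name))
    for _ in threads:
        tasks.put(None)
    for t in threads:
        t.join()
    return sorted(received), sorted(failed)
```

For example, with `local = {"a.dat"}` and hypothetical peers `{"dc2": {"a.dat", "b.dat", "c.dat"}, "dc3": {"c.dat", "d.dat"}}`, `plan_pulls` assigns `b.dat` to `dc2` and both `c.dat` and `d.dat` to `dc3`. A real deployment would also need the control/data channel separation and the Paxos-based recovery mentioned in the abstract, which this sketch does not attempt.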
Keywords/Search Tags:synchronization algorithm, Rsync, transmission line, FTP, MD5