With the rapid development of Internet technology, new Internet applications are emerging continuously. Many traditional domains, such as the financial, telecommunications and retail industries, are gradually integrating with the Internet. Data of many types are growing exponentially, which places increasing demands on high-performance data storage. Improving query and transaction processing is likewise increasingly urgent. The mission-critical applications of these traditional domains are under unprecedented pressure.

To address this challenge, the way to improve the performance of database systems has shifted from the original scale-up manner to the scale-out manner, which has led to the emergence of distributed systems. A distributed system organizes multiple physically separated machines, connected via networks, into a single system. It addresses the availability and performance issues, but it makes data consistency a challenge. To resolve the consistency problem, a distributed database system must replicate data among all the data partitions.

We concentrate on the log synchronization problem in a scalable data management system and analyse the advantages and disadvantages of the classical log synchronization strategies. We also put forward the Majority log synchronization strategy and the Hierarchical log synchronization strategy. The former aims at a better tradeoff between data reliability and system availability in certain specified settings. The latter aims at relieving the bottleneck caused by the log synchronization load on the master transaction node.
We also recommend which log synchronization strategy to select under different scenarios, and we empirically verify the characteristic differences among the log synchronization strategies and the soundness of the corresponding selections.

We make the following contributions:

• Based on a formalization of database consistency between the master cluster and the slave cluster under the classical log synchronization strategies, we analyse their advantages, disadvantages and applicable scenarios from the perspectives of consistency, availability and restorability.

• We present the Majority log synchronization strategy and the Hierarchical log synchronization strategy. The Majority strategy ensures both the efficiency of the synchronization procedure and the consistency of the system. It suits scenarios in which the network and the nodes are prone to faults. The Hierarchical strategy reduces the performance impact of scaling out the system by transferring the log synchronization load from the transaction nodes to a backup transaction node. It suits scenarios in which transactions contain large amounts of data.

• We implement the four log synchronization strategies discussed in this paper on the open-source system OceanBase, and we validate their advantages and disadvantages through extensive experiments and analysis. The experimental results show that an appropriate log synchronization strategy can be selected to meet the demands of the system according to the application scenario and the consistency and recovery requirements.

Log synchronization is a common issue in NoSQL and NewSQL systems. The two log synchronization strategies we propose are useful for trading off aspects such as availability, consistency and restorability.
The corresponding analysis and experimental results can help database administrators choose an appropriate log synchronization strategy for a given application scenario.
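
To make the majority idea concrete, the following is a minimal sketch of a generic majority-acknowledgement rule, not the implementation evaluated in this paper. It assumes that a log entry counts as synchronized once more than half of all replicas (the master included) have acknowledged it. The function names `majority_quorum` and `can_commit` are hypothetical and are not part of OceanBase.

```python
from typing import Sequence


def majority_quorum(replica_count: int) -> int:
    """Smallest number of acknowledgements that forms a strict majority.

    Assumption: replica_count counts every copy of the log, master included.
    """
    if replica_count < 1:
        raise ValueError("replica_count must be at least 1")
    return replica_count // 2 + 1


def can_commit(acks: Sequence[bool]) -> bool:
    """Return True when a strict majority of replicas acknowledged the log entry.

    acks[i] is True if replica i has persisted the entry (hypothetical input).
    """
    return sum(1 for ack in acks if ack) >= majority_quorum(len(acks))
```

Under this assumption, a three-replica deployment tolerates one slow or failed replica without blocking the commit, which is the kind of reliability and availability tradeoff the Majority strategy targets.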