| Data is the foundation of an information system and the core of normal business operations. The horizontal and vertical expansion of business, new modes of interaction, and the rapid development of the mobile Internet have caused data to grow quickly. As services and management have become more refined and information systems more centralized, large-scale data centers have emerged. New technology architectures and new lines of business in turn demand higher system performance. Software upgrades are a main way to meet this demand, and replacing an old system with a new one inevitably involves data migration, a process that carries many risks. This paper studies data integrity issues in the data migration process. Data integrity testing is an essential part of data migration, so an in-depth study of data integrity is of practical significance. First, this paper analyzes data integrity risk using a fault tree. The fault tree clearly presents the risk factors in the migration process and the structure importance of each. By analyzing structure importance, we can focus on the most important risk factors before migration begins. We can also calculate the probability of each risk and the overall risk to data integrity. Together, these results can serve as the basis of a feasibility analysis before data migration. The analysis proceeds in three steps: establish the data integrity risk fault tree, analyze the structure importance of each risk factor based on the fault tree, and calculate the risk probability of data integrity. Second, we propose a new block-based method for data integrity checking and backtracing based on MD5 detection. This method addresses the serial computation bottleneck of the MD5 algorithm and improves the performance and speed of data integrity testing. It can also trace back to the erroneous data blocks using each sub-block's label, which reduces the time spent investigating the data. 
In this method, a large data file is divided into blocks according to a preset block size PS. The first 100 bytes of each block are then selected as the label linked to that block. The method can also effectively resolve the MD5 collision problem through multiple MD5 digest calculations. Finally, the data integrity method based on MD5 block detection is applied in practice to verify its reliability and feasibility, and the test results are presented. This study can serve as an important basis for a risk aversion plan and helps ensure the smooth implementation of data migration projects. |
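
The block-based MD5 check described above can be illustrated with a minimal sketch. It is not the paper's implementation: the function names (`split_blocks`, `digest_block`, `block_digests`, `find_bad_blocks`) are hypothetical, the data is held in memory rather than read from a file, and a thread pool is only one assumed way of hashing blocks concurrently. The multiple-digest step for handling MD5 collisions is not covered, since the abstract does not specify it.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

LABEL_LEN = 100  # the text uses the first 100 bytes of each block as its label


def split_blocks(data: bytes, ps: int) -> list[bytes]:
    """Split data into consecutive blocks of at most ps bytes (PS in the text)."""
    if ps <= 0:
        raise ValueError("block size must be positive")
    return [data[i:i + ps] for i in range(0, len(data), ps)]


def digest_block(block: bytes) -> tuple[bytes, str]:
    """Return (label, MD5 hex digest) for a single block."""
    return block[:LABEL_LEN], hashlib.md5(block).hexdigest()


def block_digests(data: bytes, ps: int) -> list[tuple[bytes, str]]:
    """Digest every block. Blocks are independent, so they are hashed
    concurrently here (assumption: a thread pool stands in for whatever
    parallel scheme the paper uses)."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(digest_block, split_blocks(data, ps)))


def find_bad_blocks(source: bytes, target: bytes, ps: int) -> list[int]:
    """Return the indices of blocks whose digests differ after migration."""
    src = block_digests(source, ps)
    dst = block_digests(target, ps)
    bad = [i for i, (s, d) in enumerate(zip(src, dst)) if s[1] != d[1]]
    # Blocks that exist on only one side are reported as mismatches too.
    bad.extend(range(min(len(src), len(dst)), max(len(src), len(dst))))
    return bad
```

For example, if a single byte at offset 5000 is corrupted during migration and PS is 4096, `find_bad_blocks` reports only block index 1, so the investigation can be limited to that block and its label instead of the whole file.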