| Data is a core factor of production in the new era, flowing, shared, and opened across industries. Because data is non-rivalrous, this flow and openness bring growing security risks such as data leakage, secondary dissemination, and dumping. When a database breach occurs, comprehensive incident tracing and recovery are required. Two technical challenges stand out. The first is efficient identification of sensitive data: sensitive data comes in many types, existing identification approaches are hard to extend and cannot identify data automatically and intelligently, and their accuracy and performance are low. The second is automatic adaptation of database watermarking algorithms: the many existing watermarking algorithms each serve a single function, cannot be adapted intelligently to different types of data, and resist attacks poorly. To address these challenges, this paper designs and implements a sensitive data masking and watermarking system for structured data, which comprises two core subsystems: data masking and database watermarking. The research contents and contributions of the paper are as follows. (1) Research on the key technology of data masking for personal sensitive information. To address the problem that existing sensitive data identification approaches are hard to extend and have low accuracy and performance, this paper proposes a sensitive data identification approach based on a classify-before-identify architecture. The results show that this approach reaches an identification accuracy of 97.14% and improves identification performance by about 70% compared with the traditional approach based on regular-expression matching. (2) Research on the key technology of relational database watermarking based on automated adaptation. To address the lack of automated adaptation in database watermarking algorithms, this paper proposes, for the first time, an automatically adapting watermarking algorithm for numerical and text data. It combines data type adaptation, data volume evaluation, column sensitivity judgment, parameter tuning, and other techniques to adapt both the watermarking algorithm and its model parameters intelligently. The results show that the proposed algorithm introduces only 0.04% data distortion compared with the traditional least significant bit (LSB) method. In the robustness analysis, the method effectively resists the standard subset deletion, subset modification, and subset addition attacks. The proposed method therefore has lower data distortion and higher robustness, and can effectively solve the problems of data validation and traceability. (3) Design and implementation of a masking and watermarking system for sensitive structured data. The system provides the core functions of sensitive data identification and masking and of watermark embedding and extraction. It meets the security protection needs of sensitive data in circulation, effectively solves the problems of masking and tracing sensitive data, and has high practical value and good application prospects. |
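
For context, the traditional least significant bit baseline named in contribution (2) can be sketched as below. This is not the adaptive algorithm proposed in the paper. It is a minimal keyed LSB scheme for a single integer column, and the names `embed`, `detect`, and `gamma`, as well as the use of HMAC-SHA256 over the primary key, are illustrative assumptions rather than details taken from the paper.

```python
import hashlib
import hmac


def _keyed_hash(key: bytes, primary_key: str) -> int:
    """Return a keyed hash of the primary key as an integer (assumed design)."""
    digest = hmac.new(key, primary_key.encode("utf-8"), hashlib.sha256).digest()
    return int.from_bytes(digest, "big")


def embed(rows: dict, key: bytes, gamma: int = 2) -> dict:
    """Return a copy of rows with the LSB overwritten in about 1/gamma of them.

    rows maps a primary key (str) to an integer attribute value.
    """
    marked = {}
    for pk, value in rows.items():
        h = _keyed_hash(key, pk)
        if h % gamma == 0:               # keyed selection of tuples to mark
            bit = (h >> 8) & 1           # keyed watermark bit for this tuple
            value = (value & ~1) | bit   # overwrite only the lowest bit
        marked[pk] = value
    return marked


def detect(rows: dict, key: bytes, gamma: int = 2) -> tuple:
    """Return (matches, total) over the tuples that the key selects."""
    matches = total = 0
    for pk, value in rows.items():
        h = _keyed_hash(key, pk)
        if h % gamma == 0:
            total += 1
            matches += int((value & 1) == ((h >> 8) & 1))
    return matches, total


# Usage: mark a toy column, then check the mark on the full and a partial table.
rows = {f"id-{i}": 1000 + i for i in range(200)}
marked = embed(rows, b"secret-key")
matches, total = detect(marked, b"secret-key")
```

Because each tuple is marked independently of the others, deleting a subset of rows leaves the marks in the remaining rows intact, which is the intuition behind resistance to subset deletion attacks. Each marked value changes by at most 1, which is the kind of distortion the abstract compares against.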