| In the era of the Internet and the Intelligent Internet of Things, massive amounts of data have been gathered in the course of digitization and informatization. Deep learning models and algorithms built on this data are also developing and iterating rapidly. Data and deep learning models have themselves become valuable and are gradually being treated as intellectual property. To protect the intellectual property rights of data and models in data sharing and trading scenarios, researchers have worked on many fronts, and "marking" is one of the important foundations. "Marking" means applying a technical "mark" to data and models to resist non-compliant behavior (copying, modification, illegal control, etc.) that threatens intellectual property rights in data sharing and trading. It includes embedding or extracting identification messages and feature vectors (such as digital watermarks and hash signatures), as well as processes or mechanisms for protecting intellectual property rights (such as Digital Rights Management, or a secure and complete deep model inference mechanism with encryption and decryption). This thesis focuses on two types of "marks", namely data fingerprints and model authorization control mechanisms, and conducts research on hardware fingerprints, content fingerprints, and model parameter protection, respectively. Current research on hardware fingerprinting of mobile device motion sensor data lacks a theoretical understanding of fingerprinting capacity, and further work is needed to fingerprint real-life data more accurately. For content fingerprinting of data, there is no content fingerprint generation and retrieval system with a unified programming interface that works on large-scale data. For model parameter protection, most current work offers only passive protection after models have been stolen or infringed, and balancing security against efficiency in the active protection of model parameters remains an important challenge. In 
terms of hardware fingerprinting of data, (1) this thesis proposes a mathematical model based on the balls-into-bins problem to model the hardware fingerprint capacity of sensor devices with multi-dimensional features, and investigates how three factors affect fingerprint capacity: the distribution of device sensor data features across multiple dimensions, the number of distinguishable granularities, and the number of devices. A large amount of real mobile device data is used to analyze and validate the theoretical capacity model. (2) In real-life settings, a user's arbitrary activities introduce heavy noise into the sensor data and make the hardware fingerprint unstable. This thesis proposes an effective method for automatically capturing hardware fingerprints from device data based on Long Short-Term Memory (LSTM) neural networks. Compared with traditional feature engineering methods and convolution-based deep neural network methods, the proposed fingerprinting method achieves better accuracy and is more robust across a wide range of large datasets. (3) This thesis also proposes a novel de-fingerprinting method based on a generative network model to anonymize sensor data. The sensor data is preprocessed before publication to resist sensor fingerprinting attacks, and the preprocessing delay is minimized to meet the needs of real-time data release while maintaining good data utility. In terms of the data content fingerprinting system, this thesis designs and builds a system that extracts content fingerprints from text, image, video, and tabular data and supports large-scale retrieval based on those fingerprints. The methods implemented in the system achieve more than 70% similarity after data modification, and the fingerprint retrieval accuracy exceeds 90%. Fingerprint generation time is within 200 milliseconds; in particular, a better trade-off between generation efficiency and fingerprint accuracy is achieved for tabular data, and the retrieval 
time is less than 2 milliseconds at a scale of millions of data items. In terms of protecting model parameters, (1) for the service scenario in which service providers migrate machine learning models to edge computing devices for inference, this thesis proposes carefully generating a secure version of the machine learning model with crafted random values and deploying that secure version to the edge devices. The design of the secure version is based on the experimental observation that modifying even a small number of model parameters with random values seriously degrades model performance. As a result, the secure version of the model is inaccurate, and it is difficult for an adversary to infer which parameters were modified. (2) To obtain correct inference results, the incorrect inference caused by the modified parameters must be corrected. This thesis proposes a correction scheme that uses a Trusted Execution Environment (i.e., Intel Software Guard Extensions, SGX) and is designed to minimize the amount of inference computation performed inside SGX on the edge device. In evaluations on real datasets including CIFAR-10/100 and ImageNet, the proposed method significantly improves inference efficiency and greatly reduces memory usage compared with the state of the art. |