A Study Of Data And Model Marking In Data Sharing & Trading Scenario

Posted on:2023-07-08

Degree:Doctor

Type:Dissertation

Country:China

Candidate:H Q Liu

GTID:1528306902454534

Subject:Computer Science and Technology

Abstract/Summary:

In the era of the Internet and the intelligent Internet of Things, massive amounts of data have been gathered through digitization and informatization, and deep learning models and algorithms built on this data are developing and iterating rapidly. Data and deep learning models have themselves become valuable and are increasingly treated as intellectual property. To protect the intellectual property rights of data and models in data sharing and trading scenarios, researchers have worked on many fronts, and 'marking' is one of the important foundations. 'Marking' means applying a technical 'mark' to data and models to resist non-compliant behavior in data sharing and trading (copying, modification, illegal control, etc.) that threatens intellectual property rights. It includes embedding or extracting identification messages and feature vectors (such as digital watermarks and hash signatures), as well as processes or mechanisms for protecting intellectual property rights (such as Digital Rights Management, or a secure and complete deep model inference mechanism with encryption and decryption). This thesis focuses on two types of 'marks', namely data fingerprints and model authorization control mechanisms, and conducts research on hardware fingerprints, content fingerprints, and model parameter protection, respectively.

Current research on hardware fingerprinting of mobile device motion sensor data lacks a theoretical understanding of fingerprinting capacity, and further work is needed to fingerprint real-life data more accurately. For content fingerprinting, there is no content fingerprint generation and retrieval system that works through a unified programming interface on large-scale data. For model parameter protection, most existing work offers passive protection after a model has been stolen or infringed, and balancing security against efficiency in the active protection of model parameters remains an important challenge.

In terms of data hardware fingerprints, (1) this thesis proposes a mathematical model based on the balls-into-bins problem to describe the hardware fingerprint capacity of sensor devices with multi-dimensional features, and investigates how fingerprint capacity is affected by three factors: the distribution of device sensor data characteristics across multiple dimensions, the number of distinguishable granularities, and the number of devices. A large amount of real mobile device data is used to analyze and validate the theoretical capacity model. (2) In real-life situations, a user's arbitrary activities introduce substantial noise into the sensor data and make the hardware fingerprint unstable. This thesis proposes an effective method, based on Long Short-Term Memory (LSTM) neural networks, for automatically capturing hardware fingerprints from device data. Compared with traditional feature engineering methods and convolution-based deep neural network methods, the proposed fingerprinting method achieves better accuracy and is more robust on a wide range of large datasets. (3) This thesis also proposes a novel de-fingerprinting method based on a generative network model to anonymize sensor data. The sensor data is preprocessed before publishing to resist sensor fingerprinting attacks, and the preprocessing delay is minimized to meet the needs of real-time data release while maintaining good data utility.

In terms of the data content fingerprinting system, this thesis designs and builds a system that extracts content fingerprints from text, image, video, and table data and supports large-scale retrieval based on those fingerprints. The methods implemented in the system retain more than 70% similarity after data modification, and fingerprint retrieval accuracy exceeds 90%. Fingerprint generation takes less than 200 milliseconds; in particular, a better trade-off between generation efficiency and fingerprint accuracy is achieved for tabular data. Retrieval takes less than 2 milliseconds at a scale of millions of data items.

In terms of protecting model parameters, (1) this thesis targets the service scenario in which providers migrate machine learning models to edge computing devices for inference. It proposes carefully generating a secure version of the machine learning model using crafted random values and deploying that secure version to the edge devices. The design is based on the experimental observation that modifying a small number of model parameters with random values seriously degrades model performance; as a result, the secure version of the model is inaccurate, and it is difficult for an adversary to infer which parameters were modified. (2) To obtain correct inference results, the incorrect results caused by the modified parameters need to be corrected. This thesis proposes a correction scheme that uses a Trusted Execution Environment (TEE), namely Intel Software Guard Extensions (SGX), and is designed to minimize the amount of inference computation performed inside SGX on the edge device. Evaluation on real datasets including CIFAR10/100 and ImageNet shows that the proposed method significantly improves inference efficiency and greatly reduces memory usage compared with the state of the art.
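
To make the balls-into-bins view of fingerprint capacity concrete, the following is a minimal illustrative sketch and not the thesis's actual model. It assumes that each device is a ball, that each combination of distinguishable feature levels is a bin, that feature dimensions are independent, and that a device counts as uniquely fingerprinted when it is alone in its bin. The function names (expected_unique_devices, joint_bin_probs, uniform_dims) are hypothetical and introduced only for this example.

```python
from itertools import product
from math import prod


def expected_unique_devices(bin_probs, n_devices):
    """Expected number of devices that are alone in their bin.

    A device falls in bin i with probability p_i; it is alone there when
    the other n - 1 devices all fall elsewhere, probability (1 - p_i)**(n - 1).
    """
    return sum(
        n_devices * p * (1.0 - p) ** (n_devices - 1) for p in bin_probs
    )


def joint_bin_probs(per_dim_probs):
    """Bin probabilities when feature dimensions are independent (assumption).

    per_dim_probs is a list with one probability vector per dimension;
    each bin is one combination of levels across all dimensions.
    """
    return [prod(combo) for combo in product(*per_dim_probs)]


def uniform_dims(n_dims, n_levels):
    """n_dims dimensions, each with n_levels equally likely levels."""
    return [[1.0 / n_levels] * n_levels for _ in range(n_dims)]


if __name__ == "__main__":
    # Three factors named in the abstract: feature distribution,
    # number of distinguishable granularities, number of devices.
    for n_levels in (4, 8, 16):
        bins = joint_bin_probs(uniform_dims(n_dims=3, n_levels=n_levels))
        for n_devices in (100, 1000):
            unique = expected_unique_devices(bins, n_devices)
            print(n_levels, n_devices, round(unique / n_devices, 3))
```

Under these assumptions, the fraction of uniquely identifiable devices rises with the number of distinguishable levels per dimension and falls as more devices share the same bins. A skewed feature distribution concentrates devices in a few bins and lowers the fraction relative to a uniform one.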

Keywords/Search Tags:

Data Fingerprint, Model Watermark, Deep Learning, Privacy, TEE

Related items

1	Research On Lossless Watermark And Integrity Authentication Of Deep Learning Model
2	Research On Training Method Of Image Data Based On Differential Privacy For Deep Learning Model
3	Research About Fingerprint Recognition System Based On Deep Learning And Homomorphic Encryption
4	The Study Of Data Privacy Protection Issues In Multiparty Deep Learning
5	Research On Privacy Protection Technology Of Deep Learning Model Based On Differential Privacy
6	A Study On User Data Privacy-Preserving Mechanism With Differential Privacy
7	The Research And Implementation Of The Layout Digital Watermark Based On The Fingerprint And Redundant Routing
8	Research On The Technology Of Defensing Privacy Hiding Faced To Deep Learning Models
9	Research On Distributed Deep Learning Technology Based On Model Averaging
10	A Research On Authentication Audio Watermark Algorithm Based On Deep Learning