Research On Structured Data Watermarking Based On Classification

Posted on: 2022-03-08
Degree: Master
Type: Thesis
Country: China
Candidate: P Ren
GTID: 2518306605970689
Subject: Communication and Information System
Abstract/Summary:
In the big data economy, users' demand for data keeps increasing the frequency of data sharing and distribution. Sharing and distribution promote the development of the big data economy and bring convenience to production and daily life, but two key research issues remain for the digital economy: sharing private data securely, and confirming copyright after structured data has leaked. This thesis uses structured data watermarking to make data copyright traceable and to protect the rights and interests of copyright owners as data circulates.
Existing research on big data watermarking concentrates on numerical data. Such schemes embed watermark information by locating redundant positions in the original data and slightly adjusting the data values; robustness is then evaluated by simulating tuple attacks on the watermarked structured data with an increasing proportion of attacked tuples. Building on this work, the thesis studies classification-based structured data watermarking, aiming for high robustness while keeping the watermarked data usable. The specific work is as follows:
1. A classification-based structured data watermarking system is proposed. The thesis first introduces the functions of the system's three modules and the relationships between them; then, taking the characteristics of structured data into account, it integrates an attribute classification algorithm based on sequence labeling to identify attribute columns. Unlike traditional database watermarking schemes, which focus on structured numerical data, this work also brings structured non-numerical data into scope. The attribute recognition algorithm combines a hidden Markov model (HMM) with the Viterbi algorithm, draws on a Chinese natural language part-of-speech tagging tool, and introduces a trigger vocabulary to recognize structured numerical and non-numerical attribute columns, so that watermark information can be embedded at the optimal level. Its classification accuracy is higher than that of existing common entity classification methods.
2. A classification-based structured text watermarking algorithm is proposed. Because attribute columns contain different kinds of text, the thesis selects the text attribute column types common in structured data and uses Chinese natural language processing and long-text watermarking techniques to define a redundancy level for each. In the embedding stage, a word segmentation tool and a manually built professional-domain dictionary are used to embed the watermark at the semantic level of text tuples, so that watermark information is embedded optimally for each text attribute with high security and concealment. In robustness experiments, the text watermarking algorithm shows good resistance to attacks.
3. A structured numerical watermarking algorithm is proposed. Because the amount of carrier data varies, the algorithm is divided into a variant for large data volumes and a variant for small data volumes. The data volume determines whether carrier capacity is spent on embedding header information, namely flags for the error correction level and the embedding strength. During detection, the large-volume algorithm can quickly locate the watermark information through the header; the small-volume algorithm must instead detect the watermark by matching against the floating-point data. Both algorithms preserve data usability in terms of objective statistical characteristics and are highly robust; which one to use should be decided from the volume and type of the original structured data.
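The attribute recognition step of contribution 1 can be illustrated with a minimal HMM/Viterbi sketch. It assumes two hidden states (numerical and non-numerical), three observation symbols derived from each cell, a tiny trigger vocabulary of unit words, and hand-set probabilities; all of these names and numbers are hypothetical, since the thesis builds its model with a Chinese part-of-speech tagging tool and its own trigger vocabulary. Here the cells of one column form the observation sequence, which is decoded with log-space Viterbi, and the column takes the majority decoded label.

```python
import math

# Hypothetical label set and hand-set probabilities (not trained values).
STATES = ("numerical", "non_numerical")
START = {"numerical": 0.5, "non_numerical": 0.5}
TRANS = {
    "numerical": {"numerical": 0.9, "non_numerical": 0.1},
    "non_numerical": {"numerical": 0.1, "non_numerical": 0.9},
}
EMIT = {
    "numerical": {"digits": 0.7, "trigger": 0.25, "text": 0.05},
    "non_numerical": {"digits": 0.1, "trigger": 0.1, "text": 0.8},
}
# Hypothetical trigger vocabulary: unit words that usually accompany numbers.
TRIGGERS = ("元", "岁", "kg", "%")


def observe(cell: str) -> str:
    """Map a raw cell value to one of the observation symbols."""
    stripped = cell.strip().replace(".", "", 1).lstrip("-")
    if stripped.isdigit():
        return "digits"
    if any(t in cell for t in TRIGGERS):
        return "trigger"
    return "text"


def viterbi(obs: list) -> list:
    """Return the most likely hidden-state sequence for a non-empty obs list."""
    log = math.log
    scores = [{s: log(START[s]) + log(EMIT[s][obs[0]]) for s in STATES}]
    back = []
    for o in obs[1:]:
        prev = scores[-1]
        cur, ptr = {}, {}
        for s in STATES:
            # Best predecessor state for reaching s at this step.
            best = max(STATES, key=lambda p: prev[p] + log(TRANS[p][s]))
            cur[s] = prev[best] + log(TRANS[best][s]) + log(EMIT[s][o])
            ptr[s] = best
        scores.append(cur)
        back.append(ptr)
    # Backtrack from the best final state.
    state = max(STATES, key=lambda s: scores[-1][s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


def classify_column(cells: list) -> str:
    """Label a column by majority vote over its decoded cell states."""
    path = viterbi([observe(c) for c in cells])
    return max(STATES, key=path.count)
```

For example, `classify_column(["12.5", "30元", "8"])` labels the column numerical, while a column of city names is labeled non-numerical. Treating a column as a sequence lets the transition probabilities smooth over an occasional odd cell instead of classifying each value in isolation.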
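For contribution 2, one common way to embed bits at the semantic level of a text cell is synonym substitution over a segmented sentence. The sketch below assumes a hypothetical three-pair domain dictionary and uses forward maximum matching as a stand-in for a real word segmentation tool; it is not the thesis's algorithm, which additionally defines a redundancy level per attribute column. Each dictionary word found in the text is a slot, and which member of its synonym pair appears encodes one bit.

```python
# Hypothetical domain dictionary: each pair is (bit-0 form, bit-1 form).
SYNONYM_PAIRS = [("电脑", "计算机"), ("手机", "移动电话"), ("购买", "采购")]
# word -> (pair index, position in pair); the position is the encoded bit.
WORD_TO_PAIR = {
    w: (i, pos) for i, pair in enumerate(SYNONYM_PAIRS) for pos, w in enumerate(pair)
}
MAX_LEN = max(len(w) for w in WORD_TO_PAIR)


def segment(text: str) -> list:
    """Forward maximum matching: greedily take the longest dictionary word."""
    tokens, i = [], 0
    while i < len(text):
        for n in range(min(MAX_LEN, len(text) - i), 0, -1):
            piece = text[i:i + n]
            if n == 1 or piece in WORD_TO_PAIR:
                tokens.append(piece)
                i += n
                break
    return tokens


def embed_bits(text: str, bits: list) -> str:
    """Rewrite synonym slots so the k-th slot carries bits[k % len(bits)]."""
    out, k = [], 0
    for tok in segment(text):
        if tok in WORD_TO_PAIR:
            pair_idx, _ = WORD_TO_PAIR[tok]
            tok = SYNONYM_PAIRS[pair_idx][bits[k % len(bits)]]
            k += 1
        out.append(tok)
    return "".join(out)


def extract_bits(text: str) -> list:
    """Read one bit per synonym slot: the word's position in its pair."""
    return [WORD_TO_PAIR[t][1] for t in segment(text) if t in WORD_TO_PAIR]
```

For instance, embedding `[1, 0, 1]` into "公司购买了一台电脑和两部手机" yields "公司采购了一台电脑和两部移动电话", from which `extract_bits` reads the same bits back. A practical system must also ensure that a substituted word cannot re-segment differently with its neighbours; a tiny dictionary with forward maximum matching does not guarantee that.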
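The general numerical approach described above (select redundant positions, adjust values slightly, then test robustness by deleting a growing proportion of tuples) can be sketched as follows. Keyed tuple selection with HMAC-SHA256 over a primary key, the parity-of-last-digit embedding, the two-decimal `SCALE` and the `gamma` selection ratio are all assumptions, a common design in the database watermarking literature rather than the thesis's algorithm; the header and error-correction flags of contribution 3 are omitted.

```python
import hashlib
import hmac
import random

SCALE = 100  # hypothetical: values are stored with two decimal places


def keyed_hash(key: bytes, pk: str) -> int:
    """HMAC-SHA256 of a primary key, read as a big-endian integer."""
    digest = hmac.new(key, pk.encode("utf-8"), hashlib.sha256).digest()
    return int.from_bytes(digest, "big")


def embed(rows: dict, key: bytes, wm_bits: list, gamma: int = 2) -> dict:
    """Return a watermarked copy of {primary_key: value}.

    Roughly 1/gamma of the tuples are selected by the keyed hash. In each
    selected value the last stored digit (hundredths) moves by at most one
    unit so that its parity equals the assigned watermark bit.
    """
    out = {}
    for pk, value in rows.items():
        h = keyed_hash(key, pk)
        if h % gamma != 0:
            out[pk] = value  # not selected: left untouched
            continue
        bit = wm_bits[(h // gamma) % len(wm_bits)]
        units = round(value * SCALE)
        if units % 2 != bit:
            units += 1 if bit else -1
        out[pk] = units / SCALE
    return out


def detect(rows: dict, key: bytes, n_bits: int, gamma: int = 2) -> list:
    """Recover each watermark bit by majority vote over selected tuples."""
    votes = [[0, 0] for _ in range(n_bits)]
    for pk, value in rows.items():
        h = keyed_hash(key, pk)
        if h % gamma != 0:
            continue
        votes[(h // gamma) % n_bits][round(value * SCALE) % 2] += 1
    return [1 if ones > zeros else 0 for zeros, ones in votes]


def delete_tuples(rows: dict, fraction: float, seed: int = 0) -> dict:
    """Simulate a tuple-deletion attack by dropping a random fraction of rows."""
    rng = random.Random(seed)
    keep = rng.sample(sorted(rows), int(len(rows) * (1 - fraction)))
    return {pk: rows[pk] for pk in keep}
```

With `gamma=2`, roughly half of the tuples carry a bit, so on a table of a few thousand rows each bit of a short watermark is still backed by many consistent votes after half of the rows are deleted; raising the deleted fraction in `delete_tuples` step by step reproduces the kind of robustness test the abstract describes. Each value changes by at most 0.01, which is what keeps statistical characteristics close to the original.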
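The header used by the large-volume variant can be pictured as a short bit string placed before the payload, which a detector finds first and then uses to parse the rest. The layout here (an 8-bit sync marker, a 2-bit error correction level and a 3-bit embedding strength) and the use of a repetition code are hypothetical; the abstract names the flags but not their format.

```python
# Hypothetical header layout: 8-bit sync marker, 2-bit error correction
# level (0-3), 3-bit embedding strength (0-7).
SYNC = [1, 1, 1, 0, 0, 1, 0, 1]


def to_bits(value: int, width: int) -> list:
    """Big-endian fixed-width binary encoding of a small integer."""
    return [(value >> (width - 1 - i)) & 1 for i in range(width)]


def from_bits(bits: list) -> int:
    """Inverse of to_bits."""
    value = 0
    for b in bits:
        value = (value << 1) | b
    return value


def build_stream(payload: list, ecc_level: int, strength: int) -> list:
    """Header followed by the payload under a repetition code.

    Each payload bit is repeated 2 * ecc_level + 1 times (an odd count,
    so the majority vote in parse_stream always has a winner).
    """
    header = SYNC + to_bits(ecc_level, 2) + to_bits(strength, 3)
    reps = 2 * ecc_level + 1
    return header + [b for b in payload for _ in range(reps)]


def parse_stream(stream: list, n_payload: int) -> tuple:
    """Locate the sync marker, read the flags, majority-decode the payload."""
    for start in range(len(stream) - len(SYNC) + 1):
        if stream[start:start + len(SYNC)] == SYNC:
            break
    else:
        raise ValueError("sync marker not found")
    pos = start + len(SYNC)
    ecc_level = from_bits(stream[pos:pos + 2])
    strength = from_bits(stream[pos + 2:pos + 5])
    reps = 2 * ecc_level + 1
    body = stream[pos + 5:pos + 5 + reps * n_payload]
    payload = [
        int(sum(body[i * reps:(i + 1) * reps]) > reps // 2) for i in range(n_payload)
    ]
    return ecc_level, strength, payload
```

Because the detector reads the error correction level and strength from the header instead of trying every combination, it can locate and decode the watermark quickly; the small-volume variant would skip these 13 header bits to save carrier space, at the cost of the slower matching-based detection described above.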
Keywords/Search Tags: classification, structured data, database watermark, copyright confirmation