
The Technique Of Streaming Data Masking With Support Of Data Mining

Posted on: 2018-01-24  Degree: Master  Type: Thesis
Country: China  Candidate: T B Jiang  Full Text: PDF
GTID: 2348330512484905  Subject: Engineering
Abstract/Summary:
With the rapid development of big data, the demand for data mining and analysis grows day by day. How to keep sensitive data from leaking when data is published has become a research focus. Data masking addresses this problem by changing the values of the data while preserving their original format, which protects sensitive data from unauthorized access. Most existing data masking research targets static data sets rather than streaming data. Because real-time streaming data analysis has become a research hotspot, the goal of this thesis is to design a data masking solution for real-time streaming environments that protects sensitive data from leakage.

Based on an analysis of the requirements of streaming data, this thesis proposes a masking procedure with three parts: data anonymization, data consistency processing, and data masking. Data anonymization is introduced into the masking procedure to handle identity masking. Building on existing data stream anonymization algorithms, the thesis proposes an algorithm that reduces information loss by continuously optimizing the published k-anonymized clusters, and fixes the format mismatch issue by correcting the data format after generalization. This method protects users' identity information while maintaining the availability of the data. The thesis also proposes a general masking algorithm based on regular expressions. Drawing on finite automata theory, the algorithm uses the expressiveness of regular expressions to transform the data so that the format of the generated data matches the regular expression given by the user. Masking algorithms for name data and address data are also discussed.

Finally, the proposed algorithms are implemented on the distributed stream processing platform Storm, using Storm's concurrency and scalability to improve system performance. The system was tested on e-commerce transaction data, and the experimental results show that the proposed data masking solution is efficient and effective.
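The idea of regular-expression-driven masking can be illustrated with a minimal sketch. It is not the algorithm from the thesis: it assumes a tiny, hypothetical pattern subset (literal characters, `\d`, bracket classes such as `[A-Z]`, and `{n}` repeat counts) instead of a full finite automaton. The names `TOKEN`, `expand_class`, and `mask`, as well as the keyed-hash seeding used to give the same input the same masked output, are assumptions made for illustration.

```python
import hashlib
import hmac
import random
import re
import string

# Hypothetical token grammar: a literal character, \d, or a [x-y...] class,
# each optionally followed by a {n} repeat count. Other regex syntax
# (groups, alternation, *, +, ?) is deliberately not supported.
TOKEN = re.compile(r"(\\d|\[[^\]]+\]|.)(?:\{(\d+)\})?")


def expand_class(spec):
    """Return the characters that a single token is allowed to produce."""
    if spec == r"\d":
        return string.digits
    if spec.startswith("["):
        body, chars, i = spec[1:-1], [], 0
        while i < len(body):
            if i + 2 < len(body) and body[i + 1] == "-":
                # Range such as A-Z: add every character from start to end.
                chars.extend(chr(c) for c in range(ord(body[i]), ord(body[i + 2]) + 1))
                i += 3
            else:
                chars.append(body[i])
                i += 1
        return "".join(chars)
    return spec  # a literal character stands for itself


def mask(value, pattern, key):
    """Replace value with a pseudo-random string that matches pattern.

    The generator is seeded from an HMAC of the original value, so the same
    input maps to the same masked output under the same key.
    """
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()
    rng = random.Random(int.from_bytes(digest, "big"))
    out = []
    for spec, count in TOKEN.findall(pattern):
        alphabet = expand_class(spec)
        out.extend(rng.choice(alphabet) for _ in range(int(count or 1)))
    return "".join(out)


if __name__ == "__main__":
    pattern = r"[A-Z]{2}-\d{6}"
    masked = mask("AB-123456", pattern, b"demo-key")
    # The masked value keeps the user-specified format.
    assert re.fullmatch(pattern, masked)
    print(masked)
```

Seeding from a keyed hash is one simple way to keep repeated values consistent across a stream without storing a lookup table. Python's `random` module is not cryptographically secure, so a production system would need a stronger generator.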
Keywords/Search Tags: Data masking, Data anonymization, Streaming data processing