
Research On Data Provenance System Based On Flink Platform

Posted on: 2020-05-16
Degree: Master
Type: Thesis
Country: China
Candidate: Y Y Z Wang
Full Text: PDF
GTID: 2428330620460069
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, as the Internet has spread and the number of Internet users has grown, the amount of data generated by people's network activity has exploded. Big data brings great benefit and value to society, but it also poses new challenges for information security. For enterprises and other organizations in particular, ensuring that the large volume of data flowing in and out of the network is safe is a very important issue. Data provenance is a technology that traces where data came from and where it went, which makes it very helpful for data protection and for controlling the flow of confidential information within an organization. It is a relatively new field of research, concerned mainly with recording the transmission of specific data and providing a tracing service after the transmission has taken place. Although it is an important part of enterprise information security control, it has long been held back by high management costs.

This paper therefore proposes a new data provenance algorithm, which uses a protocol restoration algorithm to reconstruct the documents carried in an enterprise's internal network traffic. The restored documents are archived, and their propagation paths are recorded from the results, eliminating the bottleneck of traditional data provenance technology. To suit the big data era, in which both data volume and throughput are large, the algorithm is migrated to the big data stream computing platform Flink. Flink's distributed architecture, flexible scheduling and configuration, and scalability help keep the data provenance process stable and reliable. In this process, packet capture agents deployed at key nodes collect network traffic, and the data is delivered to the stream processing system through message middleware. The stream processing system first restores the files and then hands them to the feature extraction module. After the feature extraction module has analyzed a file, the results are written to storage, where they wait to be compared against incoming data provenance requests.
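The pipeline described in the abstract (capture, file restoration, feature extraction, storage, comparison against a provenance request) can be illustrated with a small single-process sketch. This is not the thesis's implementation and does not use Flink or any message middleware: the `Packet` fields, the `ProvenancePipeline` class, and the use of a SHA-256 digest as the extracted "feature" are all assumptions made here for illustration, since the abstract does not say which features are extracted or how packets are keyed.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    """Hypothetical captured chunk; field names are illustrative assumptions."""
    flow_id: str        # identifies one file transfer
    seq: int            # zero-based position of this chunk within the flow
    src: str            # sending host
    dst: str            # receiving host
    payload: bytes
    last: bool = False  # True on the final chunk of the flow


class ProvenancePipeline:
    """In-memory stand-in for the restore -> extract -> store -> compare stages."""

    def __init__(self):
        self._buffers = defaultdict(dict)  # flow_id -> {seq: payload}
        self._final_seq = {}               # flow_id -> seq of the last chunk
        self._store = defaultdict(list)    # fingerprint -> [(src, dst), ...]

    def on_packet(self, pkt):
        """Buffer one chunk; when the flow is complete, restore and archive it.

        Returns the file fingerprint once the flow is complete, else None.
        """
        chunks = self._buffers[pkt.flow_id]
        chunks[pkt.seq] = pkt.payload
        if pkt.last:
            self._final_seq[pkt.flow_id] = pkt.seq
        final = self._final_seq.get(pkt.flow_id)
        if final is None or any(i not in chunks for i in range(final + 1)):
            return None  # still waiting for more chunks
        # File restoration: concatenate chunks in sequence order.
        data = b"".join(chunks[i] for i in range(final + 1))
        # Feature extraction (assumption: a plain content hash).
        fingerprint = hashlib.sha256(data).hexdigest()
        # Storage: record one hop of the propagation path.
        self._store[fingerprint].append((pkt.src, pkt.dst))
        del self._buffers[pkt.flow_id]
        del self._final_seq[pkt.flow_id]
        return fingerprint

    def trace(self, document):
        """Provenance request: return the recorded hops for this document."""
        return list(self._store.get(hashlib.sha256(document).hexdigest(), []))


pipeline = ProvenancePipeline()
# The same two-chunk file travels A -> B, then B -> C; chunks arrive out of order.
pipeline.on_packet(Packet("f1", 1, "10.0.0.1", "10.0.0.2", b"report", last=True))
pipeline.on_packet(Packet("f1", 0, "10.0.0.1", "10.0.0.2", b"secret "))
pipeline.on_packet(Packet("f2", 0, "10.0.0.2", "10.0.0.3", b"secret "))
pipeline.on_packet(Packet("f2", 1, "10.0.0.2", "10.0.0.3", b"report", last=True))
hops = pipeline.trace(b"secret report")
```

In a streaming deployment the per-flow buffers would typically live in keyed operator state and the store in an external database; an exact hash also only matches byte-identical copies, so a real feature extractor would likely need something more tolerant of edits.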
Keywords/Search Tags:data provenance, Flink platform, big data, stream computing