
Design And Realization Of Privacy Data Leak Detection System

Posted on: 2022-01-10
Degree: Master
Type: Thesis
Country: China
Candidate: Z X Li
GTID: 2518306725484294
Subject: Master of Engineering
Abstract/Summary:
In the era of Big Data, users' personal data has enormous economic value, and the associated security issues are becoming more serious; organisations in various countries are introducing policies to protect personal privacy. To avoid violating privacy regulations, companies need to undertake data compliance efforts. The foremost task of data compliance is to locate and analyse the risk of privacy data leakage in software applications, which makes automated privacy data leakage detection a key concern for organisations today. Current privacy leak detection technologies focus on locating leaks but generally do not consider whether the data has been anonymised, so they may misreport a leak risk for data that has already been desensitised, increasing the workload and maintenance costs of data compliance.

This thesis aims to provide accurate and complete privacy data distribution results, improve the detection of anonymised privacy data, and reduce the false alarm rate of traditional methods for assessing privacy data leakage risk. It uses taint analysis, program graph representation techniques and machine learning classification models to give development and security engineers a more accurate detection service for privacy data leakage risks in programs, thereby reducing the workload of data compliance and supporting all-round monitoring and protection of privacy data in software systems. To this end, the thesis proposes a Privacy Data Leak Analysis (PDLA) method. First, it uses taint analysis to obtain the propagation paths of variables that carry user privacy data from external inputs in the project's program. Second, it extracts the set of candidate cryptographic functions from the function calls on those propagation paths. Then, for the candidate functions, it extracts function features using program representation learning techniques and combines them with graph kernel functions to generate a feature vector for each function. Finally, it classifies and identifies the candidate functions using machine learning classification models, which improves the accuracy of the privacy data leakage risk assessment, and generates the final scan results.

Based on the proposed approach, this thesis designs and implements a privacy data leakage risk detection system for Java Web projects, built with Spring Boot, Django, Soot, React and other technologies and architectures. Experiments show that the system achieves an accuracy of over 88% at the cost of a somewhat lower recall, and reduces the false alarm rate by 20% compared with the traditional privacy risk detection tool Find Security Bugs. In summary, the system provides a comprehensive view of privacy data distribution and gives more accurate risk detection results, thus saving maintenance costs and improving data compliance efficiency.
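As a rough illustration of the idea behind the method (not the thesis's implementation), the following Python sketch propagates taint through a toy straight-line program and labels each sink call as receiving raw or anonymised private data. Everything here is an assumption made for illustration: the `Call` statement model, the `find_leaks` function, the source/sink names, and in particular the `is_anonymiser` predicate, which is a hand-written stand-in for the machine learning classifier that the abstract says identifies cryptographic functions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    """One toy statement of the form `target = func(*args)` (hypothetical model)."""
    target: str
    func: str
    args: tuple


def find_leaks(statements, sources, sinks, is_anonymiser):
    """Return (sink, state) pairs, where state is 'raw' or 'anonymised'."""
    raw = set()      # variables holding raw private data
    masked = set()   # variables holding anonymised private data
    findings = []
    for st in statements:
        # Evaluate the arguments before the target is overwritten.
        has_raw = any(a in raw for a in st.args)
        has_masked = any(a in masked for a in st.args)
        if st.func in sinks:
            if has_raw:
                findings.append((st.func, "raw"))
            elif has_masked:
                findings.append((st.func, "anonymised"))
            continue
        # The target is reassigned, so forget its previous taint state.
        raw.discard(st.target)
        masked.discard(st.target)
        if st.func in sources:
            raw.add(st.target)
        elif has_raw and is_anonymiser(st.func):
            masked.add(st.target)
        elif has_raw:
            raw.add(st.target)
        elif has_masked:
            masked.add(st.target)
    return findings


# Hypothetical program: read an e-mail address, hash it, log both values.
program = [
    Call("email", "getParameter", ("request",)),
    Call("digest", "sha256Hex", ("email",)),
    Call("", "logInfo", ("digest",)),
    Call("", "logInfo", ("email",)),
]

findings = find_leaks(
    program,
    sources={"getParameter"},
    sinks={"logInfo"},
    # Stand-in for the classifier; a real system would not use a fixed name list.
    is_anonymiser=lambda name: name in {"sha256Hex"},
)
print(findings)
```

In this sketch only the second `logInfo` call would be reported as a genuine leak risk; a detector that ignores anonymisation would flag both calls, which is the kind of false alarm the abstract describes.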
Keywords: Program Analysis, Taint Analysis, Machine Learning, Program Understanding, Private Data