Sensitive Content Prevention And Control Technology Based On Network Platform

Posted on: 2020-01-16
Degree: Master
Type: Thesis
Country: China
Candidate: Y Q Ma
GTID: 2428330590996440
Subject: Information and Communication Engineering
Abstract
In the era of "Internet +", the network plays an important role in people's work and lives, and network security issues of every kind have become increasingly prominent. The volume of data is growing rapidly alongside the development of information technology and the economy. As enterprises and institutions collect more and more data, the security of sensitive information has become a critical issue for the harmonious development of individuals and society. Current prevention and control techniques for sensitive information mainly target the content platform, the content server, and the user terminal. Word and PDF documents, however, often contain sensitive information about individuals and enterprises, and without an effective control mechanism there is a high risk of leakage whenever such documents are exchanged. Existing protection techniques are mostly server-based or user-based, and content protection is usually specific to one content type: detecting and protecting sensitive text, identifying and blocking sensitive images, or identifying sensitive video. Images embedded in documents also carry important information, for instance engineering design drawings and ID card images, yet there is little research on identifying and detecting image content embedded in documents transmitted over the network.

Motivated by the emerging need to control the spread of sensitive information in documents, this thesis focuses on content analysis and desensitization techniques for online documents such as Microsoft Word and PDF files. First, it studies document parsing techniques for Word and PDF files and, building on parsing methods for the PDF, DOC, and DOCX formats, implements parsers for all three document types. Next, on top of text-content analysis and desensitization, it examines how image data is embedded in these documents and presents an image extraction method, and it investigates image recognition and classification algorithms. Combining these with image data desensitization methods, it proposes an implementation that desensitizes both the text content and the images of the three document types.

Finally, based on a reverse proxy mechanism, the thesis proposes a technical solution for desensitizing sensitive information in documents transmitted over the network. It emphasizes the framework design and implementation of the system and describes its main modules: content parsing and desensitization, the TCP reverse proxy, and HTTP protocol parsing. Practical testing and analysis show that the proposed scheme offers a promising technical solution for parsing and desensitizing sensitive text and image content in real applications.
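As a rough illustration of the DOCX part of the parsing and desensitization pipeline described above, the sketch below treats a DOCX file as what it is, a ZIP package: it reads the paragraph text from `word/document.xml`, lists embedded images under `word/media/`, and masks 18-character ID numbers in the extracted text. This is a minimal sketch under stated assumptions, not the thesis's implementation: the ID pattern, the "keep first 6 and last 4 characters" masking policy, and the function names `extract_docx`, `desensitize_text`, and `mask_id` are all hypothetical.

```python
import io
import re
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the WordprocessingML main part (word/document.xml) in a DOCX package.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

# Hypothetical rule: an 18-character resident ID number (17 digits plus a
# digit or 'X') not embedded in a longer alphanumeric token. No checksum test.
ID_PATTERN = re.compile(r"(?<![0-9A-Za-z])\d{17}[\dXx](?![0-9A-Za-z])")


def mask_id(match: re.Match) -> str:
    """Hypothetical masking policy: keep the first 6 and last 4 characters."""
    s = match.group(0)
    return s[:6] + "*" * (len(s) - 10) + s[-4:]


def extract_docx(data: bytes) -> tuple[str, list[str]]:
    """Return the paragraph text and the embedded media part names of a DOCX."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
        paragraphs = []
        for p in root.iter(W_NS + "p"):
            # A sentence (or an ID number) may be split across several <w:t>
            # runs, so join all runs of a paragraph before matching.
            paragraphs.append("".join(t.text or "" for t in p.iter(W_NS + "t")))
        # Word normally stores embedded images under word/media/; a robust
        # parser would follow the relationship (.rels) parts instead.
        media = [n for n in zf.namelist() if n.startswith("word/media/")]
    return "\n".join(paragraphs), media


def desensitize_text(text: str) -> str:
    """Replace every matched ID number with its masked form."""
    return ID_PATTERN.sub(mask_id, text)
```

Joining runs per paragraph before matching is the key design choice: Word frequently splits a single number across several runs, so a per-run regex would miss it. The sketch only masks the extracted text and lists the image parts; writing the masked text back into the package, handling legacy DOC files (a binary compound-file format), and parsing PDF content streams would each need separate code.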
Keywords: Content prevention and control, Document analysis, Data masking