
Methodology Of Full-text Retrieval And Identification Against Illegally Revealed Sensitive Files

Posted on: 2015-06-12  Degree: Master  Type: Thesis
Country: China  Candidate: Q Xu
GTID: 2298330431984681  Subject: Computer technology
Abstract/Summary:
Sensitive documents are important documents that contain sensitive information, such as official secrets and commercial secrets. Improper use can lead to the disclosure of that information, resulting in serious losses of state property and commercial interests. With the continuous development of science and information technology, and with the growing number of documents that must be kept confidential, protecting sensitive documents is becoming more difficult. The primary goal of confidentiality work is to identify sensitive documents effectively and to prevent leakage incidents.
To address this problem, we propose and design a system that rapidly detects and identifies files containing sensitive information from multiple perspectives, namely file content and file format. The system uses a Lucene-based full-text retrieval strategy to build a full-text index over files intercepted from the network during illegal outbound transfers. It provides a search interface for keyword-based full-text retrieval, so users can enter sensitive keywords manually to identify sensitive documents. The system also provides a custom, manually maintained sensitive dictionary and identifies sensitive documents by matching file content against the words in that dictionary. To identify sensitive documents more completely, the system expands the sensitive dictionary in a custom format with synonyms from the Tongyici Cilin thesaurus; on this basis it can identify sensitive documents quickly in a preliminary pass.
In addition, because Word and PDF are increasingly popular formats for storing text, the system designs a separate identification scheme for each: text digital watermarking for Word documents and MD5 digest authentication for PDF documents.
Working from both file content and file format, this multi-scheme approach can identify sensitive documents among many files quickly, intelligently, and effectively.
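
Two of the schemes named in the abstract, dictionary matching with synonym expansion and MD5 digest authentication, can be illustrated with a minimal Python sketch. It is not the thesis implementation. The dictionary entries, the synonym table (a stand-in for Tongyici Cilin), all function names, and the idea of comparing a file's digest against a set of pre-registered digests are assumptions made for illustration. Matching is plain case-insensitive substring search, a simplification of whatever tokenization the real system uses.

```python
import hashlib

# Hypothetical sensitive dictionary and synonym table. The thesis expands
# its dictionary with Tongyici Cilin, which is not reproduced here.
SENSITIVE_TERMS = {"confidential", "secret"}
SYNONYMS = {
    "confidential": {"classified", "restricted"},
    "secret": {"undisclosed"},
}


def expand_dictionary(terms, synonyms):
    """Return the terms plus every synonym listed for them."""
    expanded = set(terms)
    for term in terms:
        expanded |= synonyms.get(term, set())
    return expanded


def find_sensitive_terms(text, dictionary):
    """Return the dictionary terms found in the text, sorted.

    Uses case-insensitive substring search, so a term also matches
    inside a longer word.
    """
    lowered = text.lower()
    return sorted(term for term in dictionary if term in lowered)


def md5_digest(data):
    """Return the hexadecimal MD5 digest of a bytes object."""
    return hashlib.md5(data).hexdigest()


def is_registered_pdf(data, registered_digests):
    """Assumed check: is this file's digest in a set of known digests?"""
    return md5_digest(data) in registered_digests
```

In this sketch a document would be flagged when `find_sensitive_terms` returns a non-empty list for the expanded dictionary, or when `is_registered_pdf` is true for the file's raw bytes. A digest comparison only recognizes byte-identical copies, so any change to the file produces a different digest.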
Keywords/Search Tags: Identification of sensitive documents, Sensitive dictionary, Synonym expansion, Text watermark, MD5 digest authentication