An In-Depth Study of Sensitive Information Mining Technology for Electronic Documents

Posted on: 2015-02-11
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Xu
Full Text: PDF
GTID: 2298330467967169
Subject: Computer application technology
Abstract/Summary:
With the rapid development of information technology, the storage, management, delivery methods, and storage media of sensitive information have all changed greatly. First, sensitive information is no longer stored only on paper but also on optical disks, mobile hard disks, USB flash drives, and hosts such as laptop and desktop computers. This makes sensitive information more convenient to carry, disseminate, and copy, but the attendant security risks have intensified as well. Second, an imperfect legal system makes sensitive information difficult to manage on the producer's side. Third, the emergence of large-capacity hard disks adds to the difficulty of detecting sensitive information. To address these problems, this thesis studies in-depth mining technology for sensitive information in electronic documents, with the aim of improving both the depth at which sensitive information can be mined and the speed and efficiency of the mining.

First, the thesis introduces the research background and significance of the subject, summarizes the state of research and the open problems at home and abroad, lists the research content that follows from them, and ends with an outline of the organizational structure of the thesis.

Second, the thesis analyzes the NTFS file system format used by the widely deployed Windows systems. Based on this analysis, files are read directly from disk and classified into file categories, which provides the file information needed by the text parsing and information extraction module of the next chapter.

Then, the thesis analyzes the compound document format and the PDF format. Based on the analysis of these document formats, algorithms are designed to extract the text content of documents, and the extracted content is classified by type, so that the following chapter has the text content it needs for fast searching of sensitive information.

Finally, to address the difficulty that large-capacity hard disks bring to sensitive information mining, the thesis exploits the processing performance of multi-core processor platforms through a parallel-loop search pattern and a multi-threaded parallel search pattern to improve the search speed for sensitive information. The search speeds of the basic search pattern, the parallel-loop search pattern, and the multi-threaded parallel search pattern are then compared. The improvement is evident, which demonstrates the feasibility and effectiveness of the proposed algorithm design.
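
The comparison between a basic search and a multi-threaded parallel search can be illustrated with a minimal sketch. This is not the thesis's implementation: the keywords list points to AMP heterogeneous parallel programming, whereas this sketch swaps in Python's standard thread pool. The keyword list, the function names, and the sample documents are all hypothetical, and the sketch assumes the text has already been extracted from each file.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical keyword list; the abstract does not specify the search terms.
KEYWORDS = ("confidential", "secret")


def find_hits(doc):
    """Return (name, sorted keywords found) for one extracted document."""
    name, text = doc
    lowered = text.lower()  # case-insensitive matching
    return name, sorted(k for k in KEYWORDS if k in lowered)


def sequential_search(docs):
    """Baseline: scan the documents one after another."""
    return {name: hits for name, hits in map(find_hits, docs) if hits}


def threaded_search(docs, workers=4):
    """Scan the documents concurrently with a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results match the baseline.
        return {name: hits for name, hits in pool.map(find_hits, docs) if hits}


# Hypothetical (file name, extracted text) pairs.
docs = [
    ("a.doc", "Quarterly report. CONFIDENTIAL draft."),
    ("b.pdf", "Lunch menu for Friday."),
    ("c.doc", "Top secret and confidential notes."),
]
results = threaded_search(docs)
```

Both functions return the same mapping from file name to matched keywords, so only the elapsed time differs between them. In CPython, threads mainly help when file I/O dominates; for CPU-bound scanning a process pool would be the closer analogue of a multi-core search.
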
Keywords/Search Tags: Sensitive Information, NTFS File System, Compound Document File Format, PDF File Format, AMP Heterogeneous Parallel Programming, Multi-core Processors, Parallel Search