
Application of Document Similarity Detection in Enterprise Document Leakage Prevention

Posted on: 2018-05-12    Degree: Master    Type: Thesis
Country: China    Candidate: Z X Liao
GTID: 2428330569485418    Subject: Computer Technology
Abstract/Summary:
As enterprises expand their use of information technology, they produce large volumes of confidential and sensitive data, and data leaks are becoming increasingly common. Detecting the leakage of enterprise data has therefore become a pressing problem. Document similarity detection is an important technique for enterprise data leakage prevention: by checking whether the documents that employees send outside the company contain sensitive information, an enterprise can ensure that its staff do not send sensitive or confidential information to external parties. This helps enterprises protect internal confidential documents, preserve their competitiveness, and avoid the significant losses that leaked internal documents can cause.

The proposed enterprise data leak detection algorithm, based on document similarity detection, combines duplicate document detection with sentence similarity detection. It uses digital fingerprints, an inverted index, and sentence similarity computed from word semantics. First, each text is mapped to a digital fingerprint, and whether two texts are duplicates is judged by the Hamming distance between their fingerprints. Second, to support similarity detection at the sentence level, the algorithm builds an inverted index file. Third, a Chinese lexical knowledge base is used to compute the semantic similarity of words, from which sentence similarity is then computed. Finally, the implemented algorithm was evaluated on test data sets, and the results show that it can effectively detect common leaks of confidential data.
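The abstract does not name the fingerprinting scheme. The sketch below assumes SimHash (Charikar's locality-sensitive hash), a common choice when fingerprints are compared by Hamming distance; it is a swapped-in illustration, not necessarily the thesis's method. It assumes each document is already segmented into word tokens (Chinese text needs a word segmenter first), weights tokens by term frequency, and uses a threshold of 3 differing bits on 64-bit fingerprints, a conventional value rather than one taken from the thesis. The function names `simhash`, `hamming_distance` and `is_near_duplicate` are hypothetical.

```python
import hashlib
from collections import Counter
from typing import Iterable

BITS = 64  # fingerprint width (assumed, not stated in the abstract)


def token_hash(token: str, bits: int = BITS) -> int:
    """Stable per-token hash truncated to `bits` bits.

    Python's built-in hash() is randomized per process, so MD5 is used
    here only as a stable, well-mixed hash, not for security.
    """
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") & ((1 << bits) - 1)


def simhash(tokens: Iterable[str], bits: int = BITS) -> int:
    """SimHash fingerprint of a token sequence, weighted by term frequency."""
    acc = [0] * bits
    for token, weight in Counter(tokens).items():
        h = token_hash(token, bits)
        for i in range(bits):
            # Add the weight where the token's hash bit is 1, subtract it where the bit is 0.
            acc[i] += weight if (h >> i) & 1 else -weight
    # Bit i of the fingerprint is 1 when the weighted vote for that bit is positive.
    return sum(1 << i for i in range(bits) if acc[i] > 0)


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(tokens_a, tokens_b, threshold: int = 3) -> bool:
    """Treat two texts as duplicates if their fingerprints differ in at most `threshold` bits."""
    return hamming_distance(simhash(tokens_a), simhash(tokens_b)) <= threshold


if __name__ == "__main__":
    original = "quarterly revenue report for project alpha is confidential".split()
    outgoing = "quarterly revenue report for project alpha is confidential draft".split()
    print(hamming_distance(simhash(original), simhash(outgoing)))
```

Because texts with similar word content tend to produce fingerprints that differ in only a few bits, an outgoing document can be screened against a large set of protected documents by comparing 64-bit integers rather than full texts.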
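For the sentence-level step, a minimal sketch of an inverted index follows, assuming each protected document is split into sentences and each sentence into word tokens. Postings map a term to the (document, sentence) pairs that contain it, so a sentence from an outgoing document only needs to be compared with indexed sentences that share at least `min_shared` terms. The class name `SentenceInvertedIndex`, the posting layout and the `min_shared` filter are illustrative assumptions, not the thesis's index file format.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Set, Tuple

SentenceKey = Tuple[str, int]  # (document id, sentence position)


class SentenceInvertedIndex:
    """Maps each term to the sentences of protected documents that contain it."""

    def __init__(self) -> None:
        self.postings: Dict[str, Set[SentenceKey]] = defaultdict(set)
        self.sentences: Dict[SentenceKey, List[str]] = {}

    def add_document(self, doc_id: str, sentences: Sequence[Sequence[str]]) -> None:
        for pos, tokens in enumerate(sentences):
            key = (doc_id, pos)
            self.sentences[key] = list(tokens)
            for term in set(tokens):
                self.postings[term].add(key)

    def candidates(self, query: Sequence[str], min_shared: int = 1) -> List[SentenceKey]:
        """Indexed sentences sharing at least `min_shared` distinct terms with `query`,
        ordered by overlap, largest first."""
        overlap: Counter = Counter()
        for term in set(query):
            # .get avoids inserting empty entries into the defaultdict.
            for key in self.postings.get(term, ()):
                overlap[key] += 1
        return [key for key, n in overlap.most_common() if n >= min_shared]
```

Only the returned candidates then need the more expensive sentence similarity computation, which keeps the sentence-level check from comparing every outgoing sentence with every protected sentence.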
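The thesis derives word similarity from a Chinese lexical knowledge base, which is not reproduced here. The sketch below assumes a pluggable `word_sim(a, b)` function returning a score in [0, 1] and substitutes a hypothetical toy synonym table for the knowledge base. It also assumes a common aggregation: each word is matched to its most similar word in the other sentence, those best-match scores are averaged, and the two directions are averaged so the measure is symmetric. The abstract does not give the actual formula, so this aggregation is an assumption.

```python
from typing import Callable, Dict, Sequence, Tuple

WordSim = Callable[[str, str], float]


def make_toy_word_sim(pairs: Dict[Tuple[str, str], float]) -> WordSim:
    """Hypothetical stand-in for knowledge-base word similarity:
    1.0 for identical words, the listed score for listed pairs, else 0.0."""
    def word_sim(a: str, b: str) -> float:
        if a == b:
            return 1.0
        return pairs.get((a, b), pairs.get((b, a), 0.0))
    return word_sim


def directed_score(src: Sequence[str], dst: Sequence[str], word_sim: WordSim) -> float:
    """Average, over the words of `src`, of the best match found in `dst`."""
    if not src or not dst:
        return 0.0
    return sum(max(word_sim(w, v) for v in dst) for w in src) / len(src)


def sentence_similarity(s1: Sequence[str], s2: Sequence[str], word_sim: WordSim) -> float:
    """Symmetric sentence similarity, in [0, 1] when word_sim is in [0, 1]."""
    return (directed_score(s1, s2, word_sim) + directed_score(s2, s1, word_sim)) / 2


if __name__ == "__main__":
    sim = make_toy_word_sim({("company", "enterprise"): 0.9, ("secret", "confidential"): 0.8})
    print(sentence_similarity(["company", "secret", "plan"],
                              ["enterprise", "confidential", "plan"], sim))
```

Best-match averaging tolerates reordered words and synonym substitution, which is the kind of paraphrase a plain fingerprint comparison can miss; deciding that a sentence is a leak would additionally require a threshold on this score, which the abstract does not specify.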
Keywords/Search Tags: Document leakage prevention, Similar document detection, Inverted index, Sentence similarity