
The Design And Implementation Of Verification System Based On Text Replication Detection Technology

Posted on: 2020-02-21
Degree: Master
Type: Thesis
Country: China
Candidate: C Y Huang
Full Text: PDF
GTID: 2428330596976878
Subject: Engineering
Abstract/Summary:
The internet is now used ever more widely, offering diverse and convenient means for work, study, entertainment, and communication. As computer-based office work becomes increasingly common in all walks of life, most documents contain a large amount of similar information. In China's special industries, the confidentiality of documents is under threat, so the research focus of this system is how to manage documents and how to find similar content by comparison against archived documents. Document copy detection technology is mainly used to detect similarity and similar content between texts. It has been under development since the 1990s, and many methods are now available for detecting Chinese-language content.

This thesis uses the sensitive documents of special industries as a reference database. Oriented to a data environment containing large numbers of e-mail attachments, file-transfer attachments, and instant-messaging attachments on the internet, the system accurately and quickly detects similarity relationships with sensitive documents and supports the verification and comparison of illegal leaks of sensitive content. Ultimately, it provides customer units with important clues and a basis for document appraisal.

The thesis systematically analyzes the resource management and verification of sensitive content. Based on data analysis and business analysis for the customer units, a sensitive document data management subsystem and a verification and comparison subsystem are designed and implemented. Putting the system into use addresses problems of manual processing such as low efficiency and a high error rate, and improves the accuracy of document detection. The system contributes to the protection of nationally sensitive information and to the maintenance of social stability. The sensitive document data management subsystem supports input, verification, maintenance, download, and preview. The verification subsystem supports online import and offline upload of data files from different business platforms, offline upload of customer documents, the display of information in the database subsystem that is highly similar to those documents, and a standardized document review and appraisal process.

The system obtains plain text by extracting and preprocessing document content, then performs noise reduction, stop-word removal, and word segmentation, and finally compares the similarity of paragraphs and sentences against the samples in the database. Sentence similarity is calculated with a sentence multi-feature similarity method, and paragraph similarity is calculated with an edit distance similarity method. Sentence similarity and paragraph similarity are then combined into a comprehensive similarity score for the two documents. The verification system based on text replication detection technology has been formally put into use at the customer site and has passed a three-month trial period. The system runs well and provides strong technical support for data management and document verification at customer units.
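The similarity calculation described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not name the sentence features, the weights, or the combination rule, so the two features used here (token overlap and token order), the parameters `w_overlap` and `alpha`, and the best-match averaging in `best_match_mean` are all assumptions chosen for illustration. Only the use of edit distance for paragraphs and the combination of sentence and paragraph scores come from the abstract.

```python
from typing import Callable, List, Sequence


def edit_distance(a: Sequence, b: Sequence) -> int:
    """Levenshtein distance between two sequences (two-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def edit_similarity(a: Sequence, b: Sequence) -> float:
    """Map edit distance into [0, 1]; 1.0 means the sequences are identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def jaccard(a: Sequence, b: Sequence) -> float:
    """Set overlap of two token sequences."""
    sa, sb = set(a), set(b)
    return 1.0 if not (sa or sb) else len(sa & sb) / len(sa | sb)


def sentence_similarity(s1: List[str], s2: List[str], w_overlap: float = 0.5) -> float:
    """Assumed two-feature score: token overlap plus token order (edit distance)."""
    return w_overlap * jaccard(s1, s2) + (1 - w_overlap) * edit_similarity(s1, s2)


def best_match_mean(queries: list, samples: list, sim: Callable) -> float:
    """Average, over query units, of the best similarity to any sample unit."""
    if not queries or not samples:
        return 0.0
    return sum(max(sim(q, s) for s in samples) for q in queries) / len(queries)


def document_similarity(q_sents, s_sents, q_paras, s_paras, alpha: float = 0.5) -> float:
    """Assumed combination: weighted sum of sentence-level and paragraph-level scores."""
    sent = best_match_mean(q_sents, s_sents, sentence_similarity)
    para = best_match_mean(q_paras, s_paras, edit_similarity)
    return alpha * sent + (1 - alpha) * para
```

In this sketch, sentences are passed as lists of tokens that have already been segmented and stripped of stop words, while paragraphs are passed as plain strings and compared character by character. A real deployment would need to tune the weights and would likely need indexing to avoid comparing every query unit against every sample in the database.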
Keywords/Search Tags:Sentence Multi-feature, Editing Distance, Copy Detection, Verification