Research And Implementation Of Document Copy Detection

Posted on: 2013-08-07
Degree: Master
Type: Thesis
Country: China
Candidate: X W Liao
GTID: 2268330392469040
Subject: Computer technology
Abstract/Summary:
With the rapid development of the Internet, the information available on the web is becoming more and more plentiful, and exchanging information has become more convenient than before. However, because text, images, video, and other network resources are so easy to copy, a great deal of redundant information accumulates, which reduces search engine efficiency and makes information extraction more difficult. Moreover, in recent years some students and researchers have plagiarized others' scientific achievements through the Internet. Therefore, to improve the efficiency of information retrieval and to protect intellectual property rights, document copy detection has become one of the research focuses in natural language processing, and research on it is of great significance.
Building on previous studies, this thesis presents a detailed study of document copy detection methods. It improves the document copy detection method based on sentence similarity calculation, greatly improving both the efficiency and the accuracy of copy detection.
Firstly, this thesis introduces the background and significance of document copy detection and the state of research in China and abroad. It also briefly describes some related technologies, together with the advantages and disadvantages of commonly used text copy detection algorithms.
Secondly, building on the traditional BSP copy detection algorithm, we propose a sentence similarity algorithm based on the ordered longest common keyword sequence and a local copy detection algorithm based on keyword distance, and we design word-sentence and sentence-document inverted index structures. Together these effectively improve copy detection accuracy and efficiency.
Thirdly, based on the text copy detection method proposed in this thesis, we designed and implemented a text copy detection system.
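The abstract does not give the exact formula for the sentence similarity based on the ordered longest common keyword sequence. The following is a minimal Python sketch under stated assumptions: keyword extraction (and any synonym normalization) has already reduced each sentence to a list of keywords, and the similarity is normalized by dividing the length of the longest common keyword subsequence by the length of the longer sentence. That normalization and the names `lcs_length` and `sentence_similarity` are hypothetical, chosen for illustration rather than taken from the thesis.

```python
from typing import Sequence


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence of two keyword sequences.

    Shared keywords must appear in the same order in both sentences but need
    not be contiguous, so a reordered sentence scores lower than a faithful copy.
    """
    # prev[j] holds the LCS length of the processed prefix of a and b[:j].
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                curr[j] = prev[j - 1] + 1  # extend the common sequence
            else:
                curr[j] = max(prev[j], curr[j - 1])  # skip a keyword in a or b
        prev = curr
    return prev[-1]


def sentence_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Hypothetical normalization: ordered common keywords / longer sentence length."""
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / max(len(a), len(b))


if __name__ == "__main__":
    a = ["copy", "detection", "sentence", "similarity", "keyword"]
    b = ["sentence", "copy", "detection", "keyword", "similarity"]
    print(lcs_length(a, b))           # 3
    print(sentence_similarity(a, b))  # 0.6
```

For the two example sentences, a plain keyword-overlap measure would report 1.0 because all five keywords are shared, whereas the ordered measure finds a common sequence of only three keywords and returns 0.6, penalizing the reordering. Sensitivity to keyword order is the property an ordered-sequence measure adds over a bag-of-words comparison.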
The main functions of the system include document registration, document retrieval, synonym maintenance, local copy detection, distributed copy detection, online copy detection, network settings, system settings, and document library management.
Finally, experimental results demonstrate the practicality and effectiveness of the document copy detection method proposed in this thesis.
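The abstract names word-sentence and sentence-document inverted index structures but does not describe their layout. One plausible reading is sketched below in Python, assuming each registered document has already been split into sentences and reduced to keyword lists. The class `CopyIndex`, its methods, and the `min_shared` filter are hypothetical. During document registration every keyword points to the sentences containing it, and every sentence points back to its document; at detection time the index returns only the registered sentences that share enough keywords with a query sentence.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


class CopyIndex:
    """Hypothetical two-level inverted index: keyword -> sentences, sentence -> document."""

    def __init__(self) -> None:
        self.word_to_sentences: Dict[str, Set[int]] = defaultdict(set)
        self.sentence_to_document: Dict[int, str] = {}
        self.sentence_keywords: Dict[int, List[str]] = {}
        self._next_id = 0

    def register(self, doc_id: str, sentences: List[List[str]]) -> None:
        """Document registration: give each keyword sentence an id and index it."""
        for keywords in sentences:
            sid = self._next_id
            self._next_id += 1
            self.sentence_to_document[sid] = doc_id  # sentence -> document
            self.sentence_keywords[sid] = keywords
            for word in keywords:
                self.word_to_sentences[word].add(sid)  # word -> sentence

    def candidates(self, keywords: List[str], min_shared: int = 2) -> Dict[int, int]:
        """Ids of sentences sharing at least min_shared distinct keywords with the query,
        mapped to the number of keywords they share."""
        counts: Dict[int, int] = defaultdict(int)
        for word in set(keywords):
            # .get avoids inserting empty entries for unseen query words
            for sid in self.word_to_sentences.get(word, ()):
                counts[sid] += 1
        return {sid: n for sid, n in counts.items() if n >= min_shared}

    def documents_for(self, sentence_ids: Iterable[int]) -> Set[str]:
        """Follow the sentence -> document index to find the source documents."""
        return {self.sentence_to_document[sid] for sid in sentence_ids}


if __name__ == "__main__":
    idx = CopyIndex()
    idx.register("d1", [["copy", "detection", "sentence"], ["inverted", "index", "structure"]])
    idx.register("d2", [["keyword", "extraction", "method"]])
    hits = idx.candidates(["sentence", "copy", "detection", "system"])
    print(hits)                     # {0: 3}
    print(idx.documents_for(hits))  # {'d1'}
```

Only the candidate sentences would then need a full sentence-level similarity comparison, instead of every registered sentence. This pruning is the usual way an inverted index speeds up copy detection; the exact candidate-filtering rule used in the thesis is not stated in the abstract.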
Keywords/Search Tags: text copy detection, online copy detection, keyword extraction, similarity calculation, inverted index