Text Plagiarism Detection Algorithm On Digital Fingerprints

Posted on: 2018-10-08
Degree: Master
Type: Thesis
Country: China
Candidate: M M Zhao
GTID: 2348330512976942
Subject: Computer Science and Technology
Abstract/Summary:
Digital fingerprint-based text copy detection algorithms have been widely applied in information retrieval, duplicate web page removal, and copyright protection of library resources and software. Because they need little storage space and detect quickly, digital fingerprint detection algorithms are well suited to large-scale text plagiarism detection systems. This thesis first introduces the basic principles and main flow of the digital fingerprint detection algorithm, and then focuses on text feature extraction and digital fingerprint extraction. To address the large number of text features involved in computing the similarity between two files, common text chunk segmentation methods are studied. Based on sentence chunks and the dependency relations between words, an improved text feature extraction algorithm is proposed that remedies the insufficient consideration of syntactic dependency and effectively reduces the number of features. In addition, to reduce digital fingerprint density and computational complexity, this thesis improves the digital fingerprint extraction algorithm. Building on the sliding-window mechanism of Winnowing, fingerprints are selected according to an optimal decision model and an optimal constraint condition. Experiments show that the proposed text feature extraction algorithm selects text features accurately and improves detection accuracy, and that the improved digital fingerprint extraction algorithm reduces both fingerprint density and storage space.
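For orientation, the sliding-window mechanism of Winnowing that the abstract builds on can be sketched as follows. This is a minimal illustration of the standard Winnowing scheme only, not the improved algorithm proposed in the thesis: the optimal decision model and constraint condition are not described in the abstract and are not reproduced here. The function names, the use of character k-grams as features, the MD5-based hash, the parameter values `k=5` and `w=4`, and the Jaccard similarity measure are all assumptions made for the example.

```python
import hashlib


def kgram_hashes(text, k):
    """Hash every k-character substring of the normalized text.

    Normalization (lowercasing, dropping non-alphanumerics) and the
    truncated MD5 hash are illustrative choices, not the thesis's.
    """
    normalized = "".join(ch.lower() for ch in text if ch.isalnum())
    hashes = []
    for i in range(len(normalized) - k + 1):
        gram = normalized[i:i + k]
        digest = hashlib.md5(gram.encode("utf-8")).digest()
        hashes.append(int.from_bytes(digest[:4], "big"))
    return hashes


def winnow(hashes, w):
    """Standard Winnowing: keep the minimum hash of each window of w hashes.

    On ties the rightmost minimum is chosen. Returns a set of
    (hash, position) pairs, so a minimum shared by overlapping
    windows is recorded only once.
    """
    if not hashes:
        return set()
    fingerprints = set()
    # A sequence shorter than w is treated as a single window.
    n_windows = max(1, len(hashes) - w + 1)
    for start in range(n_windows):
        window = hashes[start:start + w]
        min_val = min(window)
        # Offset of the rightmost occurrence of the minimum in the window.
        offset = len(window) - 1 - window[::-1].index(min_val)
        fingerprints.add((min_val, start + offset))
    return fingerprints


def similarity(text_a, text_b, k=5, w=4):
    """Jaccard similarity of the two texts' fingerprint hash values."""
    fa = {h for h, _ in winnow(kgram_hashes(text_a, k), w)}
    fb = {h for h, _ in winnow(kgram_hashes(text_b, k), w)}
    if not fa and not fb:
        return 0.0
    return len(fa & fb) / len(fa | fb)
```

Because only one hash per window is kept, the fingerprint set is never larger than the full set of k-gram hashes, which is the source of the storage savings the abstract mentions. A larger window `w` lowers fingerprint density further, at the cost of missing shorter shared passages.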
Keywords/Search Tags: plagiarism detection, digital fingerprint, feature extraction, optimal decision