Research and Implementation of a Copy Detection System for Academic Papers Based on Semantic Structure

Posted on: 2011-12-20
Degree: Master
Type: Thesis
Country: China
Candidate: T T Li
GTID: 2178360308961350
Subject: Control theory and control engineering
Abstract/Summary:
With the popularization and rapid development of the Internet, digital libraries, and digitally distributed media, a flood of information resources in many forms has filled our daily lives. This wealth of digital resources brings convenience, but it can also become a breeding ground for copying and plagiarism. In recent years, academic misconduct such as fabricated research and plagiarism has been reported frequently. Research on preventing academic plagiarism and fraud through an effective plagiarism detection system therefore has both theoretical significance and practical value. Text-based copy detection is an important measure for protecting intellectual property and for improving information retrieval efficiency, and semantics-based copy detection for academic papers is a core focus of the field.

This thesis first introduces the basic theory of copy detection and analyzes the functions and characteristics of current copy detection systems. It then proposes a multi-level, multi-strategy copy detection algorithm based on semantics to improve detection efficiency and accuracy. The main research work and achievements are as follows.

By analyzing the current state of academic plagiarism in China, this thesis divides plagiarism into duplicate submission and normal copying. Two recognition algorithms, one for each kind, were designed to improve detection efficiency and accuracy: duplicate submission is detected with a fingerprinting method, and normal copying is identified with a word-frequency method.

The central idea of the thesis is the structured, hierarchical extraction of feature items. Based on the characteristics of academic papers and on algorithm design considerations, each paper to be checked is broken down into a three-level structure: a veto level, a judgment level, and an identification level.
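The two detection paths named in the abstract, fingerprinting for duplicate submission and word frequency for normal copying, can be illustrated with a minimal Python sketch. The abstract does not give the thesis's actual algorithms, so everything here is an assumption made for illustration: the function names, the use of MD5 over k-token windows as fingerprints, Jaccard overlap of fingerprint sets, and cosine similarity of word-frequency vectors. The simple regular-expression tokenizer is also a stand-in; Chinese text would need a proper word segmenter.

```python
import hashlib
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercased word tokens. Assumption: a real system for Chinese text
    # would use a word segmenter instead of this regular expression.
    return re.findall(r"\w+", text.lower())


def fingerprints(text, k=5):
    """Return the set of hashes of every k-token window (hypothetical scheme)."""
    tokens = tokenize(text)
    return {
        hashlib.md5(" ".join(tokens[i:i + k]).encode("utf-8")).hexdigest()
        for i in range(max(len(tokens) - k + 1, 0))
    }


def fingerprint_overlap(a, b, k=5):
    """Jaccard overlap of the two fingerprint sets, in [0, 1]."""
    fa, fb = fingerprints(a, k), fingerprints(b, k)
    if not fa or not fb:
        return 0.0
    return len(fa & fb) / len(fa | fb)


def word_frequency_similarity(a, b):
    """Cosine similarity of the two word-frequency vectors, in [0, 1]."""
    ca, cb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(ca[w] * cb[w] for w in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

In this sketch a near-verbatim resubmission yields a high fingerprint overlap, while a text whose words have been reordered keeps a high word-frequency similarity even though its fingerprints no longer match. That difference is one plausible reason for handling the two kinds of copying separately.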
The three levels are then applied in turn, and each level plays its own role in the identification process.

In text preprocessing, a synonym table is built as a knowledge base suited to the characteristics of natural-language Chinese text. The text is then "reconstructed" by synonym replacement, which allows plagiarism disguised by synonym substitution to be detected at a certain semantic level.

For the detection of normal copying, a multi-level, multi-strategy approach is proposed. Only the feature items defined at the identification level take part in the similarity calculation. Each feature item is assigned a weight according to its location in the paper and its contribution to the similarity judgment, which improves the accuracy of the similarity calculation. In addition, because technical conventions differ across research areas, the threshold strategy does not rely on a single fixed value. Instead, the threshold is set dynamically according to the subject classification of the paper, which yields a multi-strategy recognition algorithm.

Experimental results show that the algorithm outperforms current copy detection systems in precision and recall.
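The preprocessing and scoring ideas described above (synonym replacement, location-based weights, and subject-dependent thresholds) can be sketched roughly as follows. This is only an illustration under stated assumptions: the synonym table, section weights, threshold values, and all function names are hypothetical, and cosine similarity stands in for whatever similarity measure the thesis actually uses.

```python
import math
from collections import Counter

# Hypothetical synonym table: each variant maps to one canonical term.
SYNONYMS = {"plagiarism": "copying", "duplication": "copying", "paper": "article"}

# Hypothetical weights by location of a feature item in the paper.
SECTION_WEIGHTS = {"title": 3.0, "abstract": 2.0, "body": 1.0}

# Hypothetical thresholds per subject classification.
SUBJECT_THRESHOLDS = {"computer science": 0.6, "literature": 0.8}
DEFAULT_THRESHOLD = 0.7


def normalize(tokens):
    """'Reconstruct' the text by replacing each token with its canonical synonym."""
    return [SYNONYMS.get(t, t) for t in tokens]


def weighted_vector(paper):
    """Build a weighted term vector; `paper` maps a section name to its tokens."""
    vec = Counter()
    for section, tokens in paper.items():
        weight = SECTION_WEIGHTS.get(section, 1.0)
        for token in normalize(tokens):
            vec[token] += weight
    return vec


def similarity(p1, p2):
    """Cosine similarity of the two weighted term vectors."""
    v1, v2 = weighted_vector(p1), weighted_vector(p2)
    dot = sum(v1[t] * v2[t] for t in v1.keys() & v2.keys())
    norm1 = math.sqrt(sum(x * x for x in v1.values()))
    norm2 = math.sqrt(sum(x * x for x in v2.values()))
    return dot / (norm1 * norm2) if norm1 and norm2 else 0.0


def is_suspected_copy(p1, p2, subject):
    """Compare the similarity against the threshold chosen for the subject."""
    threshold = SUBJECT_THRESHOLDS.get(subject, DEFAULT_THRESHOLD)
    return similarity(p1, p2) >= threshold
```

With these illustrative values, the same pair of papers can be flagged under one subject and cleared under another, which is the practical effect of choosing the threshold by subject classification instead of using one global value.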
Keywords/Search Tags: copy detection, plagiarism, semantic structure, multi-level decision