
Research On Text Plagiarism Detection Methods

Posted on: 2013-02-13
Degree: Master
Type: Thesis
Country: China
Candidate: X L Hua
Full Text: PDF
GTID: 2248330371993542
Subject: Computer application technology
Abstract/Summary:
With the rapid development of network technology, people have easier access to information. On the one hand, it has become convenient for people to obtain what they want; on the other hand, this provides opportunities for misconduct such as plagiarism, copying, and illegal distribution. Text copy detection is becoming an important research subject in the field of natural language processing.

Internal plagiarism detection and external plagiarism detection are the two types of detection within this research field. Both have improved greatly in recent years, especially the latter. Nowadays, more and more researchers are turning their attention to internal plagiarism detection. On the basis of a thorough analysis of existing copy detection systems, this thesis makes the following contributions:

1. We carry out internal plagiarism detection through supervised learning. After a thorough analysis of the PAN corpora, we exploit several features that reflect writing style and classify them into four groups: character, lexical, POS (part-of-speech), and chunk features. Because the style of a plagiarized section is inconsistent with that of the whole document, we first extract these features from both the individual sections and the whole document, and then represent them as a "style model". Finally, we use an SVM classifier to identify the potentially plagiarized parts of the document. We also evaluate the influence of each feature, and of feature combinations, on system performance.

2. We study the retrieval models frequently used in external plagiarism detection systems. Because existing statistical models cannot effectively retrieve the set of candidate documents related to a suspicious document, we propose a method for computing the similarity between texts based on semantic analysis and apply it to candidate document retrieval. Experiments show that the proposed method effectively enlarges the set of candidate documents retrieved.

3. In the detailed analysis stage of external plagiarism detection, we propose a method for locating plagiarized passages based on stop words, which tend to remain unchanged in the process of plagiarism. This method captures the structural features of texts and finds similarities between them. To locate the boundaries of plagiarized passages, we first merge the adjacent matched strings that are most similar. Then, we use a clustering method to reduce the impact of obfuscated text on localization.

Experiments show that the above research improves the performance of plagiarism detection, which is the purpose of this thesis...
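The stop-word localization idea in item 3 can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not give the stop-word list, the n-gram length, or the merging rule, so `STOP_WORDS`, `n=4`, `max_gap=3`, and all function names below are assumptions chosen for illustration. The clustering step is omitted.

```python
import re

# Hypothetical, deliberately small stop-word list (an assumption, not from the thesis).
STOP_WORDS = {"the", "of", "and", "a", "in", "to", "is", "was", "it", "for",
              "with", "he", "be", "on", "i", "that", "by", "at", "you", "his"}


def stopword_sequence(text):
    """Return (token_index, stop_word) pairs for the stop words in text, in order."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [(i, t) for i, t in enumerate(tokens) if t in STOP_WORDS]


def stopword_ngrams(text, n=4):
    """Map each stop-word n-gram to the token spans (start, end) where it occurs."""
    seq = stopword_sequence(text)
    grams = {}
    for k in range(len(seq) - n + 1):
        window = seq[k:k + n]
        key = tuple(word for _, word in window)
        grams.setdefault(key, []).append((window[0][0], window[-1][0]))
    return grams


def merge_spans(spans, max_gap=3):
    """Merge token spans that overlap or lie within max_gap tokens of each other."""
    merged = []
    for start, end in sorted(spans):
        if merged and start - merged[-1][1] <= max_gap:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def locate_candidates(suspicious, source, n=4, max_gap=3):
    """Return merged token spans in `suspicious` whose stop-word n-grams occur in `source`."""
    source_grams = stopword_ngrams(source, n)
    hits = [span
            for key, spans in stopword_ngrams(suspicious, n).items()
            if key in source_grams
            for span in spans]
    return merge_spans(hits, max_gap)


source = "The cat sat on the mat and it was happy to be in the house."
suspicious = "Yesterday, the feline rested on the rug and it was glad to be in the home."
print(locate_candidates(suspicious, source))
```

In this example every content word has been replaced, yet the sequence of stop words is identical in both sentences, so the matching n-grams overlap and merge into a single candidate span. Exact n-gram matching is a simplification; a fuller system would also need to tolerate inserted or deleted stop words.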
Keywords/Search Tags: plagiarism detection, style characteristics, retrieval model, semantic analysis, stop words information