Research On Plagiarism-identification System For Chinese Documents

Posted on: 2009-03-14
Degree: Master
Type: Thesis
Country: China
Candidate: Y Cao
GTID: 2178360272988283
Subject: Information Science
Abstract/Summary:
Plagiarism identification is a form of copy detection technology and a powerful means of improving the quality of academic papers and encouraging academic honesty. Plagiarism identification for documents determines whether a given document plagiarizes the contents of other documents in a database, and in what way: by copying a whole document, by duplicating most of a document, or by copying parts of it.

First, this thesis introduces the significance of plagiarism identification for Chinese documents and the basic theory behind the technology, and analyses the functions and characteristics of current copy detection systems, tools, and websites for documents.

Second, it summarizes methods for automatic Chinese word segmentation and several current segmentation systems, which form the basis of plagiarism identification.

Third, it introduces and analyses a range of similarity measures and presents a new similarity measure that integrates several properties, which is used to find similar documents. The system uses keyword similarity, classification similarity, title similarity, and abstract similarity to select related documents; it then restructures each document using its notional (content) words, calculates text similarity based on a token model, and determines which documents are similar.

Fourth, it presents two methods for finding non-repeating longest common substrings: one based on common substrings and one based on word segmentation. These two methods are used to find the common content between compared documents and then to generate a similarity report.

In addition, the thesis describes the composition of the dictionaries used, such as the thesaurus, the classification dictionary, and the stopword list. It addresses the difficult problems of measuring similarity and finding common content in Chinese documents.
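The two-stage similarity idea summarized above (field-level similarity to select related documents, then token-based text similarity) can be sketched roughly as follows. The abstract does not say how the individual similarities are computed or combined, so everything here is an assumption made for illustration: the Jaccard overlap per field, the linear weighting, the weight values in `WEIGHTS`, and the cosine measure over token counts. Documents are assumed to be already segmented into word tokens.

```python
from collections import Counter
from math import sqrt

# Hypothetical weights: the abstract does not state how the fields are combined.
WEIGHTS = {"keywords": 0.3, "classification": 0.2, "title": 0.2, "abstract": 0.3}


def jaccard(a, b):
    """Jaccard overlap of two token collections (0.0 when both are empty)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def combined_similarity(doc_a, doc_b, weights=WEIGHTS):
    """Weighted sum of per-field Jaccard scores (an assumed combination rule).

    Each document is a dict mapping a field name to a list of tokens.
    """
    return sum(
        w * jaccard(doc_a.get(field, []), doc_b.get(field, []))
        for field, w in weights.items()
    )


def cosine(tokens_a, tokens_b):
    """Cosine similarity of two token lists using raw term counts."""
    ca, cb = Counter(tokens_a), Counter(tokens_b)
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm = sqrt(sum(v * v for v in ca.values())) * sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0
```

In such a sketch, `combined_similarity` would act as a cheap filter over the database, and `cosine` would then be applied to the content-word tokens of the remaining candidates.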
Finally, based on this research, a prototype plagiarism identification system for Chinese documents is designed and implemented using an object-oriented approach. The system can find overlaps among documents. The thesis then explains how the experimental data were selected and how the parameter values were determined, and evaluates the performance of the system. The last chapter summarizes the work and looks ahead, covering the author's contributions, the innovations of the thesis, the remaining deficiencies of the system, and possible improvements.
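One plausible reading of "non-repeating longest common substring" is a greedy procedure: find the longest run shared by two documents, record it, mask it out so it cannot be matched again, and repeat. The sketch below follows that reading; it is an assumption, not the thesis's actual algorithm, and the function names and the `min_len` threshold are hypothetical. Because it works on any sequence, it covers both a character-based variant (pass strings) and a segmentation-based variant (pass lists of word tokens).

```python
def longest_common_run(a, b):
    """Return (start_a, start_b, length) of the longest common contiguous run.

    Uses the standard dynamic-programming table for longest common substring,
    keeping only the previous row. Masked positions (None) never match.
    """
    best = (0, 0, 0)
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] is not None and a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best[2]:
                    best = (i - cur[j], j - cur[j], cur[j])
        prev = cur
    return best


def common_runs(a, b, min_len=2):
    """Greedily collect non-overlapping common runs, longest first.

    min_len is a hypothetical cut-off below which matches are ignored.
    """
    min_len = max(1, min_len)  # guard against a zero-length match looping forever
    a, b = list(a), list(b)
    runs = []
    while True:
        i, j, n = longest_common_run(a, b)
        if n < min_len:
            return runs
        runs.append(a[i:i + n])
        # Mask the matched span in both inputs so it is not reported again.
        a[i:i + n] = [None] * n
        b[j:j + n] = [None] * n
```

The collected runs could then feed a similarity report, for example by listing each shared passage with its position in both documents. Each pass costs time proportional to the product of the two lengths, so a real system would likely need indexing or fingerprinting for large collections.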
Keywords/Search Tags: Chinese documents, plagiarism identification, similarity methods, similar illustration