Research On Plagiarism-identification System For Chinese Documents

Posted on:2009-03-14

Degree:Master

Type:Thesis

Country:China

Candidate:Y Cao

GTID:2178360272988283

Subject:Information Science

Abstract/Summary:

Plagiarism identification is a form of copy detection and a powerful means of improving the quality of academic papers and encouraging academic honesty. For documents, it means judging whether a given document plagiarizes the content of other documents in a database, whether by copying an entire document, duplicating most of it, or copying only parts of it.
First, this thesis introduces the significance of plagiarism identification for Chinese documents and the basic theory behind the technology, and analyzes the functions and characteristics of current copy detection systems, tools, and websites for documents.
Second, it summarizes methods for automatic Chinese word segmentation and several current segmentation systems, which form the basis of plagiarism identification.
Third, it introduces and analyzes a range of similarity measures and presents a new measure that integrates several document properties, which is used to find similar documents. The system uses keyword similarity, classification similarity, title similarity, and abstract similarity to identify related documents; it then restructures each document using its notional (content) words, calculates text similarity based on a token model, and determines which documents are similar.
Fourth, it presents two methods for finding non-repeating longest common substrings, one based on common substrings and one based on segmentation. These two methods are used to find the content that the compared documents have in common and then to generate a similarity report.
Fifth, it describes the composition of the dictionaries used, such as the thesaurus, the classification scheme, and the stopword list. The thesis thereby solves the difficult problems of measuring similarity and finding common content in Chinese documents.
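The abstract does not give the formula of the integrated similarity measure. As a minimal sketch only, assuming each attribute similarity is a Jaccard overlap of pre-segmented tokens and that the attributes are combined by a weighted sum, the idea of judging related documents from keywords, classification, title, and abstract could look like this in Python. The `Doc` type, the `jaccard` helper, and the weight values are hypothetical and are not taken from the thesis.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical record holding the attributes named in the abstract."""
    keywords: set
    classification: str
    title_tokens: set      # title after word segmentation
    abstract_tokens: set   # abstract after word segmentation


def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Assumed weights for illustration only; the thesis tunes its own parameters.
WEIGHTS = {"keywords": 0.4, "classification": 0.1, "title": 0.2, "abstract": 0.3}


def combined_similarity(x, y, weights=WEIGHTS):
    """Weighted sum of the per-attribute similarities of two documents."""
    scores = {
        "keywords": jaccard(x.keywords, y.keywords),
        "classification": 1.0 if x.classification == y.classification else 0.0,
        "title": jaccard(x.title_tokens, y.title_tokens),
        "abstract": jaccard(x.abstract_tokens, y.abstract_tokens),
    }
    return sum(weights[name] * scores[name] for name in weights)
```

A pair whose combined score exceeds some threshold would then be passed on to the more expensive full-text comparison; the threshold, like the weights, would have to be set experimentally.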
Finally, based on this research, a prototype plagiarism identification system for Chinese documents is designed and implemented using an object-oriented approach. The system can find overlaps among documents. The thesis then explains how the experimental data were selected and how the parameter values were determined, and evaluates the performance of the system. The last chapter summarizes the work and looks ahead, covering the author's contributions, the innovations of the thesis, the remaining deficiencies of the system, and possible improvements.
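The abstract does not spell out how overlaps between two documents are extracted. One plausible reading of "non-repeating longest common substring", offered here as an assumption and not as the author's algorithm, is a greedy loop: find the longest common run, record it, mask those positions in both texts, and repeat until no run of at least `min_len` remains. The function names and the `min_len` parameter below are hypothetical.

```python
def longest_common_run(a, b, used_a, used_b):
    """Return (length, start_a, start_b) of the longest run shared by a and b
    that avoids positions already marked as used."""
    best = (0, 0, 0)
    prev = [0] * (len(b) + 1)
    for i in range(len(a)):
        cur = [0] * (len(b) + 1)
        if not used_a[i]:
            for j in range(len(b)):
                if a[i] == b[j] and not used_b[j]:
                    # Extend the run that ended at a[i-1], b[j-1].
                    cur[j + 1] = prev[j] + 1
                    if cur[j + 1] > best[0]:
                        n = cur[j + 1]
                        best = (n, i - n + 1, j - n + 1)
        prev = cur
    return best


def common_passages(a, b, min_len=4):
    """Greedily collect non-overlapping common runs of at least min_len items."""
    used_a = [False] * len(a)
    used_b = [False] * len(b)
    found = []
    while True:
        n, start_a, start_b = longest_common_run(a, b, used_a, used_b)
        if n < max(min_len, 1):
            break
        found.append(a[start_a:start_a + n])
        for k in range(n):
            # Mask the matched span so it cannot be reported again.
            used_a[start_a + k] = True
            used_b[start_b + k] = True
    return found
```

Passing plain strings compares character by character, while passing lists of segmented words compares word by word, which loosely mirrors the two variants named in the abstract. For example, `common_passages("今天天气很好我们去公园", "我们去公园因为今天天气很好", min_len=3)` finds both shared passages even though their order differs between the two texts. Each pass costs time proportional to the product of the two lengths, so this sketch suits short passages only.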

Keywords/Search Tags:

Chinese documents, plagiarism identification, similarity methods, similar illustration

Related items

1	The Study And Realization Of Paper Plagiarism Identification System Based On The Text Structure
2	Chinese Document Content Similarity Detection Methods Research
3	Research On Key Issues Of Copy Detection Between Documents
4	Chinese Phrase Similarity Algorithm And Their Applications
5	A Study On The Illustration And Format Of Chinese Text Books For Primary Schools
6	Research On The Copy Detection System For Documents Based On String Matching Method
7	Research On Code Similarity Detecting Based On CNN And Code Plagiarism Checking System
8	Chinese Text Plagiarism Detection Algorithm Based On The Double Feature Extraction
9	Research And Implementation Of Code Plagiarism Detection Based On Subtree Tracking
10	Study On The Application Of Illustration To Website Design