
Research on Technologies for Building an Open Electronic Document Plagiarism Detection Service

Posted on: 2009-03-01 | Degree: Master | Type: Thesis
Country: China | Candidate: W Zhang
GTID: 2208360245476615 | Subject: Education Technology
Abstract/Summary:
With the spread of information technology and the rapid development of the Internet, people can obtain all kinds of resources online more and more easily, and can just as easily plagiarize the content of electronic documents with a "cut and paste" approach. The open Internet platform lets people conveniently obtain electronic documents of every kind, but it is at the same time a hotbed for plagiarists who steal information. Under these circumstances, an open electronic-document plagiarism detection service system is urgently needed.

This paper describes the current state, structure, and characteristics of open e-document plagiarism detection service systems, and makes a detailed study of the major technologies involved in building one: candidate document access technology and e-document plagiarism detection technology.

For candidate document access, the paper first compares and analyzes two commonly used information access technologies, web spiders and meta search, and then presents an implementation of candidate document access based on meta search.

The paper also develops SCAD, an anchor-based algorithm for detecting plagiarism in large-scale e-document collections. After preprocessing, the algorithm splits a document into sentences and derives an anchor set from an already weighted keyword set. Sentences containing the anchors are selected to generate fingerprints, and the similarity of any two documents is then computed from their fingerprints. Experimental results show that the algorithm achieves high precision and good separation while keeping the fingerprint set very small.

In addition, to better meet the needs of a plagiarism detection service, the paper proposes a one-to-one detection algorithm based on the suffix tree.
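As a rough illustration of the SCAD pipeline described above (preprocess, split into sentences, choose anchors from a weighted keyword set, fingerprint the anchored sentences, compare fingerprint sets), here is a minimal Python sketch. It is not the thesis's implementation: the sentence-splitting rules, taking the top-k keywords as anchors, MD5 as the fingerprint hash, and Jaccard resemblance as the similarity measure are all assumptions, and every function name is hypothetical. It also assumes whitespace-delimited text; Chinese input would need word segmentation first.

```python
import hashlib
import re


def split_sentences(text):
    """Preprocess and split text into normalized sentences.

    Assumed rules: lowercase, split on Western or Chinese sentence-ending
    punctuation, keep only word characters separated by single spaces.
    """
    parts = re.split(r"[.!?\u3002\uff01\uff1f]+", text.lower())
    sentences = []
    for part in parts:
        normalized = " ".join(re.findall(r"\w+", part))
        if normalized:
            sentences.append(normalized)
    return sentences


def choose_anchors(keyword_weights, k):
    """Assumption: the anchor set is the k highest-weighted keywords."""
    ranked = sorted(keyword_weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return {word for word, _ in ranked[:k]}


def fingerprints(text, anchors):
    """Hash only the sentences that contain at least one anchor word."""
    prints = set()
    for sentence in split_sentences(text):
        if anchors & set(sentence.split()):
            # Assumed hash: first 16 hex digits of MD5 of the sentence.
            prints.add(hashlib.md5(sentence.encode("utf-8")).hexdigest()[:16])
    return prints


def similarity(fp_a, fp_b):
    """Assumed measure: Jaccard resemblance of two fingerprint sets."""
    if not fp_a and not fp_b:
        return 0.0
    return len(fp_a & fp_b) / len(fp_a | fp_b)


# Usage: both documents must be fingerprinted with the same anchor set.
weights = {"plagiarism": 0.9, "fingerprint": 0.8, "suffix": 0.7, "the": 0.01}
anchors = choose_anchors(weights, k=3)
doc_a = "Plagiarism is common. We build a fingerprint index. The weather is nice."
doc_b = "We build a fingerprint index! Plagiarism is rare. Totally unrelated text."
print(round(similarity(fingerprints(doc_a, anchors), fingerprints(doc_b, anchors)), 3))
# → 0.333
```

Because only anchored sentences are hashed, each document's fingerprint set is never larger than its sentence count and is usually much smaller; the abstract does not say exactly how SCAD weights keywords or scores similarity, so those choices here are placeholders.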
The suffix-tree-based algorithm identifies the common strings shared by two documents, which can then be highlighted to give users direct evidence of copying. Finally, the paper describes a prototype open plagiarism detection service for education papers, presents its design, and outlines improvements for the next phase.
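The one-to-one comparison, which finds the strings two documents have in common so they can be highlighted, can be sketched as follows. This is an assumption-laden stand-in, not the thesis's algorithm: it builds an uncompressed suffix trie over the source document's word tokens (quadratic in size, unlike a linear-size suffix tree built with Ukkonen's algorithm), then scans the suspect document greedily, taking at each position the longest run that matches any source suffix. The minimum run length `min_len`, the `[[...]]` highlight markers, and all function names are hypothetical.

```python
class TrieNode:
    __slots__ = ("children", "start")

    def __init__(self, start):
        self.children = {}
        # Start index (in the source tokens) of one suffix passing through
        # this node, so a match can be traced back to the source document.
        self.start = start


def build_suffix_trie(tokens):
    """Insert every suffix of `tokens` into an uncompressed trie (O(n^2) nodes)."""
    root = TrieNode(-1)
    for start in range(len(tokens)):
        node = root
        for tok in tokens[start:]:
            if tok not in node.children:
                node.children[tok] = TrieNode(start)
            node = node.children[tok]
    return root


def common_segments(source_tokens, suspect_tokens, min_len=3):
    """Return (suspect_start, source_start, length) for shared token runs."""
    root = build_suffix_trie(source_tokens)
    segments = []
    i = 0
    while i < len(suspect_tokens):
        # Walk down the trie as far as the suspect text keeps matching.
        node, length, src = root, 0, -1
        while i + length < len(suspect_tokens):
            child = node.children.get(suspect_tokens[i + length])
            if child is None:
                break
            node, length, src = child, length + 1, child.start
        if length >= min_len:
            segments.append((i, src, length))
            i += length  # skip past the matched run
        else:
            i += 1
    return segments


def highlight(tokens, segments):
    """Wrap each matched run of the suspect tokens in [[ ]] markers."""
    marked = list(tokens)
    for start, _, length in segments:
        marked[start] = "[[" + marked[start]
        marked[start + length - 1] += "]]"
    return " ".join(marked)


source = "the suffix tree finds common strings of two documents quickly".split()
suspect = "we note that the suffix tree finds common strings in linear time".split()
segs = common_segments(source, suspect, min_len=3)
print(segs)  # → [(3, 0, 6)]
print(highlight(suspect, segs))
# → we note that [[the suffix tree finds common strings]] in linear time
```

Each reported segment also carries a source offset, so a viewer could highlight the matching passage in both documents side by side.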
Keywords/Search Tags: plagiarism detection, fingerprinting, candidate document, suffix tree