Research On Cross-Language Plagiarism Detection Technology Based On The Fingerprint Fusion

Posted on: 2017-06-30
Degree: Master
Type: Thesis
Country: China
Candidate: Q R Yang
GTID: 2348330518470935
Subject: Engineering
Abstract/Summary:
In recent years, academic misconduct has become more frequent and has drawn growing public attention, and plagiarism is its most common form. Depending on the languages involved, plagiarism can be classified as monolingual or cross-lingual. Monolingual plagiarism detection is a mature technology, whereas cross-language plagiarism detection is still at an early stage. Cross-language detection is complex because of the differences between languages, including their different sentence structures.
Based on a review and analysis of current research on monolingual and cross-language plagiarism detection, and targeting the existing problems of cross-language detection, this thesis proposes a cross-language plagiarism detection technique based on fingerprint fusion. The technique is divided into two parts: the first is cross-language text similarity search, and the second is verification of the plagiarism detection results.
First, following a detailed analysis of the tree structure of WordNet terms, a language-independent intermediate fingerprinting algorithm is presented. This algorithm overcomes the language barrier by establishing a language-independent intermediate layer. Documents are then pre-processed and keywords are extracted. To address polysemy, a semantic disambiguation algorithm based on the intermediate fingerprint is given. Fingerprints are selected by frequency, and a fingerprint is formed for each document. The Dice coefficient is then used to compute cross-language text similarity, and the similarity search yields the candidate plagiarized documents. This part is efficient because it relies on bit operations.
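The similarity step can be illustrated with a minimal sketch. It assumes each document fingerprint is a fixed-width bit vector held in a Python integer, with one bit per language-independent concept. The abstract does not specify the encoding, so the width `FINGERPRINT_BITS`, the concept ids, and the helper names `make_fingerprint`, `popcount`, and `dice` are hypothetical. The Dice coefficient is computed as 2|A ∩ B| / (|A| + |B|) using a bitwise AND and bit counts.

```python
FINGERPRINT_BITS = 64  # assumed width; not given in the abstract


def make_fingerprint(concept_ids):
    """Set one bit per (hypothetical) language-independent concept id."""
    fp = 0
    for cid in concept_ids:
        fp |= 1 << (cid % FINGERPRINT_BITS)
    return fp


def popcount(x):
    """Count the set bits of a non-negative integer."""
    return bin(x).count("1")


def dice(fp_a, fp_b):
    """Dice coefficient of two bit fingerprints: 2|A & B| / (|A| + |B|)."""
    total = popcount(fp_a) + popcount(fp_b)
    if total == 0:
        return 0.0  # two empty fingerprints: define similarity as 0
    return 2.0 * popcount(fp_a & fp_b) / total


# Two documents in different languages, already mapped to shared concept ids.
doc_en = make_fingerprint([3, 17, 42, 58])
doc_zh = make_fingerprint([3, 17, 42, 9])
score = dice(doc_en, doc_zh)  # 3 shared bits out of 4 + 4 set bits -> 0.75
```

Because the comparison reduces to one AND and two bit counts per document pair, the cost per pair is small, which is consistent with the efficiency claim above.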
In addition, the thesis analyzes the advantages and disadvantages of the SimHash and Winnowing algorithms in detail and proposes the Sim Win algorithm, which combines the two for verifying the plagiarism detection results. The final step merges the plagiarized segments to form the plagiarism results. This second part improves detection accuracy.
Finally, to evaluate the proposed method, experiments were conducted on an artificially constructed plagiarism corpus. Analysis and comparison of the experimental results indicate that the method is effective.
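For readers unfamiliar with the two building blocks named above, the following is a minimal sketch of standard Winnowing and SimHash. The abstract does not describe how Sim Win fuses them, so the two are shown separately and no fusion is attempted. The parameters `k`, `window`, and `bits`, the MD5-based `stable_hash`, and all function names are assumptions made for illustration.

```python
import hashlib


def stable_hash(text, bits=32):
    """Deterministic hash (Python's built-in hash() of str is salted per process)."""
    digest = hashlib.md5(text.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") & ((1 << bits) - 1)


def winnow(text, k=5, window=4):
    """Winnowing: keep the minimum k-gram hash of every sliding window.

    Returns a set of (hash, position) pairs. Texts with fewer than
    `window` k-grams yield an empty set in this simplified version.
    """
    hashes = [stable_hash(text[i:i + k]) for i in range(len(text) - k + 1)]
    selected = set()
    for start in range(len(hashes) - window + 1):
        win = hashes[start:start + window]
        smallest = min(win)
        # On ties, take the rightmost minimum in the window.
        offset = window - 1 - win[::-1].index(smallest)
        selected.add((smallest, start + offset))
    return selected


def simhash(tokens, bits=32):
    """SimHash: per-bit majority vote over the hashes of all tokens."""
    votes = [0] * bits
    for tok in tokens:
        h = stable_hash(tok, bits)
        for b in range(bits):
            votes[b] += 1 if (h >> b) & 1 else -1
    return sum(1 << b for b in range(bits) if votes[b] > 0)


def hamming(a, b):
    """Number of differing bits between two SimHash values."""
    return bin(a ^ b).count("1")


a = "the quick brown fox jumps over the lazy dog"
b = "xx the quick brown fox jumps over the lazy dog"

# Winnowing: a shared passage of at least k + window - 1 characters
# always contributes at least one common fingerprint hash.
shared = {h for h, _ in winnow(a)} & {h for h, _ in winnow(b)}

# SimHash: identical token lists give identical signatures (distance 0).
distance = hamming(simhash(a.split()), simhash(a.split()))
```

Winnowing locates matching passages by position, while SimHash gives a compact whole-text signature compared by Hamming distance, which is one plausible reason to combine them for verification.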
Keywords/Search Tags: intermediate fingerprint, fingerprint fusion, semantic disambiguation, cross-language plagiarism detection