
Research On The Construction Technology Of Cross-language Plagiarism Detection Model Based On Multi-features

Posted on: 2018-02-13
Degree: Master
Type: Thesis
Country: China
Candidate: G X Li
Full Text: PDF
GTID: 2348330542490797
Subject: Computer Science and Technology
Abstract/Summary:
With the continuous development of the Internet, information sharing has become more and more convenient, which has led to one plagiarism problem after another. Research on monolingual plagiarism detection is relatively mature in China, but research on cross-language plagiarism is only beginning. Cross-language plagiarism detection is therefore an urgent problem for the anti-plagiarism field and for the academic community as a whole. This thesis constructs a multi-feature-based cross-language plagiarism detection model that addresses the problem using features extracted from translated text. It first analyzes and summarizes the state of research on monolingual and cross-language plagiarism, and then proposes the model, which has two parts: cross-language plagiarism classification based on multi-feature selection, and cross-language plagiarism detection based on multi-feature correspondence. For classification, a method based on multi-feature selection is proposed. It extracts features common to translated text, derived from the Europeanized expressions that translators introduce during translation. After further feature selection and feature weight calculation, a classifier is trained to identify cross-language plagiarism. As part of this process, a new feature selection method is proposed that builds on the traditional chi-square test and additionally considers how often a feature occurs in a text and how stable the feature is within a category. For detection, a method based on multi-feature correspondence is proposed. Candidate results are filtered twice, once by the correspondence of translation features and once by the correspondence of structural features. Translation feature correspondence is the correspondence between a selected feature and its English expression; an algorithm for calculating the distance between paragraphs is proposed to compare the corresponding Chinese and English paragraphs. Structural feature correspondence compares the structure of the Chinese and English paragraphs, retaining paragraph pairs with similar structure and filtering out those with different structure. Finally, a WordNet-based method is used to calculate the similarity of the detection results, which completes the cross-language detection. The cross-language plagiarism model is built, and the classification and detection results are verified through comparative experiments and experimental analysis, which support the validity and soundness of the model.
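The abstract names the traditional chi-square test as the starting point for its feature selection method. As a rough illustration only, the classical chi-square score for one feature and one class can be sketched as follows. This is the textbook baseline, not the improved method of the thesis, whose frequency and stability terms are not given in the abstract. The function name, the feature names, and all counts below are hypothetical.

```python
def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Classical chi-square statistic for a 2x2 feature/class table.

    a: documents in the class that contain the feature
    b: documents outside the class that contain the feature
    c: documents in the class that lack the feature
    d: documents outside the class that lack the feature
    """
    n = a + b + c + d
    denominator = (a + c) * (b + d) * (a + b) * (c + d)
    if denominator == 0:
        # A degenerate table (an empty row or column) carries no evidence.
        return 0.0
    return n * (a * d - b * c) ** 2 / denominator


# Hypothetical document counts for two candidate features (illustrative only).
counts = {
    "europeanized_marker": (40, 5, 10, 45),
    "common_word": (25, 25, 25, 25),
}

# Rank the features from most to least class-dependent.
ranking = sorted(counts, key=lambda name: chi_square(*counts[name]), reverse=True)
print(ranking)
```

A feature that is spread evenly across both classes scores zero, so it is ranked last and would be a natural candidate for removal before training the classifier.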
Keywords/Search Tags: feature selection, classifier, candidate set, cross-language plagiarism detection