Research on Plagiarism Detection Based on SVM

Posted on: 2016-04-01
Degree: Master
Type: Thesis
Country: China
Candidate: S H Wang
Full Text: PDF
GTID: 2348330542475461
Subject: Computer application technology
Abstract/Summary:
With the rapid development of science and technology and the spread of the Internet, the network has penetrated people's life, study, and work. It brings convenience, but it also creates opportunities for plagiarism and other illegal behavior. How to prevent plagiarism has become a research hotspot, and building a fast, accurate copy-detection system for papers has practical significance. Plagiarism detection is a means of protecting intellectual property rights.

Plagiarism detection can be divided into two major problem classes: intrinsic plagiarism detection and external plagiarism detection. The task of intrinsic plagiarism detection is to decide, without a reference collection, whether a document contains plagiarized segments. The task of external plagiarism detection, by contrast, is to decide, with a reference collection, whether a document contains plagiarized passages and to locate the corresponding source passages in the source documents. This paper focuses on external plagiarism detection. The specific contents are as follows.

For external plagiarism detection, this paper proposes a method based on information retrieval and the support vector machine (SVM). Its two subtasks are candidate document retrieval and SVM-based plagiarism analysis. First, information retrieval is used for candidate document retrieval, with TF-IDF as the keyword extraction method and the vector space model as the similarity measure. Second, for detailed analysis, an SVM is applied to detect plagiarized passages. Each pair <suspicious document, candidate document> consists of a suspicious document and one of its candidate documents. Features are extracted from each pair and written in vector form, and the SVM is trained on these feature vectors. The trained SVM then predicts, for pairs of suspicious and candidate documents from the test corpus, whether the suspicious document contains plagiarized passages. Finally, this paper proposes a new feature combination for training and testing the SVM.

Experiments show that candidate document retrieval with TF-IDF keyword extraction and the vector space model similarity measure yields high-quality candidate documents, and that applying the SVM with the proposed unified feature combination in the detailed analysis improves the performance of the detection system. The results show that using a support vector machine for external plagiarism detection is feasible and that precision and recall improve to a certain degree. This provides a reference for applying machine learning algorithms to plagiarism detection.
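
The two subtasks described in the abstract, candidate document retrieval and SVM-based analysis of document pairs, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the thesis. The abstract does not specify the actual feature combination or SVM configuration, so the whitespace tokenizer, the smoothed IDF variant, the two pair features (shared words and shared bigrams), the subgradient-descent trainer, and all function names and hyperparameters below are hypothetical stand-ins.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumed, deliberately simple choice)."""
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF: an assumed variant that keeps every weight positive.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    return [
        {t: (count / len(toks)) * idf[t] for t, count in Counter(toks).items()}
        for toks in tokenized
    ]


def cosine(u, v):
    """Cosine similarity between two sparse vectors in the vector space model."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def retrieve_candidates(suspicious, sources, top_k=2):
    """Stage 1: rank source documents by TF-IDF cosine similarity."""
    vectors = tfidf_vectors(sources + [suspicious])
    query = vectors[-1]
    scored = [(cosine(query, vec), idx) for idx, vec in enumerate(vectors[:-1])]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(idx, score) for score, idx in scored[:top_k]]


def pair_features(suspicious, candidate):
    """Hypothetical feature vector for a <suspicious, candidate> pair."""
    sus, cand = tokenize(suspicious), tokenize(candidate)
    sus_bigrams = set(zip(sus, sus[1:]))
    cand_bigrams = set(zip(cand, cand[1:]))
    shared_words = len(set(sus) & set(cand)) / max(len(set(sus)), 1)
    shared_bigrams = len(sus_bigrams & cand_bigrams) / max(len(sus_bigrams), 1)
    return [shared_words, shared_bigrams]


def train_linear_svm(samples, labels, epochs=200, lr=0.1, lam=0.01):
    """Stage 2: linear SVM fitted by subgradient descent on the hinge loss."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):  # labels are +1 (plagiarized) or -1
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:  # inside the margin: the hinge term contributes
                w = [wi - lr * (lam * wi - y * xi) for wi, xi in zip(w, x)]
                b += lr * y
            else:  # outside the margin: only the L2 regularizer contributes
                w = [wi - lr * lam * wi for wi in w]
    return w, b


def predict(model, x):
    """Return +1 if the pair is predicted to contain plagiarism, else -1."""
    w, b = model
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1
```

In this sketch, retrieve_candidates returns (index, score) pairs for the best-matching source documents, pair_features turns each <suspicious document, candidate document> pair into a vector, and predict applied to a model from train_linear_svm labels the pair. A production system would more plausibly use an established SVM library and a richer, validated feature set.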
Keywords/Search Tags: Plagiarism Detection, External Plagiarism Detection, Support Vector Machine, Feature Extraction