
Theory And Method On Academic Literature Semantic Similarity Detection

Posted on: 2015-11-27
Degree: Master
Type: Thesis
Country: China
Candidate: X D Wang
GTID: 2428330491955275
Subject: Information Science
Abstract/Summary:
We live in an era of information. Information is easier to obtain than ever, but plagiarism has become easier as well. In recent years, several cases of academic misconduct have drawn the attention of both the academic community and the relevant authorities, making similarity detection an active research topic. The main detection techniques are VSM-based approaches, digital fingerprinting-based approaches, and intrinsic plagiarism detection. These methods are designed to detect complete or partial copying, which makes it hard for them to detect synonym replacement, sentence rewording, and restructuring. To address this problem, this thesis proposes an academic literature similarity detection method based on semantic information. The work is organized as follows.

(1) The writing features of academic literature and common plagiarism behaviors are studied.

(2) A digital fingerprinting algorithm based on Chinese word frequency is proposed to pre-select suspect papers. The fingerprint is robust because it preserves as much information as possible even when a paper has been modified to a certain extent.

(3) An academic literature semantic similarity detection method based on semantic role labeling (SRL) is proposed. First, a paper is labeled with an SRL tool; the abstract and the main body are then checked differently, at sentence granularity. Every sentence is processed by a semantic role labeler, and hypernyms are extracted from a semantic dictionary, so that each paper is represented as a sentence-term-semantic role-hypernym 4-partite graph. When two abstracts are compared, sentence comparison is confined to sentences with the same move; move labels are assigned by a Support Vector Machine (SVM) using word, figure, and acronym features. In the main body, sentences are compared using the 4-partite graph. The final detection result combines abstract similarity and main-body similarity.

Experiments show that, for papers with an edit rate below 50%, the proposed fingerprint detects correctly with a probability of 79.38% when text is deleted and 83.93% when text is added. The proposed shallow semantic labeling method achieves 95% in ten-fold cross-validation. Because of the limitations of current SRL tools, the semantic similarity detection results are not yet satisfactory, but they are still 13% higher than those of other methods. The proposed fingerprint faithfully reflects the edit rate of a paper, move information makes similarity detection more effective, and confining semantic comparison to the same semantic role avoids unnecessary comparisons.
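The abstract does not spell out how the word-frequency fingerprint is computed. Purely as an illustration of the idea, the sketch below swaps in SimHash (Charikar's similarity-preserving hash): every distinct word's hash votes on each bit with a weight equal to the word's frequency, so a few edits move only a few vote totals. It assumes the text has already been segmented into words (Chinese text needs a word segmenter first); the 64-bit width, the MD5-based `word_hash`, and the `max_distance=3` pre-selection threshold are assumptions, not values from the thesis.

```python
import hashlib
from collections import Counter

BITS = 64  # fingerprint width in bits (assumed)


def word_hash(word: str) -> int:
    """Stable 64-bit hash of a word: the first 8 bytes of its MD5 digest."""
    return int.from_bytes(hashlib.md5(word.encode("utf-8")).digest()[:8], "big")


def simhash(words: list[str]) -> int:
    """Frequency-weighted SimHash of an already word-segmented document."""
    votes = [0] * BITS
    for word, freq in Counter(words).items():
        h = word_hash(word)
        for i in range(BITS):
            # Each distinct word pushes bit i up or down by its frequency.
            votes[i] += freq if (h >> i) & 1 else -freq
    fingerprint = 0
    for i, v in enumerate(votes):
        if v > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def preselect(query: list[str], corpus: dict[str, list[str]], max_distance: int = 3) -> list[str]:
    """Ids of corpus documents whose fingerprints lie within max_distance bits of the query's."""
    q = simhash(query)
    return [doc_id for doc_id, words in corpus.items()
            if hamming(q, simhash(words)) <= max_distance]
```

Because word order is ignored and each word's influence scales with its frequency, deleting or adding a small fraction of words tends to flip only a few bits, so lightly edited papers usually stay close to their source. Only the candidates returned by `preselect` would then need the more expensive semantic comparison.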
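The sentence-term-semantic role-hypernym representation and the rule that terms are compared only within the same semantic role can be sketched as follows. Several details are assumptions for illustration: the SRL output is taken to be `(term, role)` pairs with PropBank-style labels (`A0`, `V`, `A1`), the small `HYPERNYMS` table stands in for a real semantic dictionary, and the per-role Jaccard score is a hypothetical similarity measure, not necessarily the one used in the thesis.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical toy dictionary standing in for a real semantic (hypernym) dictionary.
HYPERNYMS = {
    "dog": "animal", "canine": "animal",
    "bites": "attack", "attacks": "attack",
    "man": "person", "person": "person",
}


def hypernym(term: str) -> str:
    return HYPERNYMS.get(term, term)  # unknown terms act as their own hypernym


@dataclass
class SentenceGraph:
    """One sentence's slice of the sentence-term-role-hypernym 4-partite graph.
    Each (term, role, hypernym) triple stands for the sentence-term, term-role and
    term-hypernym edges; the sentence node itself is sentence_id."""
    sentence_id: str
    edges: list[tuple[str, str, str]]

    @classmethod
    def from_srl(cls, sentence_id: str, labeled_terms: list[tuple[str, str]]) -> "SentenceGraph":
        # labeled_terms: (term, role) pairs, an assumed SRL output format
        return cls(sentence_id, [(t, r, hypernym(t)) for t, r in labeled_terms])

    def hypernyms_by_role(self) -> dict[str, set[str]]:
        groups: dict[str, set[str]] = defaultdict(set)
        for _term, role, hyper in self.edges:
            groups[role].add(hyper)
        return groups


def sentence_similarity(a: SentenceGraph, b: SentenceGraph) -> float:
    """Mean per-role Jaccard overlap of hypernyms. Terms are only ever compared
    with terms that carry the same semantic role."""
    ra, rb = a.hypernyms_by_role(), b.hypernyms_by_role()
    roles = set(ra) | set(rb)
    if not roles:
        return 0.0
    total = 0.0
    for role in roles:
        x, y = ra.get(role, set()), rb.get(role, set())
        total += len(x & y) / len(x | y)  # x | y is non-empty: role occurs in a or b
    return total / len(roles)
```

Under this scoring, a synonym-replaced sentence such as "canine attacks person" matches "dog bites man" fully, because every role maps to the same hypernyms, while "man bites dog", which reuses the same words in swapped roles, scores 1/3 (only the predicate matches).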
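Move labels are said to come from an SVM over word, figure, and acronym features. The sketch below trains a linear SVM from scratch with Pegasos (stochastic subgradient descent on the L2-regularized hinge loss, here visiting samples in order rather than at random) and combines binary classifiers one-vs-rest. The cue-word list, the move names, reading "figure" as "contains a digit", the acronym regex, and the `lam`/`epochs` values are all hypothetical; a library implementation such as LIBSVM or scikit-learn's `LinearSVC` is the more usual choice in practice.

```python
import re

# Hypothetical cue words; a real feature set would come from annotated abstracts.
CUE_WORDS = ["propose", "method", "results", "show", "aim", "conclude"]


def features(sentence: str) -> list[float]:
    """Word, figure (digit) and acronym features, plus a constant bias feature."""
    words = sentence.lower().split()
    vec = [1.0 if cue in words else 0.0 for cue in CUE_WORDS]          # word features
    vec.append(1.0 if re.search(r"\d", sentence) else 0.0)             # contains a figure
    vec.append(1.0 if re.search(r"\b[A-Z]{2,}\b", sentence) else 0.0)  # contains an acronym
    vec.append(1.0)                                                    # bias
    return vec


def dot(w: list[float], x: list[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def train_binary(xs, ys, lam=0.01, epochs=50):
    """Linear SVM via Pegasos with labels +1/-1. The bias weight is regularized
    along with the others, a common simplification."""
    w = [0.0] * len(xs[0])
    t = 0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            t += 1
            eta = 1.0 / (lam * t)
            violated = y * dot(w, x) < 1               # hinge loss is active
            w = [(1.0 - eta * lam) * wi for wi in w]   # shrink (regularizer step)
            if violated:
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def train_moves(sentences: list[str], moves: list[str]) -> dict[str, list[float]]:
    """One-vs-rest: one binary SVM per move label."""
    xs = [features(s) for s in sentences]
    return {m: train_binary(xs, [1 if mv == m else -1 for mv in moves])
            for m in sorted(set(moves))}


def predict_move(models: dict[str, list[float]], sentence: str) -> str:
    """Pick the move whose binary SVM gives the highest decision value."""
    x = features(sentence)
    return max(models, key=lambda m: dot(models[m], x))
```

Trained on just two toy sentences, "We propose a method" (move `method`) and "Results show 95% accuracy" (move `result`), this classifier labels "Our results show 80% recall" as `result`, driven by the cue words and the figure feature.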
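Confining abstract comparison to sentences with the same move, and merging abstract and main-body similarity into one result, might look like the sketch below. The move names, the word-overlap `jaccard` used in place of the semantic sentence comparison, the best-match averaging, and the weight `alpha = 0.3` are all hypothetical; the abstract states only that the final result consists of both parts.

```python
def jaccard(a: str, b: str) -> float:
    """Word-overlap stand-in for the semantic sentence similarity."""
    x, y = set(a.lower().split()), set(b.lower().split())
    return len(x & y) / len(x | y) if x | y else 0.0


def abstract_similarity(suspect: list[tuple[str, str]], source: list[tuple[str, str]]) -> float:
    """suspect/source: (move, sentence) pairs. Each suspect sentence is compared
    only with source sentences that carry the same move label."""
    if not suspect:
        return 0.0
    scores = []
    for move, sent in suspect:
        candidates = [s for m, s in source if m == move]
        scores.append(max((jaccard(sent, c) for c in candidates), default=0.0))
    return sum(scores) / len(scores)


def body_similarity(suspect: list[str], source: list[str]) -> float:
    """Best-match average over main-body sentences (no move restriction)."""
    if not suspect:
        return 0.0
    return sum(max((jaccard(s, c) for c in source), default=0.0) for s in suspect) / len(suspect)


def paper_similarity(sus_abs, src_abs, sus_body, src_body, alpha: float = 0.3) -> float:
    """Weighted combination of abstract and main-body similarity (alpha is assumed)."""
    return alpha * abstract_similarity(sus_abs, src_abs) + (1 - alpha) * body_similarity(sus_body, src_body)
```

With move confinement, a copied "purpose" sentence scores 1.0 against the source's "purpose" sentence, but the same sentence labeled as a "result" is never compared with it and scores 0.0, which is how the move restriction avoids irrelevant comparisons.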
Keywords/Search Tags: Plagiarism Detection, Academic Literature, SRL, Digital Fingerprinting, SVM