
Research On Paper Similarity Based On Semantic Understanding

Posted on: 2012-08-12 | Degree: Master | Type: Thesis
Country: China | Candidate: L Z Tang
GTID: 2218330338471589 | Subject: Management Science and Engineering
Abstract/Summary:
In recent years, information technology, computer technology and linguistics (notably the rise and development of statistical linguistics and corpus linguistics) have advanced rapidly, laying a solid foundation for research on similarity. At the same time, a number of cases of academic misconduct, mainly plagiarism and the misappropriation of others' academic achievements, have been found in universities. Such misconduct damages the reputation of the entire academic community and the academic quality of universities. Existing paper detection systems have curbed some of this misconduct, but because they concentrate on counting repeated text, they are weak at detecting copied grammar, figures and formulas. A better paper detection system is therefore urgently needed to improve the quality of papers and to prevent misconduct. Taking into account both the shortcomings of current detection systems and existing research results, this thesis studies similarity for paper detection, in the hope of advancing the application of similarity research.

The thesis first briefly introduces text similarity and semantic similarity, analyzing the principles of similarity calculation, the factors it must take into account, and specific similarity algorithms. After comparing the advantages and disadvantages of these algorithms, we adopt the How-Net-based word similarity algorithm and improve it in two respects: a semantic density factor is introduced into the calculation of word similarity, and the algorithm is extended to compute the similarity of sentences, paragraphs and even entire papers. By analyzing similarity at the level of words, sentences, paragraphs and papers, we apply semantic understanding to paper detection.
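The abstract does not state the improved formula, so the following is only a minimal Python sketch of the general How-Net approach it builds on, under stated assumptions: the similarity of two sememes is computed from their distance d in a sememe tree as sim = α/(d + α), word similarity is the best score over the words' senses, and sentence similarity is a symmetric best-match average of word similarities. The sememe tree, the lexicon, the value of `ALPHA` and all function names are hypothetical toy stand-ins rather than real How-Net data; the full multi-part How-Net word measure is collapsed to the primary sememe, and the thesis's semantic density factor is omitted because its definition is not given here.

```python
from typing import Dict, List, Optional

# Hypothetical smoothing constant in sim = ALPHA / (d + ALPHA).
ALPHA = 1.6

# Hypothetical toy sememe tree (child -> parent, None marks the root); NOT real How-Net data.
SEMEME_PARENT: Dict[str, Optional[str]] = {
    "entity": None,
    "thing": "entity",
    "event": "entity",
    "document": "thing",
    "human": "thing",
    "write": "event",
    "copy": "event",
}

# Hypothetical toy lexicon: word -> primary sememe of each of its senses.
LEXICON: Dict[str, List[str]] = {
    "paper": ["document"],
    "thesis": ["document"],
    "author": ["human"],
    "write": ["write"],
    "plagiarize": ["copy"],
}


def path_to_root(sememe: str) -> List[str]:
    """Return the chain [sememe, parent, ..., root]."""
    path: List[str] = []
    node: Optional[str] = sememe
    while node is not None:
        path.append(node)
        node = SEMEME_PARENT[node]
    return path


def sememe_distance(a: str, b: str) -> int:
    """Edge count between two sememes via their lowest common ancestor."""
    steps_from_a = {s: i for i, s in enumerate(path_to_root(a))}
    for j, s in enumerate(path_to_root(b)):
        if s in steps_from_a:  # first shared node walking up from b is the LCA
            return steps_from_a[s] + j
    raise ValueError(f"{a!r} and {b!r} are not in the same tree")


def sememe_sim(a: str, b: str) -> float:
    """Distance-based sememe similarity: 1.0 for d = 0, decreasing as d grows."""
    return ALPHA / (sememe_distance(a, b) + ALPHA)


def word_sim(w1: str, w2: str) -> float:
    """Best score over all sense pairs; identical words score 1, unknown words 0."""
    if w1 == w2:
        return 1.0
    if w1 not in LEXICON or w2 not in LEXICON:
        return 0.0
    return max(sememe_sim(a, b) for a in LEXICON[w1] for b in LEXICON[w2])


def sentence_sim(s1: List[str], s2: List[str]) -> float:
    """Symmetric best-match average: each word is paired with its most similar word."""
    if not s1 or not s2:
        return 0.0

    def directed(xs: List[str], ys: List[str]) -> float:
        return sum(max(word_sim(x, y) for y in ys) for x in xs) / len(xs)

    return (directed(s1, s2) + directed(s2, s1)) / 2
```

With this toy lexicon, `sentence_sim(["author", "write", "paper"], ["author", "write", "thesis"])` returns 1.0 even though the word strings differ, because "paper" and "thesis" share a primary sememe; a count of literally repeated words would not match that pair.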
Finally, a series of experiments verifies the effectiveness of the similarity algorithm proposed in this thesis. The algorithm is applied to paper detection mainly through word similarity, sentence similarity and paragraph similarity, and to a certain extent it achieves paper detection in the true sense of the term.
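The extension from sentences to paragraphs and whole papers, together with the flagging step of detection, can be sketched as below. This is a hypothetical illustration, not the thesis's procedure: `jaccard` is a word-overlap stand-in for a semantic sentence measure (any function returning a score in [0, 1], such as a How-Net-based one, can be passed as `sim`), and both the best-match averaging scheme and the 0.8 `THRESHOLD` are assumed values.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
Sentence = List[str]
Paragraph = List[Sentence]
Paper = List[Paragraph]
SentSim = Callable[[Sentence, Sentence], float]

THRESHOLD = 0.8  # hypothetical cut-off for flagging a sentence pair


def jaccard(s1: Sentence, s2: Sentence) -> float:
    """Stand-in sentence measure: word-set overlap, NOT a semantic measure."""
    a, b = set(s1), set(s2)
    return len(a & b) / len(a | b) if a | b else 0.0


def best_match_avg(xs: Sequence[T], ys: Sequence[T], sim: Callable[[T, T], float]) -> float:
    """Symmetric average of each item's best match in the other collection."""
    if not xs or not ys:
        return 0.0
    fwd = sum(max(sim(x, y) for y in ys) for x in xs) / len(xs)
    bwd = sum(max(sim(y, x) for x in xs) for y in ys) / len(ys)
    return (fwd + bwd) / 2


def paragraph_sim(p1: Paragraph, p2: Paragraph, sim: SentSim = jaccard) -> float:
    """Paragraph score built from sentence scores."""
    return best_match_avg(p1, p2, sim)


def paper_sim(d1: Paper, d2: Paper, sim: SentSim = jaccard) -> float:
    """Paper score built from paragraph scores."""
    return best_match_avg(d1, d2, lambda p, q: paragraph_sim(p, q, sim))


def flag_sentences(d1: Paper, d2: Paper, sim: SentSim = jaccard,
                   threshold: float = THRESHOLD) -> List[Tuple[int, int, int, int, float]]:
    """Return (para_i, sent_i, para_j, sent_j, score) for every pair at or above threshold."""
    hits = []
    for pi, p in enumerate(d1):
        for si, s in enumerate(p):
            for pj, q in enumerate(d2):
                for sj, t in enumerate(q):
                    score = sim(s, t)
                    if score >= threshold:
                        hits.append((pi, si, pj, sj, score))
    return hits
```

For example, with `suspect = [[["we", "propose", "a", "semantic", "method"], ["results", "are", "good"]]]` and `source = [[["we", "propose", "a", "semantic", "method"]], [["weather", "is", "cold"]]]`, `flag_sentences(suspect, source)` reports the single copied sentence as `(0, 0, 0, 0, 1.0)`, and `paper_sim(suspect, source)` is 0.5625. Passing the pairwise measure as a parameter keeps the aggregation independent of whichever word or sentence similarity is plugged in.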
Keywords/Search Tags: text similarity, semantic similarity, How-Net, paper detection