Study On Chinese Text Replication Detection Based On Sentence Similarity

Posted on: 2016-02-15
Degree: Master
Type: Thesis
Country: China
Candidate: H Y Zhang
GTID: 2298330467492123
Subject: Computer Science and Technology
Abstract/Summary:
With the spread of networks and computers, electronic documents have become a widely used form of information storage. They are easy to share and store, which greatly helps the dissemination of knowledge. The same ease of copying, however, has led to text duplication and plagiarism, and document copy detection technologies have emerged in response. Document copy detection identifies copied and plagiarized passages in texts. It is an important topic in natural language processing and can be applied in digital library systems, search systems, paper submission systems, and many other areas.

After studying word similarity computation based on HowNet, we propose an improved word similarity approach. This approach converts word similarity into semantic similarity through HowNet's KDML language and integrates both the commonality and the difference of word meanings, so that the word similarity results are more reasonable. In addition, we propose an improved text similarity calculation method based on the structure and order of words. This method considers the semantic characteristics of words, the local structure of the text, and the order of words in the text. Because it extracts more features from the text, the text similarity results are more accurate.

Using the improved approach, we implemented a copy detection system with a B/S (browser/server) architecture, built on the SSH technology framework. The system consists of a text pre-processing module, a text detection module, a results display module, and a sample library module. Finally, we verify the effectiveness of the new method with experiments.
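The abstract does not give the thesis's formulas, so the following is only a minimal Python sketch of the general idea of combining word-level semantic similarity with word order. Everything in it is an assumption made for illustration: the function names, the weight `alpha`, the inversion-based order measure, and the `exact_match` placeholder, which stands in for a HowNet-based word similarity function. Sentences are assumed to be already segmented into word lists.

```python
from typing import Callable, List

WordSim = Callable[[str, str], float]


def exact_match(a: str, b: str) -> float:
    """Placeholder word similarity; a HowNet-based measure would go here."""
    return 1.0 if a == b else 0.0


def semantic_score(s1: List[str], s2: List[str],
                   word_sim: WordSim = exact_match) -> float:
    """Symmetric average of each word's best match in the other sentence."""
    if not s1 or not s2:
        return 0.0

    def directed(src: List[str], dst: List[str]) -> float:
        return sum(max(word_sim(w, v) for v in dst) for w in src) / len(src)

    return (directed(s1, s2) + directed(s2, s1)) / 2.0


def order_score(s1: List[str], s2: List[str]) -> float:
    """1 minus the normalized number of order inversions among shared words."""
    # Only words occurring exactly once in both sentences have a clear position.
    shared = [w for w in s1 if s1.count(w) == 1 and s2.count(w) == 1]
    n = len(shared)
    if n == 0:
        return 0.0
    if n == 1:
        return 1.0
    positions = [s2.index(w) for w in shared]
    inversions = sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if positions[i] > positions[j]
    )
    return 1.0 - inversions / (n * (n - 1) / 2)


def sentence_similarity(s1: List[str], s2: List[str], alpha: float = 0.7,
                        word_sim: WordSim = exact_match) -> float:
    """Weighted mix of semantic and word-order similarity (alpha is assumed)."""
    return (alpha * semantic_score(s1, s2, word_sim)
            + (1.0 - alpha) * order_score(s1, s2))


a = ["文本", "复制", "检测", "技术"]
b = list(reversed(a))      # same words, opposite order
c = ["数字", "图书馆"]       # no words in common with a

sim_same = sentence_similarity(a, a)
sim_reversed = sentence_similarity(a, b)
sim_disjoint = sentence_similarity(a, c)
```

In this sketch a sentence with the same words in reversed order keeps its full semantic score but loses the word-order component, which is the kind of distinction a pure bag-of-words comparison cannot make.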
Keywords/Search Tags: text copy detection, word, word order, similarity