
Network Text Plagiarism Detection System Based On Multi-granularity

Posted on: 2018-01-22
Degree: Master
Type: Thesis
Country: China
Candidate: R S Ma
Full Text: PDF
GTID: 2348330515451700
Subject: Software engineering
Abstract/Summary:
With the advent of the information age, the way people obtain information has shifted from reading newspapers and magazines to browsing blogs, forums, and similar online sources. People can get the latest news anytime and anywhere, and they increasingly demand high-quality information. As a result, many news sites and well-known bloggers have emerged. An excellent article not only attracts a large readership but also earns considerable profit. At the same time, copyright infringement occurs frequently, because advances in technology have made searching more convenient and greatly reduced the cost of plagiarism. This not only harms the interests of original authors but also has an extremely negative effect on the atmosphere of innovation in China as a whole.

Against this background, this thesis designs and implements a network text plagiarism detection system based on multi-granularity. The main function of the system is to detect whether a text has been plagiarized on the network. Users submit a text or a URL, and the system generates a similarity report and a chart of the network sources of similar text. The system's users are divided into individuals and enterprises. Individual users use the system to protect the rights of original authors by providing evidence that an original text has been plagiarized. Enterprise users use the system to check whether their own content plagiarizes articles from other sites, so as to avoid legal disputes.

The main contents of this thesis are as follows:

1) A new text similarity algorithm based on Tongyici Cilin, which improves on the traditional cosine algorithm, is proposed and implemented. The thesis demonstrates the feasibility of the algorithm from both theoretical and experimental perspectives.

2) A new multi-granularity approach to text similarity detection is proposed. Multi-granularity is reflected in both the network crawler and the text similarity calculation. In the network crawler, the system chooses which websites to crawl according to the granularity of text type. For example, for technical text the system crawls sites such as CSDN and Sina blog, while for news it crawls news websites. In the text similarity calculation, the system chooses the similarity algorithm according to the granularity of the user's requirements. For example, the cosine algorithm is used for “Rapid detection”; if the user chooses “Regular detection”, the system uses the improved cosine algorithm; and for “Detailed detection”, the system adopts a text similarity algorithm based on semantic understanding.

3) The text similarity algorithms used in the system are evaluated on text clustering results. The three measures P, R, and F are compared against those of the traditional cosine algorithm to ensure that the algorithms used in the system meet users' similarity detection needs.

4) The system is evaluated in terms of functionality and performance, its advantages and disadvantages are analyzed, and ideas for subsequent improvement are put forward.
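
To make the choice of algorithm by detection mode more concrete, the following is a minimal Python sketch, not the thesis's implementation. It shows the traditional cosine measure on term-frequency vectors and one plausible way a synonym lexicon could be folded into it, by mapping each word to a synonym-group ID before comparison. The abstract does not describe how the improved algorithm actually uses Tongyici Cilin, so `SYNONYM_GROUP`, its group IDs, the function names, and the mode-to-function mapping are all assumptions made for illustration. The “Detailed detection” mode is left out because the semantic algorithm is not specified.

```python
import math
from collections import Counter

# Hypothetical toy synonym table standing in for a lexicon such as
# Tongyici Cilin. The group IDs are made up for this example.
SYNONYM_GROUP = {
    "copy": "G1", "duplicate": "G1",
    "article": "G2", "essay": "G2",
}


def cosine(tokens_a, tokens_b):
    """Cosine similarity of two token lists using term-frequency vectors."""
    va, vb = Counter(tokens_a), Counter(tokens_b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


def synonym_cosine(tokens_a, tokens_b):
    """Cosine after replacing each token with its synonym-group ID, if any.

    This is an assumed variant, not the algorithm proposed in the thesis.
    """
    def canonical(tokens):
        return [SYNONYM_GROUP.get(t, t) for t in tokens]
    return cosine(canonical(tokens_a), canonical(tokens_b))


# Mode names follow the abstract; the mapping to functions is an assumption.
DETECTORS = {
    "rapid": cosine,
    "regular": synonym_cosine,
}


def detect(mode, tokens_a, tokens_b):
    """Dispatch to the similarity function registered for the given mode."""
    return DETECTORS[mode](tokens_a, tokens_b)


if __name__ == "__main__":
    a = ["copy", "the", "article"]
    b = ["duplicate", "the", "essay"]
    print(detect("rapid", a, b))
    print(detect("regular", a, b))
```

In this sketch the two texts share only one literal word, so the plain cosine score is low, while the synonym-aware variant scores them as nearly identical because "copy"/"duplicate" and "article"/"essay" fall into the same hypothetical groups.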
Keywords/Search Tags:multi-granularity, text similarity algorithm, improved cosine algorithm