Design and Implementation of a Copyright Protection System Based on Text Similarity

Posted on: 2020-04-11
Degree: Master
Type: Thesis
Country: China
Candidate: Y F Liu
GTID: 2428330578452549
Subject: Software engineering
Abstract/Summary:
With the rapid development of the Internet, online copyright has attracted growing attention, and infringement on the Internet has become increasingly prominent. As copyright awareness strengthens, it is particularly important to actively seek solutions for protecting original content. On the one hand, an effective copyright protection system helps protect high-quality original content on a community platform; on the other hand, it gives authors of original content a better product experience, highlights the importance the community attaches to protecting original work, and strengthens those authors' attachment to the community and their motivation to create. This dissertation studies text similarity algorithms for text copyright protection from the perspective of text content and designs a method for calculating text similarity in the copyright protection domain. A text similarity algorithm compares the degree of similarity between two texts according to a given strategy. Research on such algorithms follows two main directions. The first is the semantic dictionary method: a semantic dictionary is built, and the similarity computed by matching the text's best keywords against the dictionary is taken as the text similarity. The second represents text content as vectors: a vector space model is built, and the cosine of the angle between two vectors is taken as the similarity value. Academia has contributed much to copyright protection, but the contextual semantics of the text are often neglected when text similarity is actually matched. To address this problem, this dissertation combines Word2vec and LSTM to analyze text similarity and improve the accuracy of the analysis. Based on the LSTM algorithm, a similarity computing method based on eigenvalues and eigenvectors is designed and implemented.
First, the corpus goes through a pre-training stage that includes text preprocessing and feature engineering, and a word vector model is built with Word2vec in preparation for the text similarity model. Second, a long short-term memory (LSTM) network is trained on the corpus from the content database and used to predict the similarity between sentence pairs. Finally, the copyright protection system is designed to provide an online service that scores articles newly published by users: the similarity of an article is obtained by summing and averaging the similarities returned for its sentences. At the same time, original content is added to the original-content library in real time, and the inverted index library is updated in real time. A large number of comparative experiments show that the hybrid similarity calculation strategy based on Word2vec and LSTM adopted in this system outperforms HowNet and other calculation methods in accuracy and other evaluation metrics. The performance evaluation shows that the system executes efficiently and plays a role in copyright protection.
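The two computations named in the abstract, the cosine of the angle between two vectors and the averaging of sentence-level similarities into an article score, can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the function names are hypothetical, sentences are represented by toy vectors rather than Word2vec or LSTM output, and taking the best-matching original sentence for each new sentence before averaging is an assumption about how sentence pairs are chosen.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


def article_similarity(new_sentences, original_sentences, sentence_score):
    """Average the per-sentence scores of a new article against an original.

    Assumption (not stated in the abstract): each new sentence is scored
    against its best-matching sentence in the original article.
    `sentence_score` is a hypothetical stand-in for the trained model.
    """
    if not new_sentences or not original_sentences:
        return 0.0
    best_scores = [
        max(sentence_score(s, t) for t in original_sentences)
        for s in new_sentences
    ]
    return sum(best_scores) / len(best_scores)


# Toy sentence vectors (hypothetical values, for illustration only).
new_article = [[1.0, 0.0], [0.0, 1.0]]
original_article = [[1.0, 0.0]]

score = article_similarity(new_article, original_article, cosine_similarity)
print(score)
```

In a deployed system, `sentence_score` would be replaced by the trained sentence-pair model, and the candidate originals would come from the inverted index rather than from a fixed list.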
Keywords/Search Tags: text similarity calculation, LSTM, Word2vec, original protection, copyright protection