Research On CNN-based News Headline Similarity Calculation Model

Posted on: 2021-04-06
Degree: Master
Type: Thesis
Country: China
Candidate: T T Wu
GTID: 2518306125464734
Subject: Computer Science and Technology
Abstract/Summary:
With the development of the Internet and smartphones, people's demand for news has become more personalized. Traditional recommendation methods based on keyword matching in news headlines perform no semantic analysis, so the recommended news may not interest users. This thesis approaches the problem from the perspective of text similarity. News body text is long and usually contains much content unrelated to the analysis, which reduces judgment accuracy. Moreover, if similarity is computed on the full news content, the resulting word vector matrix is too large and the computation too expensive, which is difficult to implement and does not meet the requirements of a production environment. Therefore, this thesis uses the BERT pre-trained model to represent the semantic features of news headlines, computes the similarity between headlines, and makes recommendations on that basis.

Traditional text similarity calculation methods have shortcomings such as heavy consumption of human resources, inaccurate extraction of text information, and high demands on the dictionary. LSTM-based similarity calculation models are suited to long sequences and cannot show their full advantage on relatively short text such as news headlines. Therefore, this thesis proposes a CNN-based news headline similarity calculation model. To improve the accuracy of headline similarity calculation, the model is improved in three aspects: feature representation, feature extraction, and model training. The main work is as follows:

(1) Feature representation: Traditional word vector models cannot attend to context information and cannot handle polysemy. To address this, the thesis builds the news headline similarity calculation model on top of the BERT pre-trained model.

(2) Feature extraction: When convolutional neural networks extract news headline features, the key features of each headline and the interactive feature information between headlines are ignored, which leads to incomplete feature extraction. Building on the CNN-based news headline similarity calculation model, the thesis introduces self-attention and two-way attention mechanisms and verifies them experimentally.

(3) Model training: Training suffers from vanishing gradients, neuron death, mean shift, slow convergence, and weak sparse expression ability. The thesis analyzes the characteristics of commonly used activation functions, constructs a new activation function, SPReLU, and verifies it experimentally in the similarity calculation model.

To verify the effectiveness of the news headline similarity calculation model constructed in this thesis, experiments were conducted on different data sets. The results show that the similarity calculation method using BERT pre-trained word vectors performs relatively well. The accuracy and F1 value of the similarity calculation model with the multi-attention mechanism also improve to a certain extent. Finally, with the SPReLU activation function, the convergence speed and accuracy of the model improve to a certain extent, which demonstrates the effectiveness and feasibility of the proposed scheme.
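The general shape of a CNN-based headline similarity computation can be sketched as follows. This is only an illustration under stated assumptions, not the thesis's model: `toy_embed` is a hypothetical stand-in that assigns each word a fixed pseudo-random vector (unlike BERT, it is not contextual), the filters are random rather than trained, a plain ReLU is used in place of SPReLU, and the attention mechanisms are omitted. It shows a convolution over sliding word windows, max-over-time pooling, and cosine similarity between the pooled feature vectors.

```python
import math
import random

def toy_embed(tokens, dim=8):
    """Hypothetical stand-in for BERT: one fixed pseudo-random vector per word."""
    vecs = []
    for tok in tokens:
        rng = random.Random(tok)  # seeding by the word makes the vector repeatable
        vecs.append([rng.uniform(-1.0, 1.0) for _ in range(dim)])
    return vecs

def conv1d_max_pool(token_vecs, filters, biases):
    """Slide each filter over the token sequence, apply ReLU, keep the maximum.

    Each filter is a list of `width` weight vectors, one per window position.
    Returns one pooled feature per filter.
    """
    features = []
    for filt, bias in zip(filters, biases):
        width = len(filt)
        best = 0.0  # ReLU outputs are >= 0, so 0.0 is a safe starting maximum
        for start in range(len(token_vecs) - width + 1):
            window = token_vecs[start:start + width]
            act = bias + sum(
                w * x
                for w_vec, x_vec in zip(filt, window)
                for w, x in zip(w_vec, x_vec)
            )
            best = max(best, max(0.0, act))  # plain ReLU, not SPReLU
        features.append(best)
    return features

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def headline_similarity(h1, h2, n_filters=16, width=2, dim=8, seed=0):
    """Embed both headlines, extract pooled CNN features, compare by cosine."""
    rng = random.Random(seed)
    # Untrained random filters shared by both headlines (an assumption).
    filters = [
        [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(width)]
        for _ in range(n_filters)
    ]
    biases = [0.0] * n_filters
    f1 = conv1d_max_pool(toy_embed(h1.lower().split(), dim), filters, biases)
    f2 = conv1d_max_pool(toy_embed(h2.lower().split(), dim), filters, biases)
    return cosine(f1, f2)
```

Calling `headline_similarity("stocks rise on tech rally", "tech rally lifts stocks")` returns a score between 0 and 1. With untrained filters and toy embeddings the score carries no semantic meaning; the sketch only shows how the data flows through the stages.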
Keywords/Search Tags:Similarity calculation, Convolutional Neural Network, BERT, Attention mechanism, Activation function
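
Two of the training problems named in the abstract, neuron death and mean shift, can be illustrated with standard activation functions. The abstract does not give the definition of SPReLU, so it is not reproduced here. The sketch below compares only ReLU with a PReLU-style function, whose negative slope `a` would be learned during training; the value 0.25 is an assumed starting value chosen for illustration.

```python
def relu(x):
    """ReLU: passes positive inputs, outputs zero otherwise."""
    return x if x > 0 else 0.0

def relu_grad(x):
    """ReLU gradient is zero for non-positive inputs, so a unit whose
    inputs stay negative stops updating (neuron death)."""
    return 1.0 if x > 0 else 0.0

def prelu(x, a=0.25):
    """PReLU-style activation: negative inputs are scaled by slope `a`."""
    return x if x > 0 else a * x

def prelu_grad(x, a=0.25):
    """The gradient stays nonzero (equal to `a`) for negative inputs."""
    return 1.0 if x > 0 else a

def mean(values):
    return sum(values) / len(values)

# Inputs symmetric around zero, so the input mean is 0.
inputs = [-2.0, -1.0, 1.0, 2.0]
relu_mean = mean([relu(x) for x in inputs])    # pushed above zero (mean shift)
prelu_mean = mean([prelu(x) for x in inputs])  # closer to zero than ReLU's
```

For these inputs ReLU discards every negative value, so its output mean moves further from zero than the PReLU-style output mean, and its gradient at negative inputs is exactly zero.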