Research On Citation Count Prediction Of Papers Based On Deep Learning

Posted on: 2023-02-12
Degree: Master
Type: Thesis
Country: China
Candidate: A Q Ma
GTID: 2568306827475454
Subject: Software engineering
Abstract/Summary:
With the explosive growth of academic papers, predicting the citation count of a paper can help scholars identify influential work in advance, which has practical value. How to build an efficient citation count prediction model has therefore become a hot issue in academia. A variety of bibliometric and altmetric features have been used in the citation count prediction task. Furthermore, the semantic information contained in a paper's metadata text, such as its title and abstract, affects its citation count. However, existing studies on citation count prediction ignore the contextual semantic information in the metadata text. To address this shortcoming, and the problem that the methods commonly used in existing studies are not suited to extracting semantic features, this paper proposes BILA, a novel citation count prediction model based on Bi-directional Long Short-Term Memory (Bi-LSTM) and an attention mechanism. BILA first applies the Doc2Vec algorithm to vectorize the sentences in the metadata text, producing a sentence vector matrix that is fed into the Bi-LSTM. The attention mechanism then extracts deep semantic features of the metadata text from the hidden state matrix generated by the Bi-LSTM. Finally, the metadata semantic features and early citations are fused for long-term citation count prediction.

Building on BILA, which verified the effectiveness of the semantic features of metadata text, this paper then addresses the problem that Bi-LSTM cannot be computed in parallel and seeks to further improve citation prediction performance. It builds HTN, a novel citation count prediction model based on hierarchical Transformer networks, which improves computational efficiency on long text data. HTN first calculates the sentence-level semantic representation from the word embedding matrix and the word positional encoding matrix. The sentence contextual semantic representation matrix and the sentence positional encoding matrix are then used to generate the paragraph contextual semantic representation. Finally, the metadata semantic features and early citations are fused for long-term citation count prediction.

This paper collects papers from top-tier journals in the artificial intelligence field to construct a citation count prediction dataset and carries out a series of experiments on it. The experimental results verify the effectiveness and feasibility of the BILA model on the long-term citation count prediction task. The model also performs well on highly cited papers, and metadata semantic features contribute to improving its prediction accuracy. Furthermore, the HTN model achieves better prediction performance than the other models and further strengthens the contribution of metadata semantic features to citation count prediction performance.
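
The abstract describes two steps that a small example may make more concrete: pooling a matrix of hidden states with attention, and fusing the resulting semantic vector with early citation counts. The following is a minimal sketch in plain Python, not the thesis implementation. The additive attention form (`tanh` projection followed by a context vector), the parameter names `w`, `b`, `u`, the function names, and the log scaling of early citations are all assumptions made for illustration; the abstract does not specify them.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, w, b, u):
    """Additive attention pooling over a sequence of hidden-state vectors.

    hidden_states: list of T vectors of length d (stand-ins for Bi-LSTM outputs).
    w: d_a x d weight matrix, b: bias of length d_a, u: context vector of length d_a.
    These parameters are hypothetical; in a real model they would be learned.
    Returns (pooled vector of length d, attention weights of length T).
    """
    scores = []
    for h in hidden_states:
        # projected = tanh(W h + b), one value per attention dimension
        projected = [
            math.tanh(sum(w_ij * h_j for w_ij, h_j in zip(row, h)) + b_i)
            for row, b_i in zip(w, b)
        ]
        # score = u . projected
        scores.append(sum(u_i * p_i for u_i, p_i in zip(u, projected)))
    weights = softmax(scores)
    dim = len(hidden_states[0])
    # Weighted sum of the hidden states, one output value per dimension
    pooled = [
        sum(a * h[j] for a, h in zip(weights, hidden_states)) for j in range(dim)
    ]
    return pooled, weights


def fuse(semantic_vector, early_citations):
    """Concatenate semantic features with log-scaled early citation counts.

    The log1p scaling is an assumption, chosen only to tame skewed counts.
    """
    return list(semantic_vector) + [math.log1p(c) for c in early_citations]
```

In use, `attention_pool` would receive one hidden-state vector per sentence, and the output of `fuse` would be passed to a regression layer that predicts the long-term citation count. That final layer is omitted here because the abstract gives no details about it.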
Keywords/Search Tags: Citation count prediction, Semantic features, Bi-LSTM, Transformer, Deep learning