| With the rapid development of scientific research, a large number of academic papers are published worldwide every year. Accurately predicting the citation counts of recently published papers would help scholars reduce their search costs when facing a large body of literature and find influential papers earlier. Constructing an effective citation prediction model for newly published papers is therefore important for both research and practical application. Various bibliometric features and statistical methods have been used in citation prediction tasks. In addition, the citation network characteristics and textual information of academic papers influence citation counts. However, existing citation count prediction studies have failed to utilize and effectively combine them. To address these shortcomings, this paper proposes GARU, a citation count prediction model for newly published papers based on the Graph Attention Network (GAT). The model extracts semantic features and spatial structure features of papers from the text content and citation network of relevant papers in the field and feeds them into the GAT model. The GAT model uses the attention mechanism to assign weights to different papers based on the semantic similarity between them, captures the interrelationships between papers, and outputs a feature matrix. A gated recurrent unit (GRU) is used to extract the distribution patterns of citation counts between papers, and an attention mechanism fuses the two so that the various kinds of information about a paper are captured effectively. The model can extract semantic and spatial features using only the metadata of newly published papers, such as the abstract, title, and references, to predict citation counts 5 years after publication. The model is validated on AI-domain datasets from the AMiner and Web of Science database platforms. The results show that, when no years of citation history are available, the root mean square error (RMSE) and mean absolute error (MAE) of predicting citation counts over the next five years are 16.05 and 6.26, respectively. Using only the metadata features of the text itself, the model can still identify 54% of the highly cited papers, confirming its accuracy in both predicting citation counts and identifying highly cited papers. Further experiments show that prediction performance improves as the number of years of citation counts available for a newly published paper increases. With citation counts from the previous three years available, the method reduces RMSE by an average of 1.38 and MAE by an average of 0.59 in predicting citation counts over the next five years compared with the best performing of the four baseline methods, and improves the identification of highly cited papers by 16.67% compared with the best-performing baseline model. Some shortcomings were also identified during the experiments, and the model will be optimized in the following directions in future work. The citation count of a paper is the result of many factors, such as the institutions and researchers involved and the place of publication, that are not captured in the text of the abstract. Because of these systematic factors, further improving citation count prediction will require integrating more data on these academic networks. |
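
The abstract reports results as RMSE and MAE. The sketch below shows how the two metrics are conventionally computed from predicted and actual citation counts. It is not the authors' evaluation code: the function names and the sample counts are hypothetical and chosen only for illustration.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def mae(y_true, y_pred):
    """Mean absolute error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    absolute = [abs(t - p) for t, p in zip(y_true, y_pred)]
    return sum(absolute) / len(absolute)


# Hypothetical five-year citation counts for four papers (not from the paper).
actual = [10, 0, 3, 25]
predicted = [8, 1, 3, 20]

print(f"RMSE = {rmse(actual, predicted):.3f}")
print(f"MAE  = {mae(actual, predicted):.3f}")
```

Because RMSE squares each error before averaging, it penalizes large misses on highly cited papers more heavily than MAE does. This is consistent with the reported RMSE (16.05) being considerably larger than the reported MAE (6.26).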