| In recent years, with the rapid development of science and technology, major breakthroughs have been made in many areas of scientific research. Scholars publish their results as papers, which give later researchers theoretical support and technical grounding. The citations a scholar’s papers receive reflect that scholar’s influence in the research field. Predicting these citations can help researchers quickly identify influential scholars in a field, and can also help research management departments and funding agencies understand disciplinary development trends, decide on funding areas and topics, and allocate resources better. At the same time, the Internet era has made electronic publication of papers common, so the citation counts and publication histories of scholars can be obtained by web crawling and used to study citation prediction for scholars. Existing research, both domestic and international, on predicting the citation counts of academic papers falls mainly into three groups: statistical analysis methods, machine learning methods, and graph-model-based methods. Well-known scholars often co-author with one another, so the co-authorship and citation relationships among scholars’ papers are very helpful for predicting their citation counts. However, statistical and machine learning methods cannot make full use of these relationships and simply treat each scholar as an isolated individual. Graph-model-based methods use only the relationship graph of the papers and do not apply natural language processing to extract features from the papers’ text, so they fail to fully represent a scholar’s research field and research content. Yet this is an important feature for predicting a scholar’s citation count, because the number of
citations received by scholars active in recently popular research fields is usually higher. Graph neural networks, which have emerged in recent years, are an effective way to model graph relationships. Once the adjacency matrix is constructed, a graph neural network uses it to propagate features between nodes and thereby performs semi-supervised learning on the graph. In addition, because paper titles are text, natural language processing techniques are needed to extract text features. In recent years, pre-trained models including BERT, ELMo, the GPT series, and XLNet have achieved breakthroughs on natural language processing tasks. Among them, XLNet, an autoregressive language model, overcomes shortcomings of autoencoding models while still capturing context from both directions, and it has achieved very good results on multiple natural language tasks. In this paper, we use XLNet to extract features from paper titles and concatenate them with scholars’ historical information to form the scholar features used for graph neural network training. This paper surveys related work, domestic and international, on predicting the citations of scholars’ papers, and analyzes and summarizes the deficiencies of current research. Targeting the characteristics of the scholar citation prediction task, it proposes four algorithms: XLNet_GAT, a scholar paper citation prediction algorithm based on the pre-trained model XLNet and the graph attention network GAT; the improved Word_Char_XLNet_GAT algorithm, based on multi-feature fusion of word-level and character-level segmentation; the improved Self_Att_XLNet_GAT algorithm, based on the self-attention mechanism; and the improved WC_Att_XLNet_GAT algorithm, based on both segmentation multi-feature fusion and the self-attention mechanism. The main work of this paper comprises the following three points: (1) The
composition of the paper summary pages on Baidu Academic was analyzed, and crawlers were used to collect summaries of Chinese papers in the field of artificial intelligence from the past five years, including each paper’s author list, title, and citation information. These data form the experimental corpus for this research. (2) The shortcomings of existing methods, domestic and international, for predicting scholars’ citations were analyzed, and the citation prediction algorithm XLNet_GAT was proposed, which combines the pre-trained model XLNet with the graph attention network GAT. The method builds a directed graph, represented as an adjacency matrix, from the co-authorship and citation relationships of papers, and uses XLNet to extract text features from paper titles. Experiments show that on the test set the RMSE of XLNet_GAT is about 10.8% lower than that of the XLNet_BiLSTM algorithm, and its R2_Score increases by 13%. (3) Building on XLNet_GAT and combining XLNet features from word-level and character-level segmentation, this paper proposes the improved Word_Char_XLNet_GAT algorithm based on multi-feature fusion of the two segmentations. In addition, the self-attention mechanism is used to fuse the XLNet features of multiple paper titles by the same scholar, yielding the improved Self_Att_XLNet_GAT algorithm. Combining the advantages of Word_Char_XLNet_GAT and Self_Att_XLNet_GAT, this paper proposes the improved WC_Att_XLNet_GAT algorithm based on segmentation multi-feature fusion and the self-attention mechanism. Finally, ablation experiments demonstrate the effectiveness of the three improved algorithms. |
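
The abstract describes propagating scholar features over a directed graph through an adjacency matrix with graph attention. The following is a minimal NumPy sketch of one single-head graph attention layer, given only to illustrate that idea; it is not the paper's implementation. The function names `build_adjacency` and `gat_layer`, the edge direction convention, the random stand-in features (in place of XLNet title embeddings concatenated with scholar history), and all dimensions are assumptions made for this example.

```python
import numpy as np


def build_adjacency(num_scholars, edges):
    """Boolean adjacency matrix with self-loops.

    Assumed convention: an edge (src, dst) means node dst receives
    features from node src, so adj[dst, src] is set.
    """
    adj = np.eye(num_scholars, dtype=bool)
    for src, dst in edges:
        adj[dst, src] = True
    return adj


def gat_layer(features, adj, weight, attn_vec, slope=0.2):
    """One single-head graph attention layer without an output nonlinearity."""
    h = features @ weight                      # projected features, shape (N, F_out)
    f_out = h.shape[1]
    # scores[i, j] = LeakyReLU(a_left . h_i + a_right . h_j)
    scores = (h @ attn_vec[:f_out])[:, None] + (h @ attn_vec[f_out:])[None, :]
    scores = np.where(scores > 0, scores, slope * scores)
    # Mask non-neighbours so they get zero weight after the softmax.
    scores = np.where(adj, scores, -np.inf)
    scores = scores - scores.max(axis=1, keepdims=True)
    alpha = np.exp(scores)
    alpha = alpha / alpha.sum(axis=1, keepdims=True)
    # Each node's output is an attention-weighted sum over its neighbours.
    return alpha @ h, alpha


rng = np.random.default_rng(0)
# Hypothetical stand-in for per-scholar features (4 scholars, 6 dimensions).
features = rng.normal(size=(4, 6))
# Hypothetical directed edges between scholars (co-authorship or citation).
edges = [(0, 1), (1, 0), (2, 1), (3, 2)]
adj = build_adjacency(4, edges)
weight = rng.normal(size=(6, 3))    # untrained projection, for illustration only
attn_vec = rng.normal(size=(6,))    # untrained attention vector of length 2 * F_out
out, alpha = gat_layer(features, adj, weight, attn_vec)
print(out.shape)
```

In this toy graph, scholar 3 has no incoming edges, so its attention weight falls entirely on its own self-loop and its output equals its own projected features. In a trained model, `weight` and `attn_vec` would be learned, and a regression head on the output would predict citation counts.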