With the rapid development of science and technology, more and more academic papers are being published, and some research results with rich potential knowledge value fail to receive timely and widespread attention from the academic community. Such papers receive zero or few citations for several years after publication and then suddenly attract a large number of citations in a particular year. They are known as “Sleeping Beauties”. Identifying and awakening sleeping beauties as early as possible helps shorten the time lag in recognizing major scientific discoveries and draws researchers' attention to such papers. It also helps researchers identify research directions at the frontier of science and technology and accelerates the progress of scientific research. The topic has therefore received extensive attention from the academic community. At present, most studies that identify sleeping beauties and calculate their sleeping time rely on long-term historical citation data, so these methods are of limited use in the first years after a paper is published. To overcome this limitation, this work proposes an early identification method for sleeping beauties and a model for predicting their sleeping time. First, this work proposes a method for identifying sleeping beauties based on the random forest algorithm. It tests and compares the method's recognition performance using three evaluation metrics (accuracy, recall and F1-score) and two identification methods (the B-index and the Bcp-index). Second, this work ranks the factors influencing sleeping time by their impact and extracts four key factors as explanatory variables: the Prince, journal reputation, number of co-authors and author influence. It then analyzes the correlation between each explanatory variable and sleeping time. Finally, the Sleeping Time Prediction Model (STPM) is constructed from these four explanatory variables, and its ability to predict the sleeping time of sleeping beauties is tested.

The main findings are as follows. (1) The random forest-based recognition method reaches its highest accuracy, 72%, in the fifth year after publication. It identifies sleeping beauties with a high cumulative citation count well, and it can still identify sleeping beauties with a low degree of sleep and a short sleeping time. (2) All four explanatory variables are negatively correlated with sleeping time, and journal reputation has the greatest impact. This suggests that higher journal reputation can effectively shorten the sleeping time of sleeping beauties and accelerate their awakening. (3) The STPM model gives the best predictions among the compared models at every training set size. As the training set grows, its relative error decreases and its predictions move closer to the true values.

The innovations of this work are as follows. (1) To remove the dependence of existing recognition methods on long-term citation history, this work proposes an early recognition method based on the random forest algorithm. This provides a new approach to the recognition and prediction of sleeping beauties. (2) To clarify how the Prince, journal reputation, number of co-authors and author influence relate to the sleeping time of sleeping beauties, this work constructs a sleeping-time prediction model based on early data. The model can predict the sleeping time of a sleeping beauty before it awakens. The recognition method and sleeping time prediction model proposed in this work overcome the dependence of existing research methods on long-term citation history and provide new ideas and methods for sleeping beauty research. They also help shorten the recognition delay of innovative research with potential knowledge value, promote the diffusion of scientific knowledge, and accelerate the progress of scientific research.
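The B-index used for comparison is not defined here. The sketch below assumes it refers to the beauty coefficient B of Ke et al. (2015). B is computed from a yearly citation series c_t up to the peak year t_m as the sum over t = 0..t_m of (l_t - c_t) / max(1, c_t). Here l_t is the straight line joining (0, c_0) and (t_m, c_{t_m}). The sketch also returns the awakening time t_a, taken as the year whose citation point lies farthest from that line, which is one possible proxy for sleeping time. The function name `beauty_coefficient` is hypothetical. Whether this work uses exactly this definition or this proxy is an assumption. The Bcp-index is not sketched because no definition is given.

```python
import math
from typing import Sequence, Tuple


def beauty_coefficient(citations: Sequence[int]) -> Tuple[float, int]:
    """Return (B, t_a) for a yearly citation series (hypothetical helper).

    citations[t] is the number of citations received t years after
    publication (t = 0 is the publication year). Assumes the beauty
    coefficient and awakening-time definitions of Ke et al. (2015).
    """
    if not citations:
        raise ValueError("citation series must not be empty")
    c0 = citations[0]
    c_max = max(citations)
    t_m = citations.index(c_max)  # first year in which the peak is reached
    if t_m == 0:
        # Peak in the publication year: no sleep, so B = 0 and t_a = 0.
        return 0.0, 0

    slope = (c_max - c0) / t_m
    b = 0.0
    for t in range(t_m + 1):
        reference = slope * t + c0  # line from (0, c0) to (t_m, c_max)
        b += (reference - citations[t]) / max(1, citations[t])

    # Awakening time: year whose point (t, c_t) is farthest from the line.
    norm = math.hypot(c_max - c0, t_m)
    distances = [
        abs((c_max - c0) * t - t_m * citations[t] + t_m * c0) / norm
        for t in range(t_m + 1)
    ]
    t_a = max(range(t_m + 1), key=lambda t: distances[t])
    return b, t_a


if __name__ == "__main__":
    # Illustrative series: four quiet years, then a sudden citation burst.
    b, t_a = beauty_coefficient([0, 0, 0, 0, 1, 10, 20])
    print(round(b, 3), t_a)  # 33.0 4
```

A larger B means a longer, deeper sleep followed by a sharper awakening. Because B needs the citation peak, it can only be computed once the long-term history is available. This is the dependence that an early, feature-based classifier aims to avoid.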
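The evaluation criteria mentioned above (accuracy, recall and F1-score for recognition, and relative error for the STPM predictions) can be computed as follows. This is a minimal sketch that assumes binary labels (1 = sleeping beauty, 0 = ordinary paper). It also assumes the relative error is averaged over papers as |predicted - actual| / actual. The function names are hypothetical. F1-score is defined from precision and recall, so precision is reported as well.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Binary metrics with 1 = sleeping beauty (hypothetical helper)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label sequences must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def mean_relative_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean of |predicted - actual| / actual; assumes every actual value is > 0."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(abs(p - a) / a for a, p in zip(actual, predicted)) / len(actual)


if __name__ == "__main__":
    print(classification_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
    print(mean_relative_error([4, 5], [5, 5]))  # 0.125
```

Recall is the share of true sleeping beauties that the classifier finds. This matters here because the goal is to avoid overlooking undervalued papers early on.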