| Since the end of the 19th century, bibliometrics has undergone more than a century of development. The research object of bibliometrics has been gradually refined from the initial literature to the information units in the literature, and has finally focused on knowledge entities to meet the needs of finer-grained knowledge services in different fields. As an important component of the AI era, algorithms are among the knowledge entities that most deserve scientists' attention today. Algorithms are important objects of scientific research: proposing efficient, convenient, and low-cost algorithms is an important topic for scientists. Algorithms are also a core research method: learning and applying algorithms to solve various problems are essential skills for scientists. The development of algorithms has entered a period of rapid growth, and many new algorithms have emerged, which brings new challenges to scholars, especially beginners, in learning, understanding, and choosing algorithms. If a large number of algorithms can be collected and their academic influence analyzed, we can provide scholars with a preliminarily organized and filtered collection of algorithms together with evaluation results, helping them find algorithms applicable to their research and thus solve problems more efficiently. Academic papers are a platform for displaying knowledge achievements and an important carrier for knowledge dissemination; they contain many algorithms as well as the full-text content features of the mentioned algorithms. Therefore, this thesis uses academic papers as the data source to analyze the academic influence of algorithms based on the content that mentions them. Algorithms in academic papers are embodied in two forms, metadata and entities: the former is specific textual content introducing algorithms, and the latter is specific nouns and phrases referring to algorithms. This study focuses on the latter, i.e., algorithm
entities. Currently, evaluation methods for entity impact include qualitative and quantitative approaches. Qualitative methods are highly accurate, but their input cost is high, and they struggle to cope with evaluating a large number of entities. Quantitative methods are more operable, but the traditional frequency-based indicators cannot reveal deeper differences in influence. Therefore, to evaluate the academic influence of algorithm entities at scale, this thesis adopts quantitative methods and constructs a finer-grained evaluation indicator with the help of text content features. Taking algorithm entities in the field of natural language processing (NLP) as an example, this thesis extracts algorithm entities from the full text of academic papers and evaluates their academic influence using three features: mention frequency, mention location, and mention motivation. After that, we explore the trends in and reasons for the evolution of algorithm influence, and then build a retrieval system to show the mention content and the academic influence of each algorithm. To achieve these goals, this thesis carries out the following five research tasks. (1) Extracting and identifying the algorithm entities. After sorting out the definitions of scientific objects such as methods, algorithms, and models at different stages of scientific development, this thesis proposes definitions of algorithms and algorithm entities for the NLP field to clarify the research object. We use traditional machine learning models, deep learning models, and pre-trained models to conduct preliminary identification of algorithm entities in the full text of papers, among which the BERT+Bi-LSTM+CRF model performs best. Rule filtering and manual review are then used to screen the candidate entities, ensuring that subsequent analysis is carried out on accurate algorithm entities. In the end, we obtain 3,033 algorithm entities and 1,652 normalized standard names of
algorithms from papers published from 1979 to 2020, and compile the algorithm dictionaries. (2) Analyzing the academic influence of algorithm entities based on mention frequency. This thesis uses articles and sentences as units to construct academic influence evaluation indicators based on mention frequency. Unlike the traditional method of only summing the number of articles, this thesis also considers the academic age of the algorithm entity and the total number of articles in the field each year to obtain the broad influence, that is, the influence within the field. Subsequently, we use the number of articles and the number of sentences together to obtain the deep influence, that is, the influence within an article. (3) Analyzing the academic influence of algorithm entities based on the location of mention. Considering that algorithm entities are not equally important in different sections of a paper, this thesis takes sections as the unit and automatically divides them into six logical categories. Subsequently, we weight these sections and then evaluate the broad and deep influence separately from the frequency of mentions and the importance of sections. The results show that, after the mention location is taken into account, the influence of algorithm entities mentioned in the same number of articles can be further distinguished, and the results are more in line with the differences between algorithm entities observed in current research work. (4) Analyzing the academic influence of algorithm entities based on the motivation of mention. Since algorithms mentioned for different purposes influence papers differently, this thesis uses sentences as units to automatically identify the motivation category of each mention of an algorithm entity. After the different motivations are weighted, the algorithm’s broad and deep influence can be evaluated separately based on the frequency of mention and the importance of different motivations. The results of this thesis show that more emerging algorithm entities obtain higher academic
influence after accounting for the motivation behind mentions of the algorithm. (5) Evaluating and analyzing the comprehensive academic influence of algorithm entities. After obtaining the three features of mention frequency, location, and motivation for each algorithm entity, this thesis synthesizes all the features to analyze the academic influence. Expert evaluation shows that the results obtained from composite indicators based on multiple features are closer to the experts' reference results. The results indicate that classical algorithms from different periods, namely the algorithm entities that have promoted the field’s development and shown disruptive performance, have achieved extremely high broad academic influence. Emerging algorithms of recent years, namely niche algorithms that have shown good performance but have not yet spread widely, have gained extremely high deep academic influence. Based on the influence of individual algorithms, this study classifies individual algorithms and domain development stages according to the trend of influence evolution. It finds that the development of the NLP domain can be divided into three stages according to the evolution of algorithm influence, and that the changes of different algorithms within each stage are related to changes in social needs and the development of technology. Finally, this thesis uses the text content and features of the mentioned algorithms, as well as the rankings of algorithm influence obtained with different indicators, to construct a retrieval platform where users can query the content mentioning algorithm entities in papers and their academic influence in different dimensions. |
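
The frequency-based and location-weighted indicators described in tasks (2) and (3) can be illustrated with a minimal sketch. The abstract does not give the actual formulas, so everything below is an assumption made for illustration: broad influence is taken as the yearly share of field papers mentioning the algorithm, summed and divided by its academic age; deep influence is taken as the (optionally weighted) number of mentioning sentences per mentioning paper. The function names, the record layout, and the section weights are hypothetical.

```python
from collections import defaultdict

# Each mention record is one sentence mentioning the algorithm:
# (paper_id, publication_year, section_category). Layout is hypothetical.


def broad_influence(mentions, papers_per_year, current_year):
    """Assumed form: sum of yearly shares of field papers that mention the
    algorithm, divided by the algorithm's academic age in years."""
    papers_by_year = defaultdict(set)
    for paper_id, year, _section in mentions:
        papers_by_year[year].add(paper_id)
    first_year = min(papers_by_year)
    academic_age = current_year - first_year + 1
    share = sum(
        len(papers) / papers_per_year[year]
        for year, papers in papers_by_year.items()
    )
    return share / academic_age


def deep_influence(mentions, section_weights=None):
    """Assumed form: (weighted) mentioning sentences per mentioning paper.
    Without weights, every sentence counts as 1."""
    papers = {paper_id for paper_id, _year, _section in mentions}
    if section_weights is None:
        total = float(len(mentions))
    else:
        total = sum(section_weights[section] for _p, _y, section in mentions)
    return total / len(papers)


# Toy data, invented purely for illustration.
mentions = [
    ("p1", 2018, "method"),
    ("p1", 2018, "related_work"),
    ("p2", 2019, "method"),
    ("p3", 2019, "method"),
    ("p3", 2019, "method"),
    ("p3", 2019, "related_work"),
]
papers_per_year = {2018: 10, 2019: 20}
weights = {"method": 1.0, "related_work": 0.5}  # hypothetical weights

broad = broad_influence(mentions, papers_per_year, current_year=2020)
deep = deep_influence(mentions)
deep_weighted = deep_influence(mentions, weights)
```

A motivation-based variant would follow the same pattern, replacing the section category in each record with a motivation category and supplying a corresponding weight table.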