Since the beginning of the 21st century, digitalization has accelerated and the volume of data has grown exponentially. Literature, as a carrier of scientific and technological research results, is an important medium through which people acquire knowledge, and it has allowed knowledge to spread beyond the limits of time and space. Research in any discipline must begin by collecting and reading a large body of literature so that deeper work can build on the results of predecessors. However, the volume of literature follows an "exponential growth law", and reading it manually is slow and inefficient. This paper uses Python web crawler technology, the LDA topic model, and a co-occurrence network model to automatically extract information on the emphases and characteristics of the core Chinese statistics journals, the hot topics in the field of statistics, and its mainstream research methods, and to draw the corresponding knowledge network maps.

This paper consists of three chapters. The first chapter introduces the related theories and concepts, including Python web crawler technology and the principles of the LDA topic model.

The second chapter first explains the source and scope of the data and then compares the emphases and characteristics of the core Chinese statistics journals. From the perspective of citation coverage, Statistics and Information Forum and Statistical Research are of the highest quality; in terms of average downloads and average citations, Statistical Research ranks highest. From the perspective of keywords, Mathematical Statistics and Management focuses on theoretical and innovative research; Statistical Research focuses on macroeconomic statistics; Statistics and Decision focuses on microeconomics, particularly the application of statistics in business management; and Statistics and Information Forum combines theory with application.

The third chapter constructs the knowledge network of the field of statistics. The LDA topic model is used to summarize the research topics of the past ten years into 20 topics, of which 13 are classified as content topics and 7 as method topics. Implicit association groups are then identified in the "content-method", "author-content-method", and "time-content-method" knowledge networks. For example, "evaluation system-economic growth" is a popular content-method association group, and "Xu Dilong-monetary policy-evaluation system" is a popular author-content-method association group. In terms of changes in topic intensity, the content topics related to big data and ecological coordination have increased most noticeably over the past five years; among the method topics, the intensity of sampling survey methods has decreased significantly, while that of machine-learning-related methods has increased significantly.
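As an illustration of the topic-extraction step described in the third chapter, the following is a minimal sketch in Python using gensim's LDA implementation. It assumes the crawled abstracts are stored one per line in a plain-text file and are segmented with jieba; the file name, filtering thresholds, and training parameters are illustrative assumptions rather than the exact configuration used in this paper.

    # Minimal LDA sketch; input file and parameters are assumptions.
    import jieba
    from gensim import corpora, models

    # Hypothetical input: one crawled abstract per line, segmented into words.
    with open("abstracts.txt", encoding="utf-8") as f:
        docs = [list(jieba.cut(line.strip())) for line in f if line.strip()]

    # Build the dictionary and bag-of-words corpus expected by gensim's LdaModel.
    dictionary = corpora.Dictionary(docs)
    dictionary.filter_extremes(no_below=5, no_above=0.5)  # drop rare and ubiquitous terms
    corpus = [dictionary.doc2bow(doc) for doc in docs]

    # Fit an LDA model with 20 topics, matching the number reported in Chapter 3.
    lda = models.LdaModel(corpus, id2word=dictionary, num_topics=20,
                          passes=10, random_state=42)

    # Inspect the top words of each topic to label it as a content or method topic.
    for topic_id, words in lda.show_topics(num_topics=20, num_words=10, formatted=False):
        print(topic_id, [w for w, _ in words])

The per-document topic distributions produced by such a model can then be aggregated by author or by year to build the "author-content-method" and "time-content-method" networks mentioned above.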