| Background: Health services management is a discipline that studies the laws governing the development of health services. The evaluation of academic papers is not only an essential part of health information management but also affects all aspects of health management, such as health policy, health plan management, health organization management, and health human resource management. In the Central Committee of the Communist Party of China's Proposal on Formulating the Fourteenth Five-Year Plan for National Economic and Social Development and the Long-term Goals for 2035, "persisting in innovation" is listed first among the important areas of work for the next five years. How to evaluate the degree of innovation of scientific and technological achievements has therefore become an important task. Because academic papers are the primary carrier of scientific and technological achievements, methods for evaluating them have long attracted attention. However, existing evaluation methods mostly focus on external indicators such as citation frequency and impact factor, pay less attention to the content that forms the main body of the paper, and rarely evaluate the paper's novelty. For this reason, this study starts from the content of a single paper, analyzes the relationship between that content and the distribution of topics in the subject field, and puts forward a model for evaluating the novelty of a single paper's content based on the MeSH term network of the journal in which the paper is published. This study can improve the evaluation system for scientific papers, promote scientific and technological innovation, and provide a reference for the allocation of scientific and technological research funds and resources. Objective: To explore the feasibility of evaluating the novelty of a single paper by using the attribute features of the paper's MeSH terms in the MeSH term co-occurrence network of its field. Contents: The hypothesis of this research is that if a
paper has a high degree of novelty, the MeSH terms of the paper and their combinations should have network characteristics in the MeSH term co-occurrence network of their field that differ from those of ordinary papers. Therefore, this study first explores the method of constructing the MeSH term co-occurrence network and analyzes the basic properties of that network; second, it analyzes the attribute characteristics of papers of different novelty in the MeSH term co-occurrence network of their field and identifies the important network attributes related to novelty; finally, it uses machine learning algorithms to construct and evaluate a novelty evaluation model based on the network attributes related to novelty. Subjects and Methods: In this study, papers from top journals in different fields and their MeSH terms represent the state of research in those fields. CELL, the New England Journal of Medicine (NEJM), Lancet, and the Journal of the American Medical Association (JAMA) are representative top journals in biology and medicine worldwide. CELL is a top journal in the life sciences, and the other three are top comprehensive medical journals. This study takes these four journals and their MeSH term co-occurrence networks as research objects. F1000 was used as a tool to evaluate the novelty of papers. It is currently the world's largest platform on which biomedical experts recommend papers, and it is widely recognized in the field. Papers recommended by F1000 can be considered to be of high quality. They are also marked with labels such as "interesting hypothesis", "new finding", and "confirmation", which reflect the degree of novelty of the paper from different angles. It can be considered that the novelty of papers recommended by F1000 experts is relatively higher than that of papers from the same journals in the same period that were not recommended. In this
study, the F1000 system's evaluation of papers in the above four journals was used as the criterion for the novelty of papers. 1. Construction and attribute analysis of journal MeSH term co-occurrence networks: The papers of top journals in different fields for specific years were downloaded from the literature database, the MeSH terms of each journal were extracted, and the MeSH term co-occurrence networks were constructed separately. The overall attributes and local attributes of each journal's network were analyzed. 2. Comparison of network attributes among papers of different novelty: Each journal paper in the sample set was checked and marked as F1000-recommended or not, and the co-occurrence sub-network of each F1000 paper and each non-F1000 paper was extracted from the MeSH term co-occurrence network of its journal. The network attributes and content difference indexes of the MeSH term co-occurrence sub-networks of single papers with different novelty were compared to find the differences between the two types of papers in the MeSH term co-occurrence network of the field. 3. Novelty evaluation model of papers based on network attributes: Using six machine learning algorithms (Naive Bayes, Random Forest, Logistic Regression, Support Vector Machine, Neural Network, Decision Tree) and the difference indexes of the MeSH term co-occurrence networks of individual papers with different novelty, classification models of papers with different degrees of novelty were constructed, and the performance of each model was evaluated. Result: 1. Attributes of the MeSH term co-occurrence networks in the field. Overall attributes: The graph density of the networks of the four journals was similar, but each network had its own characteristics. The number of nodes, the number of edges, the connected components, and the mean distance of CELL were the smallest, while its average degree and average clustering coefficient were the largest; NEJM had the largest node number, edge number, and mean distance, the smallest average
degree and network diameter, and the highest modularity. Lancet had the largest connected component and network diameter. Local attributes: (1) The degree distributions of the MeSH term co-occurrence networks of the four journals were long-tailed and followed a power law; (2) the frequency distributions of closeness centrality were right-skewed; (3) the frequency distributions of betweenness centrality were all long-tailed. 2. Differences in network attributes of individual papers with different degrees of novelty: The network attributes with significant (P < 0.05) differences were as follows: (1) for CELL, 28 features were retained and 13 were excluded; (2) for JAMA, 10 features were retained and 31 were excluded; (3) for Lancet, 15 features were retained and 26 were excluded; (4) for NEJM, 35 features were retained and 6 were excluded. Degree, closeness centrality, harmonic closeness centrality, eigenvector centrality, and eccentricity differed between the high-novelty and medium-novelty networks of all four journals, while betweenness centrality differed in the NEJM, Lancet, and JAMA networks. 3. Novelty evaluation model based on machine learning algorithms: (1) Comparison of algorithms across journals of different disciplines: (i) among the classification models of CELL papers, the neural network model had the best accuracy, recall, F1, and AUC values; (ii) among the classification models of JAMA papers, naive Bayes and random forest could classify papers, but the accuracy and recall for papers with high novelty were low; (iii) among the classification models of Lancet papers, only naive Bayes achieved classification, and the accuracy and recall for papers with high novelty were low; (iv) in the classification of NEJM papers, random forest, logistic regression, and decision tree had high accuracy and recall. (2) Comparison of different algorithms: The support vector machine (SVM) algorithm performed best on the overall data, occupying
the first place in performance on the F1000, interesting hypothesis, new drug target, and technical advance data. In particular, the classification quality for interesting hypotheses was better, with an AUC value in JAMA as high as 0.9115. In terms of journal AUC values, the classification quality of NEJM was the best among the four journals, while the AUC value of CELL was very low. The classification quality differed between the three medical journals and the biological journal. Conclusion: 1. It is feasible to use machine learning algorithms to classify and predict the novelty of papers based on the attribute features of a single paper. (1) In this study, the machine learning methods applied in the experiment showed a certain ability to distinguish F1000 papers from non-F1000 papers in text classification. SVM showed the best performance (the highest AUC value). In particular, SVM is of great significance for discovering interesting-hypothesis papers in comprehensive medical journals. (2) The optimal algorithm varies among journals in different fields. Naive Bayes and random forest are more suitable for clinical journals, and the neural network algorithm is better suited to integrative biology journals. 2. The novelty evaluation of a paper is related to its subject field. (1) The overall attributes of the MeSH term co-occurrence networks differ among disciplines. (2) The subject type of the journal influences the text classification of F1000-recommended and non-recommended articles. (3) The domain MeSH term co-occurrence networks represented by the journals are scale-free as a whole. The degree distribution of the journals' MeSH term co-occurrence networks conforms to a power law. The frequency distributions of closeness centrality and betweenness centrality show the same trends in the four journals. |
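
The network construction step described in the methods can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the software used, so the function names (`build_cooccurrence`, `graph_density`, `paper_features`) are hypothetical, the three toy papers are made up, and the per-paper features shown (mean and maximum degree of a paper's terms) are only simple stand-ins for the larger feature set the study screened. The sketch assumes that two MeSH terms are linked whenever they are assigned to the same paper.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence(papers):
    """Build an undirected co-occurrence network from per-paper MeSH term lists.

    Returns (edge_weights, adjacency): edge_weights counts how many papers
    contain both terms of a pair; adjacency maps each term to its neighbours.
    """
    edge_weights = Counter()
    adjacency = defaultdict(set)
    for terms in papers:
        unique = sorted(set(terms))
        for term in unique:
            adjacency[term]  # register the term as a node even if it has no edges
        for a, b in combinations(unique, 2):
            edge_weights[(a, b)] += 1
            adjacency[a].add(b)
            adjacency[b].add(a)
    return edge_weights, dict(adjacency)


def graph_density(adjacency):
    """Density of an undirected simple graph: 2m / (n * (n - 1))."""
    n = len(adjacency)
    if n < 2:
        return 0.0
    m = sum(len(neighbours) for neighbours in adjacency.values()) / 2
    return 2 * m / (n * (n - 1))


def paper_features(terms, adjacency):
    """Summarise one paper by the degrees of its terms in the journal network."""
    unique = set(terms)
    degrees = [len(adjacency.get(term, ())) for term in unique]
    mean_degree = sum(degrees) / len(degrees) if degrees else 0.0
    return {
        "n_terms": len(unique),
        "mean_degree": mean_degree,
        "max_degree": max(degrees, default=0),
    }


# Toy, made-up input: each inner list holds the MeSH terms of one paper.
papers = [
    ["Humans", "Neoplasms", "Mutation"],
    ["Humans", "Neoplasms"],
    ["Humans", "Hypertension"],
]
edge_weights, adjacency = build_cooccurrence(papers)
features = paper_features(papers[2], adjacency)
```

Feature dictionaries of this kind, computed for recommended and non-recommended papers, would be the sort of input a classifier could then be trained on; which attributes to keep is the screening question the study addresses with its significance tests.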