| The peer review mechanism plays an essential role in scientific communication. Considered the evaluation method closest to the actual state of the evaluated object, it is a practical guarantee of scientific quality, sound allocation of research resources, and the selection of innovative projects. For a long time, the limitations of the traditional peer review mechanism meant that peer review comments were not openly available on a large scale, and the immaturity of natural language processing and text mining technology meant that researchers could not effectively study peer review from the perspective of text content. The adoption of open peer review by more and more journals, and the resulting access to peer review comments, provides a data source for analyzing and mining latent patterns in peer review comments. Furthermore, thanks to the rapid development of machine learning, natural language processing technology based on deep learning has significantly improved its performance in text representation, classification, and translation. It is now timely to conduct fine-grained mining of peer review comments at the level of text content, using natural language processing and text mining technology on open data. This paper regards peer review comments as a particular form of annotation made by peer reviewers/referees on academic papers. The position of a peer review comment within an academic article is expressed by the part of the paper structure (IMRaD) in which the content mentioned in the comment is located. The research is carried out from four aspects: identifying the structure of academic papers; analyzing the distribution of peer review comments across different paper structures, the research topic, and writing; exploring the multi-dimensional distribution of peer review comment types; and analyzing the position and type distributions of peer review comments under different citations. The research data include accepted
papers, rejected papers, and the corresponding peer review comments, with the aim of finding valuable patterns in unstructured peer review comments. The work can provide new ideas for innovating academic quality evaluation methods, recognizing the contributions of experts, and establishing a reputation system for peer reviewers. The research content of this paper includes the following four aspects: (1) Given the shortcomings of current academic paper structure recognition, this paper proposes a structure recognition method for academic papers based on the fusion of basic, statistical, and semantic text features. First, the effect of different features and feature fusions on structure identification is tested on a data set with a standard structure and compared with the benchmark model BERT to obtain the best model. Then, the title feature words and the best model are used to identify the structure of sections in Atmospheric Chemistry and Physics (ACP), and the model’s recognition results are evaluated using the similarity of reference distribution and the similarity of verb clue word distribution. Finally, the domain adaptability of the hierarchical attention network model is analyzed. The results show that the model that fuses the three features performs best: the basic text features represented by figure distribution and table distribution, the statistical text features represented by chi-square, and the semantic text features represented by a sentence-level hierarchical attention network. In fields that differ, the domain adaptability of the constructed model is significantly reduced. (2) The position information in the peer review comments is extracted by rules, and the review comments are then mapped to the academic paper. The structure of the academic paper in which a peer review comment falls is used to represent the position of that comment. We first analyze the distribution of peer review comments in different positions of academic papers. Then, the chi-square test
combined with TF-IDF is used to extract the feature words of peer review comments distributed in different positions of academic papers. Finally, we analyze the distribution of peer review comments on the research topic and writing. The results show that referees pay more attention to the materials, methods, and results during the review process, regardless of whether the paper is accepted. The feature words of peer review comments differ across positions, reflecting in a fine-grained way the specific content that experts attend to in different paper structures. Feature words such as "background", "motivation", "method", "data", "result", "analysis", and "inclusion" are distributed significantly more heavily in the peer review comments of rejected papers than in those of accepted papers, which suggests that rejected papers may have many deficiencies related to these feature words. The number of peer review comments on the research topic and writing is higher for rejected papers than for accepted papers, indicating that the research topic and writing of rejected papers typically receive more attention from experts. (3) The types of peer review comments are first classified and identified, and the distribution of comment types is then analyzed across different years, different positions in review reports, and different structures, research topic, and writing of academic papers. The results show that negative evaluations account for a larger share than positive evaluations over the years, which shows that reviewers usually evaluate a manuscript critically. Requirements/suggestions, covering both primary and secondary aspects, account for a high proportion of comments over the years, reflecting the contribution of reviewers, who act as the gatekeepers of science, to improving manuscript quality. The distribution of peer review comment types across different positions of peer review
reports follows a clear pattern. The number of peer review comments of each type in Materials & Methods and Results is much higher than in Introduction and Discussion. In the rejected papers, the proportion of positive evaluation in Materials & Methods, Results, research topic, and writing is smaller than in the accepted papers, while the distribution of negative evaluation shows the opposite. The proportion of negative evaluation in Discussion, research topic, and writing is higher than in the accepted papers. The distribution of requirements/suggestions (primary) in Introduction, Materials & Methods, and Results is slightly higher than in the accepted papers. The distribution of requirements/suggestions (secondary) in writing is higher than in the accepted papers, while the distribution of both in other structures and aspects is smaller than in the accepted papers. The distribution of questions in each structure and aspect is close to that of the accepted papers. These results show that the rejected papers have many deficiencies and highlight that referees pay more attention to the materials, methods, results, discussion, research topic, and writing. The rejected papers usually receive several peer review comments from a macro perspective. (4) Research on the location and type distribution of peer review comments under different citations can provide new ideas for evaluating the academic quality of papers and quantifying the contributions of peer reviewers. Data pre-processing, such as research topic clustering and citation standardization, is performed first. The paper then analyzes the correlation between the distribution of peer review comments in different positions of academic papers and citations, the correlation between the distribution of different peer review comment types in peer review reports and citations, and the correlation between the distribution of peer review comment types in different
positions of academic papers and citations. The results show that the distribution of review comments across different locations is unrelated to citations. The proportion of positive evaluations has a significant positive effect on citations, and the proportion of negative evaluations has a significant negative effect on citations. No correlation was found between the distribution of peer review comments in different positions of academic articles and citations. To sum up, working at the level of text content, this paper combines peer review comments, full academic papers, and citations and, with the help of natural language processing and text mining technology, conducts fine-grained mining of peer review comments to find valuable rules or models. This study deepens the understanding of peer review and offers a useful exploration toward the further mining and application of peer review. |
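
The chi-square step used in aspect (2) to extract feature words for each paper position can be illustrated with a minimal sketch. It is not the paper's implementation: it covers only the chi-square part (the TF-IDF weighting is omitted), and the function names `chi_square_2x2` and `rank_terms`, the whitespace tokenization, and the toy comments are all assumptions made for illustration.

```python
from collections import Counter


def chi_square_2x2(a, b, c, d):
    """Chi-square statistic for a 2x2 term/section contingency table.

    a: comments in the target section that contain the term
    b: comments in other sections that contain the term
    c: comments in the target section without the term
    d: comments in other sections without the term
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # a margin is empty, so the statistic is undefined
    return n * (a * d - b * c) ** 2 / denom


def rank_terms(comments, target):
    """Rank terms by chi-square association with the target section.

    comments: list of (section_label, text) pairs (hypothetical format).
    The statistic is symmetric, so a term that is strongly absent from
    the target section scores as high as one strongly present in it.
    """
    n_target = sum(1 for section, _ in comments if section == target)
    n_other = len(comments) - n_target
    in_target, in_other = Counter(), Counter()
    for section, text in comments:
        terms = set(text.lower().split())  # count each term once per comment
        (in_target if section == target else in_other).update(terms)
    scores = {}
    for term in set(in_target) | set(in_other):
        a, b = in_target[term], in_other[term]
        scores[term] = chi_square_2x2(a, b, n_target - a, n_other - b)
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy data, invented for illustration only.
toy_comments = [
    ("methods", "sampling procedure unclear"),
    ("methods", "sampling size small"),
    ("results", "figure unclear"),
    ("results", "figure labels small"),
]
ranked = rank_terms(toy_comments, "methods")
```

In practice, one would keep only the high-scoring terms that occur more often in the target section than elsewhere, and could then weight them by TF-IDF as the abstract describes.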