| In recent years, the number of scientific papers has grown at an ever-increasing rate. This huge volume of literature brings both opportunities and challenges to researchers: they must spend a great deal of time and energy reading the literature in related fields to obtain important information such as research frontiers, research hotspots, open problems, and main technologies. How to mine and summarize this information quickly and efficiently is therefore particularly important. Automatic review can automatically summarize information from existing literature; it is a significant means of coping with massive academic resources and is of great importance to scientific research and even management decision-making. Most existing research on automatic review organizes content at the level of sentences, topics, or steps. Among these, topic-level automatic reviews cover sub-topic information more comprehensively, while step-level automatic reviews have a more reasonable document structure. The technical methods used to realize automatic review include measurement + shallow semantic analysis and text summarization. The former yields relatively macro-level inductive information, while the latter goes deep into the text and achieves content-level summarization. Although research on automatic review has a certain foundation, the following shortcomings remain: (1) in terms of content organization, no existing organizational form of automatic review meets the application requirements of the problem-driven scientific research paradigm; (2) in terms of technical methods, measurement + shallow semantic analysis usually stays at a shallow level of analysis, and its inductive results need further interpretation by experts, while the text generated by automatic summarization methods suffers from problems such as missing semantic and logical relationships and fragmented content. In response to the above problems, this paper adopts the method of combining
measurement + shallow semantic analysis and text summarization to build a problem-driven automatic review technical framework that targets the application requirements of problem-driven scientific research, and proposes methods to solve the key technical problems in this framework. Finally, a highly readable problem-driven review text is generated, organized according to "problem-solving" logic. Specifically, the following work has been carried out: (1) Design of a problem-driven automatic review technical framework. By analyzing the structure and content characteristics of review documents and drawing on the problem-driven research paradigm, the key technical issues of problem-driven automatic review are identified and the technical framework is established. (2) Topic-problem instance identification and induction method. A topic-level problem instance recognition method is proposed, which treats problem instance identification as a topic-based candidate phrase classification task. A syntactic dependency tree is built, and candidate phrases are extracted with syntactic analysis tools. A topic-problem instance recognition model enhanced with syntactic dependencies is then constructed. The model learns lexical information through the Transformer and syntactic information through BIGCN, and lets the two types of information reinforce each other through their interaction. A topic-based attention module determines whether a candidate phrase is a problem instance corresponding to a given topic. The model achieves a recognition accuracy of 84.6%, an improvement of 2.3% over the baseline model. A fine-grained problem instance induction method based on the Leiden community detection algorithm is also proposed. (3) Problem-method instance identification and induction method. Existing method instance recognition models rarely consider the subject of the method. In this paper, a feature-enhanced sequence labeling
model is proposed for method instance recognition. In addition to the three original types of BERT input features, the model adds part-of-speech features and prompt features to help it determine the subject of the method. The model achieves a recognition accuracy of 90.2%, a 6.1% improvement over the baseline model. Identical method instances are merged through lemmatization, merging of abbreviations with their full forms, tailing operations, and edit distance. (4) Topic-descriptive sentence identification and induction method. RGAT is applied to the recognition of topic-level descriptive sentences, so that the model learns the comprehensive syntactic information of a sentence and attends to the context that has syntactic dependencies on the topic. The recognition accuracy on the topic-level descriptive sentence recognition task is 95.48%. A multi-level descriptive sentence induction method is also proposed. The method uses Sentence-BERT to compute both sentence-level and keyword-level similarity, capturing the global similarity of sentences as well as the similarity of their important information. The MMR diversity re-ranking algorithm is applied to the sentence scores produced by TextRank so that the induction results balance importance and diversity. (5) Review text generation. Building on the key technologies above, a problem-driven automatic review generation template is designed, and the summarized information is organized according to the template to form the final review text. In addition, an empirical study takes "community detection" as its topic and verifies the effectiveness of the proposed method on the tasks of topic-related problem induction, problem-related method mining, topic-related descriptive information induction, and summary text generation, through comparison with existing literature reviews and through expert evaluation. The generated review receives an overall score of 4.2 in the expert evaluation, indicating that the review method proposed in this study can better
meet the needs of users. The main research innovations of this paper are as follows: (1) A technical framework for problem-driven automatic review is proposed. Compared with other forms of automatic review, this research organizes the review content according to “problem-solving” logic, which is more in line with problem-driven research in the scientific and technological innovation environment of our country. To date, no research has been found that applies a problem-driven research paradigm to the organization of automatic reviews. (2) A syntactic-dependency-enhanced topic-level problem instance recognition model is proposed. Compared with other forms of automatic review, this study organizes the review content according to problem-solving logic, which is more in line with the needs of problem-driven research in our country’s scientific and technological innovation environment. At present, few studies embody the problem-driven idea in automatic review. |
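The MMR re-ranking step mentioned in item (4) can be illustrated with a minimal sketch. It is not the implementation used in this work: it assumes the TextRank importance scores and the pairwise sentence similarities (for example, from Sentence-BERT) have already been computed, and the function name `mmr_rerank`, the trade-off parameter `lam`, and the toy numbers are hypothetical.

```python
from typing import List, Sequence


def mmr_rerank(
    scores: Sequence[float],
    sim: Sequence[Sequence[float]],
    k: int,
    lam: float = 0.7,
) -> List[int]:
    """Pick up to k sentence indices by Maximal Marginal Relevance.

    scores: importance of each sentence (assumed to come from TextRank).
    sim:    pairwise similarity matrix (assumed to come from sentence embeddings).
    lam:    hypothetical trade-off; 1.0 ranks by importance only,
            lower values penalize redundancy more strongly.
    """
    selected: List[int] = []
    candidates = list(range(len(scores)))

    while candidates and len(selected) < k:

        def mmr(i: int) -> float:
            # Redundancy is the highest similarity to any sentence already chosen.
            redundancy = max((sim[i][j] for j in selected), default=0.0)
            return lam * scores[i] - (1.0 - lam) * redundancy

        best = max(candidates, key=mmr)
        selected.append(best)
        candidates.remove(best)

    return selected


# Toy data: sentences 0 and 1 are near-duplicates, sentence 2 is distinct.
toy_scores = [0.9, 0.85, 0.5]
toy_sim = [
    [1.0, 0.95, 0.1],
    [0.95, 1.0, 0.1],
    [0.1, 0.1, 1.0],
]
balanced = mmr_rerank(toy_scores, toy_sim, k=2, lam=0.5)
importance_only = mmr_rerank(toy_scores, toy_sim, k=2, lam=1.0)
```

With `lam=0.5` the near-duplicate sentence is skipped in favor of the less important but distinct one, whereas `lam=1.0` reduces to plain ranking by score; this is the importance-versus-diversity trade-off that the abstract describes.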