Identification Method Of The Core “Problem-Method-Conclusion” Coreference Triple In A Single Scientific Paper

Posted on: 2021-05-31
Degree: Master
Type: Thesis
Country: China
Candidate: S Y Deng
GTID: 2518306521963259
Subject: Information Science

Abstract/Summary:
Today, the sheer volume of scientific and technological literature makes research harder. Most researchers still search, read, and summarize the core content of papers manually, which is time-consuming and laborious. In addition, most existing papers lack a structured summary, so understanding and summarizing them requires extensive reading. There is therefore a need to understand papers and organize their knowledge quickly and efficiently, and automatically analyzing, mining, and organizing the core knowledge in massive literature resources has great application value.

The core research content and research logic of a scientific paper can usually be expressed in three parts, "problem", "method", and "conclusion": to solve an existing problem, the author proposes or uses certain methods, carries out research with them, and draws conclusions. A “problem, method, conclusion” triple whose elements are linked by this co-referential relationship forms a knowledge unit. Sorting out the internal semantic relations of a knowledge unit can help researchers quickly understand the research content and logic of a paper.

The purpose of this study is to automatically identify the “problem, method, conclusion” coreference triples in a single scientific paper. First, all problem, method, and conclusion sentences in specific chapters of the article are identified by text classification; these are the candidate sentences for the triples. Then, the core topic sentence of the research is found using text similarity and templates. Finally, the co-referential relationship of the knowledge unit is established between sentence pairs: starting from the core topic sentence, the other "problem", "method", and "conclusion" sentences that have a coreference relationship within the knowledge unit are found through relationship tracing. Specifically, the following work was carried out:

(1) First, this paper analyzes the features of the core “problem-method-conclusion” coreference triple, including location features and the patterns of co-referential relationships in the knowledge unit. It finds a strong correlation between this complex semantic relation and the classical sentence-pair relations in four NLP domains: coreference relation, discourse relation, semantic matching, and semantic contradiction matching. This exploration of patterns lays the foundation for the subsequent model construction.

(2) Identify problem, method, and conclusion sentences as candidate sentences for triple recognition. The Abstract, Introduction, and Summary are selected as the source chapters. After comparing existing text classification methods and considering model efficiency and the limited data, a pre-training and fine-tuning architecture is adopted, with ULMFiT as the pre-trained model for training the classifier. Finally, articles from different fields are randomly selected, and all problem, method, and conclusion sentences are identified in the specified chapters.

(3) Core sentence extraction, that is, extracting the sentences that contain the core content of the research. With the help of syntactic expression rules and normative dictionaries, sentences that state the content of the paper under analysis are found, and their relevance to the paper's key content is then judged by their similarity to the abstract, the title, and other key sentences.

(4) Identify the core “problem-method-conclusion” coreference triple. Combining text similarity with recognition of the coreference relation in knowledge units, the "method", "problem", and "conclusion" sentences are identified, with the core sentences as the starting point of relationship tracing. A multi-task learning framework is adopted for recognizing the coreference relation in knowledge units; text feature representations from the strongly correlated tasks are learned through hard sharing of hidden-layer parameters, which helps improve recognition of the complex relation in the main task.

The Title, Abstract, Introduction, and Summary of scientific and technological papers were selected as the experimental data. The experimental results of each part are summarized, and improvements are proposed. The identification method for the core “problem-method-conclusion” coreference triple achieves good accuracy, which verifies the effectiveness of the method. The paper includes 7 figures and 8 tables.
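
As a rough illustration of the core sentence extraction in step (3), the Python sketch below filters candidate sentences with a few cue patterns and then ranks them by bag-of-words cosine similarity to a reference text such as the title or abstract. It is not the thesis's implementation: the cue patterns, the stopword list, the 0.3 threshold, and all function names are assumptions made for this example, and the thesis's actual rules, dictionaries, and similarity measure are not given in the abstract.

```python
import math
import re
from collections import Counter

# Hypothetical cue patterns for sentences that state what a paper itself does.
CUE_PATTERNS = [
    re.compile(r"\b(this|the present) (paper|study|thesis|work)\b", re.I),
    re.compile(r"\bwe (propose|present|study|investigate)\b", re.I),
]

# Small illustrative stopword list (an assumption, not a normative dictionary).
STOPWORDS = {"a", "an", "the", "in", "of", "to", "this", "was", "is", "we", "for", "and"}


def tokens(text):
    """Lowercase word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def cosine(a, b):
    """Cosine similarity between the bag-of-words vectors of two texts."""
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


def core_sentences(sentences, reference, threshold=0.3):
    """Keep sentences that match a cue pattern and are similar enough to the
    reference text, ordered from most to least similar."""
    scored = [
        (cosine(s, reference), s)
        for s in sentences
        if any(p.search(s) for p in CUE_PATTERNS)
    ]
    kept = [(score, s) for score, s in scored if score >= threshold]
    return [s for score, s in sorted(kept, key=lambda pair: pair[0], reverse=True)]
```

Calling `core_sentences(candidate_sentences, title_or_abstract)` returns only the cue-matching sentences that also overlap strongly with the reference text. A real system would likely replace the word-count vectors with a learned sentence representation.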
Keywords/Search Tags:Complex Relationship Recognition, Coreference Relation Recognition of Knowledge Unit, “Problem-Method-Conclusion” Coreference Triple, Multi-task Learning, Deep Learning