Research On Citation Context Recognition Based On Pre-trained Language Model

Posted on: 2022-01-28
Degree: Master
Type: Thesis
Country: China
Candidate: C R Guo
GTID: 2518306497990739
Subject: Management Science and Engineering
Abstract/Summary:
The citation context, also called citation content or citation area, is the set of sentence fragments surrounding a citation mark in the citing document that carry a specific citation intent and motivation: to evaluate, identify, or summarize the cited document. Sentences that contain a citation mark are explicit citation sentences; sentences that belong to the citation context but contain no citation mark are implicit citation sentences. Citation context is the foundation of many downstream topics, such as citation motivation recognition, cited segment recognition, academic literature search optimization, and topic recognition, so recognizing it completely and accurately matters for improving all of these tasks.

Because implicit citation sentences lack citation marks, they are difficult to identify. Traditional research on citation context has therefore relied mainly on explicit citation sentences and ignored implicit ones, which largely limits progress on citation-context-related tasks. Some text classification methods based on hand-crafted features have been applied to implicit citation sentence recognition, but manual feature engineering is costly and the resulting features have weak expressive power, so the models tend to perform poorly and scale badly. Academic literature is also diverse and heterogeneous, and the lack of effective recognition tools further limits the use of implicit citation sentences in related research.

To improve citation context identification, this thesis builds on previous research to review and extend, step by step, the description of the types, scope, and applications of citation context, giving a relatively complete account of its concept system. On the basis of a clear definition of citation context, the characteristics of the citation context recognition task are analyzed, and automatic recognition of citation context is studied as a sentence classification problem using pre-trained language models. The SVM model and datasets used in previous work serve as the baseline, and the effectiveness of the proposed approach is verified through comparative experiments.

The results show that the pre-trained language models outperform the SVM model on all metrics. Compared with the SVM model that uses only sentence-to-text features, the F1 of the pre-trained language model is higher by 11% and the weighted F1 by 7%. Compared with the SVM model that uses all hand-crafted features, the F1 is higher by 3% and the weighted F1 by 2%. Among all the pre-trained language models in the experiments, SciBERT performs best, with an F1 of 90% and a weighted F1 of 92%. Finally, a citation context recognition tool was developed from the resulting model, in order to provide other researchers with a richer citation context corpus, improve the results of related downstream tasks, and offer an efficient citation context recognition service.
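The two metrics reported above can be illustrated with a small calculation. The abstract does not define its weighted F1, so the sketch below assumes the common convention of averaging per-class F1 scores weighted by each class's share of the true labels. The function names and the toy labels are hypothetical and are not taken from the thesis.

```python
from collections import Counter


def f1_for_class(y_true, y_pred, cls):
    """F1 score treating `cls` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == cls and p == cls)
    fp = sum(1 for t, p in pairs if t != cls and p == cls)
    fn = sum(1 for t, p in pairs if t == cls and p != cls)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged by class support (assumed definition)."""
    support = Counter(y_true)
    total = len(y_true)
    return sum(
        f1_for_class(y_true, y_pred, cls) * count / total
        for cls, count in support.items()
    )


# Toy labels (invented): 1 = citation context sentence, 0 = other sentence.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]

positive_f1 = f1_for_class(y_true, y_pred, 1)
overall = weighted_f1(y_true, y_pred)
print(f"F1 (positive class): {positive_f1:.3f}")
print(f"Weighted F1: {overall:.3f}")
```

When the negative class is larger and easier to classify, as in this toy data, the weighted F1 can exceed the positive-class F1, which is the same pattern as the reported 92% versus 90%.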
Keywords/Search Tags: Citation Recognition, Implicit Citation Sentence, Pre-Trained Language Model, Sentence Pair Classification