
Research On Identification Method Of Citation Objects In Scientific Papers

Posted on:2021-08-04
Degree:Doctor
Type:Dissertation
Country:China
Candidate:N Ma
GTID:1488306521963169
Subject:Information Science
Abstract/Summary:
Traditional citation analysis mainly considers the number of citations, but citations also carry rich semantic connections, such as citation motivation and citation sentiment. To reveal the content connections among papers more effectively, this thesis proposes the concept of the citation object to describe the point of connection between a citing paper and the cited content, and explores how current information technology can be applied to identify citation objects automatically. This can help reveal the important knowledge units in citations and evaluate the academic contribution of citations from a semantic perspective.

The thesis summarizes the relevant research progress in China and internationally and proposes two kinds of citation objects: the term citation object and the fact citation object. To realize automatic identification effectively, it proposes an overall research approach of fusing important features of citation objects to optimize deep learning models. Three key questions are addressed: (1) What are the important characteristics of the two kinds of citation objects? (2) How can the important features of term citation objects be integrated to optimize a deep learning model and improve the identification performance for term citation objects? (3) How can the important features of fact citation objects be integrated to optimize a deep learning model and improve the identification performance for fact citation objects?

Focusing on these three questions, the thesis carries out three lines of research: (1) A citation object labeling framework is designed and used to build a manually labeled dataset of citation objects, which includes 50 articles and 1438 citation objects, and the characteristics of citation objects are analyzed along multiple dimensions, including distribution, content, syntax, and location. (2) A deep learning sequence labeling model that fuses linguistic and heuristic features is proposed, which effectively enhances the context representation of citation objects and realizes automatic identification of term citation objects. (3) Fact citation object identification is recast as a content classification problem. Considering how citation sentences are distributed across the logical structure of scientific papers, a multi-task classification model for fact citation objects that incorporates citation location information is constructed to realize automatic identification of fact citation objects.

The thesis constructs training datasets from academic papers in the computer science field and conducts validation experiments on the two proposed models. The experimental results show that (1) the F1 value of the term citation object identification model reaches 0.6018, which is 8.24% higher than the BERT model; and (2) the Macro-F1 value of the fact citation object identification model reaches 0.6498, which is 2.53% and 1.15% higher than the BiLSTM-Attention benchmark model and the BERT model, respectively. The thesis thus proposes and applies a strategy of using the important features of citation objects to optimize deep learning models, and builds a separate identification model for each kind of citation object to capture its characteristics. In these experiments, both proposed models outperform the existing models they were compared against.
Keywords/Search Tags:Scientific Paper, Citation Object Identification, Feature Analysis, Feature Fusion, Multi-task Learning
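
The abstract reports a Macro-F1 value for the fact citation object classifier. As a reminder of what that metric measures, here is a minimal sketch that computes per-class F1 and averages it with equal weight per class. The function name `macro_f1` and the labels `"fact"` and `"other"` are hypothetical and chosen only for illustration; the thesis's actual label set and evaluation code are not given in this record.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over all labels seen in gold or pred."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1          # correct prediction for class g
        else:
            fp[p] += 1          # predicted p, but it was wrong
            fn[g] += 1          # missed an instance of class g

    scores = []
    for label in sorted(set(gold) | set(pred)):
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels for four citation sentences (illustration only).
gold = ["fact", "fact", "other", "other"]
pred = ["fact", "other", "other", "other"]
print(round(macro_f1(gold, pred), 4))
```

Because every class contributes equally to the average, Macro-F1 penalizes a model that performs well only on the most frequent class, which makes it a common choice when classes are imbalanced.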