
Citation Importance Classification Towards Scholarly Full-text Articles And Its Application In Topic Identification Of Scientific Literature

Posted on: 2022-08-24
Degree: Master
Type: Thesis
Country: China
Candidate: X Sun
GTID: 2518306737475874
Subject: Statistics
Abstract/Summary:
Citations play a vital role in scientific literature: they record how later researchers draw on their predecessors and serve as a vehicle for the dissemination, inheritance, and development of scientific knowledge. Not all citations are equally important, however. Traditional citation analysis relies on quantitative indicators, which ignore the specific contribution of each cited work and are not conducive to the fair allocation of academic resources or the fair evaluation of researchers. Identifying important citations is therefore essential for evaluating the impact of cited documents. In addition, with the arrival of the big-data era, the volume of electronic academic literature has exploded. This makes it urgent to identify research frontiers and topics in large, thematically diverse collections, and it motivates topic identification that takes citation importance into account so that literature topics can be extracted more accurately.

To study the classification of important citations, this paper divides citations into two categories, important and incidental. Experiments use two data sets with different characteristics: one is expert-annotated and covers a single discipline, computational linguistics; the other is author-labeled and mixes multiple disciplines. Working from the full text of academic articles, this research applies traditional feature engineering and also extracts features from a generative model, the CIM model, which enriches the existing feature set. The generative model is combined with two discriminative models, SVM and RF, and citation importance is classified automatically through supervised learning. The experimental results show that the CIM-model-based features improve the identification of important citations, and that classification performance improves to some extent over previous studies. The RF classifier outperforms the SVM classifier, and the patterns that identify important citations vary across fields.

Labeled data is difficult to obtain and only a small volume has been annotated, whereas unlabeled data is much easier to collect. To make full use of unlabeled data and to improve learning performance and adaptability, a semi-supervised self-training method is used. The experimental results show that self-training improves on the performance of the supervised versions. The proposed strategy for identifying important citations has practical value for countering the unfair evaluation of scientific research and academic achievement that results from relying only on quantitative indicators.

Further, to study topic identification oriented to citation importance, this paper introduces citation contexts of different lengths into the cite-PLSA-LDA topic model to identify literature topics in the field of computational linguistics. Topic identification performs best when the citation context consists of the single citing sentence. Cosine similarity and symmetrized KL divergence are then used to analyze the similarity between the topics of documents. The analysis finds that important citations have higher topic similarity with their citing documents than incidental citations do; that is, the distribution of citation links leans toward important citations. This paper thus constructs a topic model that takes citations into account and analyzes topic similarity between documents on the basis of citation importance, which offers new research ideas for better identifying the topics of scientific literature.
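
The two similarity measures named in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the function names, the smoothing constant `eps`, and the toy topic distributions are all assumptions made for illustration. The abstract also does not say which form of symmetrized KL divergence is used; the sketch assumes the sum KL(p||q) + KL(q||p), and some authors use half of that value instead.

```python
import math


def cosine_similarity(p, q):
    """Cosine of the angle between two topic-weight vectors."""
    if len(p) != len(q):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_p * norm_q)


def _smooth(p, eps):
    """Add eps to every entry and renormalize, so no probability is zero."""
    total = sum(x + eps for x in p)
    return [(x + eps) / total for x in p]


def kl_divergence(p, q):
    """KL(p || q) for two strictly positive distributions of equal length."""
    return sum(a * math.log(a / b) for a, b in zip(p, q))


def symmetrized_kl(p, q, eps=1e-12):
    """Assumed form: KL(p || q) + KL(q || p), after light smoothing."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    p_s, q_s = _smooth(p, eps), _smooth(q, eps)
    return kl_divergence(p_s, q_s) + kl_divergence(q_s, p_s)


# Made-up topic distributions over three topics (not data from the thesis).
citing = [0.70, 0.20, 0.10]
important_cited = [0.60, 0.30, 0.10]
incidental_cited = [0.10, 0.20, 0.70]

print("cosine, important: ", cosine_similarity(citing, important_cited))
print("cosine, incidental:", cosine_similarity(citing, incidental_cited))
print("sym. KL, important: ", symmetrized_kl(citing, important_cited))
print("sym. KL, incidental:", symmetrized_kl(citing, incidental_cited))
```

Higher cosine similarity and lower symmetrized KL divergence both indicate that two documents have closer topic distributions. In this toy example the cited document labeled important is closer to the citing document on both measures, which mirrors the direction of the finding reported in the abstract.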
Keywords/Search Tags: Citation Importance, Generative Model, Supervised Learning, Semi-supervised Self-Training, Topic Model