
A Study On The Classification Of The First Level Subjects In SCI Papers

Posted on: 2021-03-18  Degree: Master  Type: Thesis
Country: China  Candidate: T Du  Full Text: PDF
GTID: 2428330620463283  Subject: Books intelligence
Abstract/Summary:
With the "double first-class" strategy for universities and disciplines put forward by the Ministry of Education, more and more educators have taken part in this reform. Schools and scholars alike recognize the importance of discipline strength in the country's reform of university education. The strength of a school's discipline is reflected in its scientific research output, and SCI papers account for a large share of that output. However, the subject categories of foreign SCI papers do not correspond to the domestic first-level disciplines, while discipline evaluation is mainly based on first-level disciplines, which causes many problems in practice. Past discipline evaluations have shown university managers that the evaluation of research results relies largely on SCI papers, and domestic scholars have published a large number of SCI papers in recent years, so clarifying the first-level discipline attribute of these papers is a necessary first step and the basis of further research.

This paper compares the subject catalogue issued by the Ministry of Education of the People's Republic of China with the subject category descriptions in Web of Science (WoS), refers to the subject descriptions on the official InCites website, and, through continuous optimization and adjustment, constructs a mapping system between domestic and foreign subjects. In addition, following a data mining approach, it studies SCI papers at the content level: paper data from the SCI database on the Web of Science platform serve as the experimental sample, a variety of data processing software is used to clean and process the raw data, and a large training set of 275,613 records is constructed. Based on the completed subject mapping, the training set is labeled with subject attributes. Because support vector machines are well suited to high-dimensional text classification, the TF-IDF algorithm is used to calculate the weight of each feature word, classification models trained with different training proportions are evaluated, and ten-fold cross-validation is used to train the model and obtain the optimal subject classification model.

The empirical results show that domestic and foreign discipline classifications differ considerably, and that the same research content is likely to appear under different disciplines in the classification results. Moreover, the accuracy of the classification model varies across disciplines because of their nature and research scope. The ideas developed in the study of subject classification and the models built for text classification provide a useful reference for subject evaluation and for the subject attribution of scholars. At the end of the paper, the author summarizes the problems encountered in the experiment and the ideas that arose during the study, which may serve as a reference for future researchers. Given the needs of subject evaluation and the increasing intersection of disciplines, constructing a mapping between SCI subject categories and domestic first-level disciplines, and using that mapping to determine the subject attribution of scholars, is a meaningful research topic.
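As a rough illustration of two steps named in the abstract, the sketch below computes TF-IDF weights for feature words and partitions samples into ten folds for cross-validation. It is a minimal sketch under stated assumptions, not the thesis's implementation: the abstract does not say which TF-IDF variant or tooling was used, so the formula (`tf = count / length`, `idf = log(N / df)`), the function names `tfidf_weights` and `k_fold_indices`, and the toy documents are all hypothetical. The support vector machine itself is omitted; in practice a library classifier would be trained on each training fold and scored on the held-out fold.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / document length, idf = log(N / df).
    The thesis abstract does not specify which variant it used.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def k_fold_indices(n_samples, k=10):
    """Split range(n_samples) into k contiguous folds of near-equal size.

    Returns a list of (train_indices, test_indices) pairs, one per fold.
    """
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    splits, start = [], 0
    for size in fold_sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        splits.append((train, test))
        start = stop
    return splits


# Hypothetical toy abstracts, already tokenized (not data from the thesis).
docs = [
    ["protein", "gene", "expression"],
    ["gene", "sequence", "alignment"],
    ["neural", "network", "training"],
]
weights = tfidf_weights(docs)
splits = k_fold_indices(20, k=10)
```

In this toy example a word that appears in only one document ("protein") receives a higher weight than a word shared by two documents ("gene"), which is the property that makes TF-IDF useful for separating subject categories. Real data would normally be shuffled, and ideally stratified by discipline label, before being split into folds.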
Keywords/Search Tags: Subject evaluation, subject attribution, TF-IDF algorithm, support vector machine, text classification