
Research on Multi-label Classification for Scientific Text Resources Based on Deep Learning

Posted on: 2021-01-13
Degree: Master
Type: Thesis
Country: China
Candidate: Y Wang
GTID: 2428330623967900
Subject: Mechanical engineering
Abstract/Summary:
The science and technology service industry is an important part of the modern service industry, and science and technology resources are the cornerstone of its development. Scientific and technological resources are currently growing at an unprecedented rate in quantity, variety, and distribution. However, these resources are scattered, isolated, diverse, and complex, which leads to low integration and low utilization and makes it difficult for them to fully support science, technology, and the real economy. The "integration" and "scientific analysis and utilization" of these resources are therefore key tasks and an inevitable trend for science and technology services in China, and "classification" is the premise and foundation of both. Most scientific and technological resources exist as text, and a single text usually belongs to several categories at once, so multi-label classification has become an important and active topic in research on classifying these resources.

This thesis addresses the goals of "resource collection, industry integration, and innovation mode" and of building the resource system and resource-sharing mode of the science and technology service industry, as set out in the national key R&D program project "Distributed Resource Giant System and Resource Synergy Theory" (Project Number: 2017YFB1400301). The project targets cross-platform aggregation and integration of scattered, isolated, complex, and diverse scientific and technological resources, in support of cross-industry distributed search, analysis, matching, evaluation, and optimization of those resources. Using unstructured scientific and technological text from the Wanfang science and technology service platform and the public service platform of the Ningbo Institute of Science and Technology Information as data, the thesis studies multi-label text classification in support of aggregating and fusing scientific and technological text resources. The main contents are as follows:

(1) Existing text classification methods for scientific and technological resources are ineffective, poorly designed, and inefficient. Based on an analysis of the characteristics of scientific and technological text and of the shortcomings of these methods, this thesis proposes an overall technical scheme for deep-learning-based multi-label classification of scientific and technological text. The scheme has two parts, preprocessing of scientific and technological text and a seq2seq-based multi-label classification method, which address the text data sources and the classification requirements, respectively.

(2) Scientific and technological texts are long, noisy, and rich in specialized vocabulary; in addition, Chinese text is written without delimiters between words and contains many stop words. The preprocessing step therefore covers conversion to plain text, noise removal, word segmentation, stop-word removal, and training word vectors on scientific and technological text, which ensures data quality and provides a formal representation of the text for the subsequent classification.

(3) Besides the shortcomings noted in (1), existing multi-label text classification methods do not consider the local and global semantic information of a text at the same time, and do not fully account for correlations between labels. To address these problems, this thesis proposes a seq2seq-based multi-label classification method for scientific and technological text, consisting of an encoder and a decoder. The encoder first extracts phrase representations from the text with a convolutional neural network and then obtains a text vector with an LSTM and an attention mechanism; the decoder decodes this text vector with an LSTM and an initialized fully connected layer to produce the predicted label set.

(4) To evaluate the method, experiments were conducted on three public datasets. In a comparison with recent multi-label classification models, the proposed method outperforms previous work. Further analysis shows that the phrase representations extracted by the convolutional neural network benefit classification and that the initialized fully connected layer effectively captures correlations between labels; an analysis of label-sequence length also shows that the method handles samples with more labels better than the current best method.

(5) To verify the effectiveness of the proposed scheme, it was applied to the classification of scientific papers. Compared with the text classification scheme used in an existing science and technology resource service platform, the proposed deep-learning-based scheme performs clearly better.
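The preprocessing in (2) can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation: the thesis does not name its segmentation tool, so dictionary-based forward maximum matching is used here as a stand-in, and `VOCAB`, `STOP_WORDS`, and all function names are hypothetical. Word-vector training is omitted.

```python
import re

# Hypothetical toy resources; the thesis does not publish its dictionary or stop-word list.
VOCAB = {"深度", "学习", "深度学习", "文本", "分类", "多标签", "科技", "资源"}
STOP_WORDS = {"的", "和", "是"}
MAX_WORD_LEN = max(len(w) for w in VOCAB)


def remove_noise(text: str) -> str:
    """Strip markup, URLs, and any run of characters that is not CJK, a letter, or a digit."""
    text = re.sub(r"<[^>]+>", " ", text)                     # leftover HTML tags
    text = re.sub(r"https?://\S+", " ", text)                 # URLs
    text = re.sub("[^\u4e00-\u9fffA-Za-z0-9]+", " ", text)    # punctuation, symbols, whitespace runs
    return text


def fmm_segment(text: str) -> list[str]:
    """Forward maximum matching: at each position take the longest dictionary word,
    falling back to a single character when nothing in VOCAB matches."""
    tokens = []
    for chunk in text.split():
        i = 0
        while i < len(chunk):
            for size in range(min(MAX_WORD_LEN, len(chunk) - i), 0, -1):
                piece = chunk[i:i + size]
                if size == 1 or piece in VOCAB:
                    tokens.append(piece)
                    i += size
                    break
    return tokens


def preprocess(text: str) -> list[str]:
    """Noise removal, then segmentation, then stop-word removal."""
    return [t for t in fmm_segment(remove_noise(text)) if t not in STOP_WORDS]
```

For example, `preprocess("<p>多标签的科技文本分类</p>")` segments the text into 多标签 / 的 / 科技 / 文本 / 分类 and then drops the stop word 的. A production pipeline would more likely use a trained segmenter with a domain dictionary of technical terms.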
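Two of the encoder ideas in (3) can be sketched in plain Python under simplifying assumptions: a 1-D convolution with ReLU turns each window of `width` consecutive word vectors into a phrase feature vector, and dot-product attention pools a sequence of states into a single text vector. In the thesis the attention operates on LSTM outputs; the LSTM is omitted here, and the `kernels` and `query` values, which a real model would learn, are hypothetical inputs supplied by the caller.

```python
import math

# Hypothetical sketch: kernels and query stand in for learned parameters.


def conv1d(seq, kernels, width):
    """Phrase features: apply each kernel to every window of `width` consecutive
    word vectors (concatenated), then ReLU. Each kernel has width * dim weights."""
    features = []
    for i in range(len(seq) - width + 1):
        window = [x for vec in seq[i:i + width] for x in vec]
        features.append([max(0.0, sum(w * x for w, x in zip(k, window))) for k in kernels])
    return features


def attention_pool(states, query):
    """Dot-product attention: softmax over query-state scores, then a weighted sum of states."""
    scores = [sum(q * h for q, h in zip(query, s)) for s in states]
    top = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    context = [sum(w * s[d] for w, s in zip(weights, states)) for d in range(len(states[0]))]
    return context, weights
```

Calling `attention_pool(conv1d(seq, kernels, 2), query)` pools the phrase features directly; placing an LSTM between the two calls would give the encoder structure described in (3).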
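The abstract does not say how the fully connected layer in (3) is initialized. One technique from the multi-label literature, assumed here purely for illustration, initializes some rows of the hidden-to-output weight matrix with label sets that co-occur in the training data, so that one hidden unit starts out activating a whole group of related labels. The sketch builds such a matrix in plain Python; the function name and the Glorot-style `scale` are assumptions, not details from the thesis.

```python
import math
import random
from collections import Counter


def cooccurrence_init(label_sets, num_labels, hidden_size, seed=0):
    """Hypothetical helper: return a hidden_size x num_labels weight matrix.

    Each of the most frequent training label sets with at least two labels gets one
    row, with `scale` at its labels' positions and 0 elsewhere. Remaining rows are
    filled with uniform random values in [-scale, scale], as in ordinary initialization.
    """
    scale = math.sqrt(6.0 / (hidden_size + num_labels))  # Glorot-style bound (assumption)
    patterns = Counter(frozenset(s) for s in label_sets if len(s) >= 2)
    rng = random.Random(seed)
    weights = [
        [scale if j in pattern else 0.0 for j in range(num_labels)]
        for pattern, _ in patterns.most_common(hidden_size)
    ]
    while len(weights) < hidden_size:
        weights.append([rng.uniform(-scale, scale) for _ in range(num_labels)])
    return weights
```

With training label sets `[{0, 1}, {0, 1}, {2, 3}, {1}]`, four labels, and three hidden units, the first row encodes the frequent pair {0, 1}, the second encodes {2, 3}, and the third is random; the single-label set {1} carries no co-occurrence information and is skipped.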
Keywords/Search Tags: text classification, multi-label text classification, seq2seq, deep recurrent network, initialized fully connected layer