Application Research On Automatic Classification Of Massive Academic Resources

Posted on: 2020-12-10
Degree: Master
Type: Thesis
Country: China
Candidate: K Wang
GTID: 2428330626451321
Subject: Engineering
Abstract:
With the rise of the Internet and the rapid development of information technology, hundreds of millions of academic resources are produced every year. While this data delivers massive amounts of information, it also makes it harder for users to find and use these resources, a problem known as information overload. At the same time, the way people obtain information has changed from a simple "people looking for information" model to a dual-engine "search + recommendation" model. The value of labeled data is becoming more prominent, and the quality requirements for data labeling are rising. Text is the main form in which academic resources are presented, and it is the most widely distributed and most information-rich carrier of information; organizing and managing this data scientifically and effectively remains a difficult problem. Discipline classification, as an important label for distinguishing academic resources with different content, greatly helps the organization, archiving, retrieval and recommendation of massive academic resources. Research on automatic text classification is therefore particularly important and has attracted wide attention in academia and industry.

Academic resource classification is the process of mapping resources to specific categories by mining, from the basic content of academic data, the features and information closely related to each category. Traditional machine learning methods depend heavily on manually selected features, generalize poorly, and transfer poorly across domains. Using deep learning to fold feature engineering into model construction, and thereby reduce the incompleteness and redundancy of hand-designed features, is an active research topic. This thesis targets massive academic data. Based on the basic characteristics of different types of academic resources, it designs corresponding classification models: a text classification model based on a bidirectional GRU network and an attention mechanism, and a hybrid classification model based on keyword features and convolutional neural networks. Both aim to improve the overall performance of academic resource classification. The main work and contributions of this thesis are as follows:

Long-text resources such as academic news and journal articles have complex sentence structure and cover a large number of topics. Their content overlaps and intersects, so the boundaries between categories are not distinct. To address the shortcomings of traditional methods in semantic analysis and in mining contextual relationships, this thesis proposes an academic resource classification model based on a bidirectional GRU network and an attention mechanism. The model combines the strengths of recurrent neural networks and convolution operations in feature extraction, and learns a more global text representation through bidirectional features and information interaction between time steps. An attention mechanism is introduced on top of the convolution layer to extract higher-level abstract semantics and to highlight the discriminative effect of key information on text classification. In addition, for the short-text task of classifying patent titles, an attention pooling method is proposed on the basis of this work to reduce the information loss of the pooling layer.

Academic resource classification often faces problems such as high-dimensional data, sparse features, and semantic loss, which traditional methods struggle to solve. To address this, we design a hybrid classification model based on keyword features and convolutional neural networks. Working with academic news and books, the thesis introduces category information to improve feature selection and feature weighting, which in turn improves keyword extraction. At the same time, we optimize the local structure of the convolutional neural network to improve the model's ability to represent features. By combining manually extracted features with features extracted automatically by the model, the hybrid model achieves good results in both the classification and keyword extraction tasks.
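
The abstract does not give the formula for the attention pooling method, so the following is only a minimal sketch of the general idea, not the thesis's implementation. It assumes dot-product scoring against a single query vector (a hypothetical learned parameter, here fixed by hand) followed by a softmax-weighted sum over positions. The function names `attention_pool` and `max_pool` are illustrative. Max pooling is included for comparison because it keeps only one value per feature dimension, which is the kind of information loss the abstract refers to.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(features, query):
    """Pool T feature vectors into one vector using attention weights.

    features: list of T vectors, each of dimension d (e.g. convolution outputs).
    query:    vector of dimension d. ASSUMPTION: stands in for a learned
              parameter; the thesis's actual scoring function is not given.
    """
    # One relevance score per position: dot product with the query.
    scores = [sum(f_j * q_j for f_j, q_j in zip(f, query)) for f in features]
    weights = softmax(scores)
    dim = len(features[0])
    # Weighted sum over positions keeps a contribution from every position.
    pooled = [sum(w * f[j] for w, f in zip(weights, features)) for j in range(dim)]
    return pooled, weights


def max_pool(features):
    """Standard max pooling over positions, for comparison."""
    return [max(column) for column in zip(*features)]


if __name__ == "__main__":
    # Three positions with two-dimensional features (toy values).
    features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    query = [2.0, 0.0]  # hypothetical query favouring the first dimension
    pooled, weights = attention_pool(features, query)
    print("attention weights:", weights)
    print("attention pooled: ", pooled)
    print("max pooled:       ", max_pool(features))
```

In this sketch every position contributes to the pooled vector in proportion to its weight, whereas max pooling discards all but the largest activation in each dimension. In a trained model the query would be learned jointly with the rest of the network.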
Keywords: text classification, deep learning, recurrent neural network, convolutional neural network, attention mechanism