
Research On Text Classification Based On Deep Learning

Posted on: 2021-06-02
Degree: Master
Type: Thesis
Country: China
Candidate: X F Li
Full Text: PDF
GTID: 2518306575965489
Subject: Computer technology
Abstract/Summary:
We live in an era of information technology. With the rapid development of computer technology and information storage, applications have gradually penetrated every aspect of life, and text data is growing at an exponential rate. The effective collection, sorting, mining and analysis of pharmaceutical patent data are increasingly important for the development of the pharmaceutical industry. Text classification is mainly divided into three modules: preprocessing, feature extraction and classification. Among these, text representation is the key point and the foundation. At present, traditional text classification is mostly based on statistical learning and similar methods, which ignore the associations between words and the information hidden in the text context, and are therefore poorly suited to complex, structured text data. The network structures of deep learning are effective for the current text classification problem. Based on an analysis and summary of text vectorization techniques and deep neural network models, this thesis studies in depth the application of deep learning models to text classification. The main research work is as follows:

1. A label classification model based on a convolutional neural network (CNN) is designed. Convolution and pooling extract local features effectively. In the dual-channel CNN, one set of word vectors is fine-tuned to capture more information, while the other set remains unchanged. This improves both the network structure and the performance of the original model. Convolution kernels of different sizes and numbers are designed to extract features from different angles. Max pooling is used to further extract features, and the softmax function is used for classification.

2. Building on the strength of LSTM in extracting global features, the accuracy of medical patent label classification is improved by combining it with an attention mechanism. To represent patent text at a deeper level, using the structural and hierarchical information of sentences and the relationships between labels, a Bi-LSTM network model based on the attention mechanism is designed. The LSTM model mitigates the vanishing gradient problem of the traditional RNN, and in this architecture the hidden state sequences output by the forward and backward LSTMs are concatenated into two channels, which avoids the information loss caused by direct addition. In addition, the attention mechanism yields a semantic encoding that incorporates the attention probability distribution over the input sequence nodes, which highlights key information and reduces information loss and redundancy during feature vector extraction.
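
The pooling and attention steps named in the abstract can be illustrated with a minimal plain-Python sketch. It is not the thesis implementation: the function names, the dot-product scoring against a fixed `query` vector, and the toy hidden states are all assumptions made for illustration, since the abstract does not specify the attention scoring function or any dimensions.

```python
import math


def softmax(scores):
    """Turn raw scores into a probability distribution (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def max_over_time(feature_map):
    """Max pooling: keep the strongest response of one convolution filter."""
    return max(feature_map)


def concat_bidirectional(forward_states, backward_states):
    """Concatenate forward and backward hidden states at each time step.

    For Python lists, `f + b` is concatenation, not element-wise addition,
    so both directions are preserved side by side.
    """
    return [f + b for f, b in zip(forward_states, backward_states)]


def attention_pool(hidden_states, query):
    """Weight each time step by softmax(dot(h_t, query)) and sum the states.

    Dot-product scoring with a query vector is an assumed choice.
    """
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    context = [
        sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)
    ]
    return weights, context


# Toy hidden states for a three-token sequence (hypothetical values).
forward = [[0.1, 0.2], [0.4, 0.3], [0.9, 0.8]]
backward = [[0.5, 0.1], [0.2, 0.6], [0.7, 0.9]]

states = concat_bidirectional(forward, backward)
weights, context = attention_pool(states, query=[1.0, 1.0, 1.0, 1.0])
pooled = max_over_time([0.2, 1.5, -0.3])
```

Here `weights` is the attention probability distribution over the three time steps and `context` is the resulting weighted encoding. In a trained model the query vector and the hidden states would be learned rather than fixed by hand.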
Keywords/Search Tags: deep learning, patent classification, long short-term memory, attention mechanism, convolutional neural networks