Chinese Text Classification Based On Attention Mechanism And LSTM-CNN

Posted on: 2024-07-03 | Degree: Master | Type: Thesis
Country: China | Candidate: L L Zhu | Full Text: PDF
GTID: 2568307181454144 | Subject: Engineering
Abstract/Summary:
Thanks to the rapid development of information technology, the Internet now holds a huge amount of textual data. Text is unstructured, yet it carries a large amount of information. Existing methods can classify Chinese text to some extent, but they often struggle to extract text features fully, which leads to low accuracy and unclear classification results. To address these problems, this thesis targets the extraction of sequence information from text, covering both short-distance relations between statements and long-distance relations across contexts. It first improves the traditional CNN and BiLSTM networks and proposes a local and global feature extraction method for Chinese text classification based on BiLSTM and CNN. It then uses an attention mechanism to focus on key features, mitigating overfitting and the neglect of keywords in sentences, and proposes a Chinese text classification model based on the attention mechanism and LSTM-CNN. Finally, a text classification system is implemented. The main research work in this thesis is as follows:

(1) A local and global feature extraction method for Chinese text classification based on BiLSTM and CNN (CBLSTM). The traditional CNN abstracts features too heavily and has difficulty extracting them fully, while the LSTM network cannot make full use of forward and backward information, which hinders expressing the true meaning of the text. To address both problems, the method first redesigns the CNN convolution kernel so that it can extract deep local features of the text. It then introduces an unbalanced LSTM that increases the weight of important semantic information while attending to both local features and global semantic features. Experiments on a Chinese dataset show that the method is effective.

(2) A Chinese text classification model that fuses a self-attention mechanism with CBLSTM (Att-CBLSTM). Classification models often fail to aggregate the information they acquire, which degrades classification results. To address this, the method first adds a multi-channel mechanism to the attention mechanism to strengthen the extraction of key information in the word module and the sentence module. It then fuses CBLSTM with the attention mechanism so that the intermediate state features are combined with the final state features, which resolves the problem of information redundancy and optimizes the classification results. Experiments show that the proposed fusion of self-attention and CBLSTM is effective for Chinese text classification and has clear advantages in all aspects over the CNN model and the BiLSTM model.

(3) Design and implementation of a Chinese text classification system based on the attention mechanism and CBLSTM. The system is developed in Python. It uses a deep learning model to mine text features and applies the attention mechanism and the CBLSTM method to classify Chinese text. The deep learning model is implemented with open-source toolkits such as PyTorch. The system accepts either a text file or a text string as input and outputs the classification result.
Keywords/Search Tags: Chinese text classification, neural network, attention mechanism, deep learning
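
The abstract describes the pipeline only at a high level: convolution for local features, a bidirectional LSTM for global context, and attention to weight the informative positions. The PyTorch sketch below illustrates that general arrangement. It is not the thesis model: the class name `AttnCnnBiLstm`, every hyperparameter, and the single-query attention pooling are assumptions made for illustration, and the redesigned convolution kernel, the unbalanced LSTM, and the multi-channel attention described in the abstract are not reproduced here.

```python
import torch
import torch.nn as nn


class AttnCnnBiLstm(nn.Module):
    """Hypothetical CNN + BiLSTM + attention classifier (illustrative only)."""

    def __init__(self, vocab_size, num_classes, embed_dim=64,
                 conv_channels=64, kernel_size=3, hidden_dim=64):
        super().__init__()
        # Index 0 is reserved for padding (an assumption of this sketch).
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        # 1-D convolution over the token axis captures local n-gram features.
        # With an odd kernel size, this padding keeps the sequence length.
        self.conv = nn.Conv1d(embed_dim, conv_channels, kernel_size,
                              padding=kernel_size // 2)
        # The bidirectional LSTM reads the convolved sequence both ways.
        self.lstm = nn.LSTM(conv_channels, hidden_dim, batch_first=True,
                            bidirectional=True)
        # One learned score per time step, used for attention pooling.
        self.score = nn.Linear(2 * hidden_dim, 1)
        self.out = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) integer ids; each row is assumed to
        # contain at least one non-padding token.
        mask = token_ids != 0
        x = self.embed(token_ids)                         # (B, T, E)
        x = torch.relu(self.conv(x.transpose(1, 2)))      # (B, C, T)
        h, _ = self.lstm(x.transpose(1, 2))               # (B, T, 2H)
        scores = self.score(h).squeeze(-1)                # (B, T)
        # Padded positions get zero attention weight after the softmax.
        scores = scores.masked_fill(~mask, float("-inf"))
        weights = torch.softmax(scores, dim=1)            # (B, T)
        context = (weights.unsqueeze(-1) * h).sum(dim=1)  # (B, 2H)
        return self.out(context), weights


# Usage with made-up token ids; a real system would first segment the
# Chinese text and map tokens to ids with its own vocabulary.
model = AttnCnnBiLstm(vocab_size=5000, num_classes=10)
batch = torch.tensor([[5, 42, 7, 0, 0], [9, 3, 18, 27, 4]])
logits, weights = model(batch)
print(logits.shape, weights.shape)
```

Returning the attention weights alongside the logits makes it possible to inspect which positions the model relied on for a given prediction.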