
Research On Key Technologies Of Chinese Text Classification Based On Deep Learning

Posted on: 2021-04-24
Degree: Master
Type: Thesis
Country: China
Candidate: X M Wang
GTID: 2428330623968525
Subject: Engineering

Abstract/Summary:
Deep learning, one of the mainstream research directions in artificial intelligence, has achieved good results in a range of classification tasks, including Chinese text classification. The key technologies in Chinese text classification generally include data preprocessing, Chinese word segmentation, vector representation, and text classification. This study focuses on two of them: deep-learning-based Chinese word segmentation and deep-learning-based text classification. To improve the performance of Chinese text classification, the thesis improves both technologies and then combines them in a simple hybrid Chinese text classification system. The main work is as follows:

1. A new multi-criteria Chinese word segmentation model is proposed that combines the advantages of BERT and GRU. The traditional multi-criteria word segmentation model uses only a Bi-LSTM. Because the data set is larger, training time increases, so the simpler Bi-GRU is used instead to speed up training. In addition, to extract richer semantic features from the text, the pre-trained BERT model is added as a semantic feature extraction layer. A control experiment shows improvements in both training time and segmentation quality, which demonstrates the effectiveness of the two changes.

2. A new short text classification model is proposed by adding a mixed domain attention module, borrowed from computer vision, to a short text classification model. The convolutional neural network used in traditional short text classification treats all features equally when extracting them. To strengthen the model's ability to extract key features, a mixed domain attention module is added, following the practice in computer vision. A control experiment against the original model shows that the module helps extract the key features of the text.

3. A multi-channel hierarchical attention model is proposed by applying a multi-channel mechanism to the hierarchical attention model. When the hierarchical attention model is used for Chinese text classification, long Chinese texts increase the probability of word segmentation errors, which in turn causes information loss. A feature extraction channel for character-level text representation is therefore added to compensate for the loss caused by the word-level representation. A comparison experiment between the improved model and the original model shows better classification results, which indicates that the features extracted with the added character-level representation are more comprehensive.

4. A simple classification system for long and short texts is designed and implemented by combining the above improvements in Chinese word segmentation and text classification.
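
The claim in item 1 that a Bi-GRU is simpler than a Bi-LSTM can be illustrated by counting trainable parameters per recurrent layer. The sketch below is not taken from the thesis: it assumes the textbook formulations (four weight blocks for an LSTM, three for a GRU, one bias vector per block), and the function names and the sizes 768 and 256 are hypothetical values chosen only for illustration.

```python
# Rough parameter count for one recurrent layer, assuming the textbook
# LSTM/GRU formulations with a single bias vector per weight block.
# All names and sizes here are illustrative, not from the thesis.

LSTM_BLOCKS = 4  # input gate, forget gate, output gate, candidate cell state
GRU_BLOCKS = 3   # update gate, reset gate, candidate hidden state


def recurrent_params(input_size: int, hidden_size: int,
                     blocks: int, bidirectional: bool = True) -> int:
    """Return the number of weights and biases in one recurrent layer."""
    # Each block maps the concatenated [input, previous hidden] vector to a
    # hidden-sized vector, plus one bias per hidden unit.
    per_block = hidden_size * (input_size + hidden_size) + hidden_size
    directions = 2 if bidirectional else 1
    return directions * blocks * per_block


def bilstm_params(input_size: int, hidden_size: int) -> int:
    return recurrent_params(input_size, hidden_size, LSTM_BLOCKS)


def bigru_params(input_size: int, hidden_size: int) -> int:
    return recurrent_params(input_size, hidden_size, GRU_BLOCKS)


if __name__ == "__main__":
    d, h = 768, 256  # hypothetical input and hidden sizes
    lstm, gru = bilstm_params(d, h), bigru_params(d, h)
    print(f"Bi-LSTM: {lstm}  Bi-GRU: {gru}  ratio: {gru / lstm:.2f}")
```

Under these assumptions the Bi-GRU layer has three quarters of the parameters of a Bi-LSTM layer of the same size. Actual training time also depends on the implementation, the hardware, and the cost of the BERT layer, so this count is only a partial explanation of the speed-up reported above.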
Keywords/Search Tags: Deep Learning, Chinese Word Segmentation, Short Text Classification, Long Text Classification, Hybrid Classification System