
Research And Implementation Of Chinese Long Text Classification Algorithm Based On Deep Learning

Posted on: 2021-05-28
Degree: Master
Type: Thesis
Country: China
Candidate: C Jiang
Full Text: PDF
GTID: 2428330602977690
Subject: Computer technology
Abstract/Summary:
Text classification is one of the most basic and common tasks in natural language processing, and it also serves as a pre-processing module for many other tasks. By classifying text, the valuable parts can be screened out and the rest discarded, yielding more usable data. Research on text classification has a long history. As requirements for the accuracy and speed of text classification have grown, deep learning methods have become a research hotspot.

Aiming at the problem of long text classification, this thesis improves a text classification model that combines a recurrent neural network with a convolutional neural network. First, the lattice long short-term memory network (Lattice-LSTM) is adapted to replace the traditional long short-term memory network for shallow text encoding. Because long short-term memory networks are weak at modeling long sequences, this thesis adds a vocabulary-level self-attention mechanism that adjusts the contribution of the output at each time step to the classification. A multi-size, multi-dilation-rate convolutional neural network (Multi-size and Multi-expansion-rate Kernel Convolutional Neural Network, MMK-CNN) then performs feature extraction, and the resulting feature map is passed through a fully connected layer and a softmax layer to produce the final classification result.

The main work completed by the author is:
(1) Investigating and introducing the background of text classification and deep learning;
(2) Adapting the Lattice-LSTM model for shallow encoding and incorporating a self-attention mechanism;
(3) Using a combination of dilated convolution and ordinary convolution for feature extraction;
(4) Designing and conducting experiments against other models to verify the effectiveness of the proposed model.

The model adapted in this thesis uses a lattice long short-term memory network: without explicit word segmentation, word-level segmentation information is selectively injected into the character-level long short-term memory network, and character-level and word-level vectors are deeply combined through the neural network. This method not only saves the work of word segmentation and avoids segmentation errors, but also enriches the information contained in the text vectors. Incorporating the attention mechanism alleviates, to a certain extent, the weakness of long-sequence modeling and reduces over-fitting. Using convolution kernels of different sizes and dilation rates extracts features from different perspectives, yielding high-dimensional features of the text whose channels complement one another. Finally, experiments are carried out on news classification. Experimental comparison shows that the improved algorithm outperforms the compared algorithms and improves the effect of long text classification to a certain extent.
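The pipeline described above can be sketched in a few lines of NumPy. This is a minimal illustrative sketch, not the thesis implementation: the attention scoring vector, the kernel size/dilation pairs, and all dimensions are made-up toy values, and the Lattice-LSTM encoder is replaced here by a random matrix of hidden states. It only shows the shape of the computation: attention reweights the per-time-step outputs, multiple dilated convolutions extract and max-pool features, and a fully connected layer with softmax yields class probabilities.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dilated_conv1d(H, K, dilation):
    # H: (T, d) sequence of hidden states; K: (k, d, f) kernel.
    # "Valid" 1-D convolution over time with the given dilation rate,
    # followed by ReLU; returns (T_out, f).
    k = K.shape[0]
    span = (k - 1) * dilation + 1          # receptive field of one kernel
    out = [np.einsum("kd,kdf->f", H[t:t + span:dilation], K)
           for t in range(H.shape[0] - span + 1)]
    return np.maximum(np.array(out), 0.0)

# Toy dimensions: T time steps, d hidden units, f filters per kernel, 5 classes.
T, d, f, n_classes = 20, 8, 4, 5
H = rng.normal(size=(T, d))                # stand-in for Lattice-LSTM outputs

# Self-attention over time steps: each output's contribution is reweighted.
alpha = softmax(H @ rng.normal(size=d))    # (T,) attention weights, sum to 1
H_att = H * alpha[:, None]

# Multi-size, multi-dilation-rate kernels (a dilation of 1 is an ordinary
# convolution), each max-pooled over time, then concatenated.
feats = []
for k, r in [(2, 1), (3, 1), (3, 2), (3, 3)]:
    K = rng.normal(size=(k, d, f)) * 0.1
    feats.append(dilated_conv1d(H_att, K, r).max(axis=0))
feature_vec = np.concatenate(feats)        # (4 * f,) pooled feature map

# Fully connected layer + softmax produce the class distribution.
W = rng.normal(size=(feature_vec.size, n_classes)) * 0.1
probs = softmax(feature_vec @ W)
```

Mixing dilation rates this way lets kernels of the same size cover different receptive-field widths, which is how dilated and ordinary convolutions complement each other in the feature extractor.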
Keywords/Search Tags: Lattice-LSTM, Text Classification, Self-Attention, Dilated Convolution, Convolutional Neural Networks