Research On Text Classification Based On Attention Bi-LSTM

Posted on:2019-06-28

Degree:Master

Type:Thesis

Country:China

Candidate:Q Wang

GTID:2428330566486658

Subject:Software engineering

Abstract/Summary:

With the continuous development of information technology, a large amount of text data has been generated on the Internet. Automatic text classification has also developed rapidly as a key technology for organizing and processing such data. Although text data contains a wealth of information, it is unstructured, so computers cannot compute on it directly and therefore cannot extract valuable information from it. The core of text classification is the vectorized representation of text. Traditional text representation methods are based on keyword selection and word-frequency statistics. Their disadvantage is that they ignore both the associations between words and the semantic information hidden in the context, and the feature vectors they extract are high-dimensional and highly sparse. With the continuous development of deep learning, deep neural network models have proven advantageous for feature extraction from unstructured data. After summarizing the traditional methods of text feature extraction and classification, this thesis studies in depth the problem of text classification using deep neural network models.

The main research work is as follows. This thesis studies and analyzes several key steps in text classification tasks. First, word embedding is studied. Word embedding maps each word to a low-dimensional real-valued vector through a neural network, which avoids the lack of semantic information in traditional word vectors. Next, text feature extraction and classification methods are studied. Traditional methods often produce feature vectors of large dimension and ignore word-order information. This thesis argues that applying LSTM to text feature extraction and classification can address these problems, because LSTM is well suited to processing sequential data.

Building on the LSTM-based feature extraction and classification model, this thesis proposes a strategy that combines an Attention mechanism with a Bi-LSTM model to solve the text classification problem and further improve the performance of the classification model. The Bi-LSTM model with Attention obtains an attention probability distribution by computing the correlation between each intermediate state and the final state. The purpose is to assign a different weight to the state at each time step and to retain the valid information. In this way, information redundancy is reduced as far as possible, and the accuracy of text classification is further improved by optimizing the text feature vectors.
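The attention step described in the abstract can be illustrated with a minimal sketch in plain Python. The abstract does not say how the correlation between an intermediate state and the final state is computed, so the dot-product score used here is an assumption, and the names `attention_pool`, `states`, and `final_state` are hypothetical. The hidden states are hand-written toy vectors standing in for Bi-LSTM outputs, not the output of a trained model.

```python
import math


def attention_pool(states, final_state):
    """Pool hidden states into one feature vector using attention weights.

    Assumption: the "correlation" between each state and the final state
    is a plain dot product; the thesis may use a different scoring function.
    """
    # One score per time step: similarity of that state to the final state.
    scores = [sum(h_i * f_i for h_i, f_i in zip(h, final_state)) for h in states]

    # Softmax turns the scores into a probability distribution over time steps.
    # Subtracting the maximum score keeps the exponentials numerically stable.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # The text feature vector is the weighted sum of the hidden states.
    dim = len(final_state)
    context = [sum(w * h[d] for w, h in zip(weights, states)) for d in range(dim)]
    return weights, context


# Toy hidden states for a three-word text (hypothetical values).
states = [[0.1, 0.0], [0.9, 0.2], [1.0, 0.1]]
weights, context = attention_pool(states, states[-1])
```

Here `weights` holds one attention weight per time step and sums to 1, while `context` has the same dimension as a single hidden state and would be passed to the classifier in place of the final state alone.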

Keywords/Search Tags:

Text Classification, Word Embedding, Text Feature Extraction, LSTM, Attention Mechanism

Related items

1	Research On Text Classification Method Based On Bidirectional LSTM
2	Research On The Method Of Text Feature Extraction
3	Improvement And Application Of Text Classification Based On RNN
4	Text Classification Research Based On Deep Neural Network And Attention Mechanism
5	Research On Text Classification Algorithm Based On Word Embedding Model
6	A Research On Feature Extraction Applied For Text Classification
7	Cross-Lingual Text Classification Based On Monolingual Word Embedding Mapping Without Parallel Corpus
8	Short Text Classification Algorithm Based On Temporal Convolution And Attention Mechanism
9	Using Word Embedding And Text Feature For Event Extraction
10	Research And Application Of News Text Classification Based On Deep Learning