Research On Text Classification Based On Attention Bi-LSTM

Posted on: 2019-06-28 | Degree: Master | Type: Thesis
Country: China | Candidate: Q Wang
GTID: 2428330566486658 | Subject: Software engineering
Abstract/Summary:
With the continuous development of information technology, a large amount of text data has been generated on the Internet. Automatic text classification has developed rapidly as a key technology for organizing and processing this data. Although text data contains a wealth of information, it is unstructured, so computers cannot compute on it directly and therefore cannot extract valuable information from it. The core of text classification technology is the vectorized representation of text. Traditional text representation methods are based on keywords and word-frequency statistics. Their drawbacks are that they ignore the relationships between words and the semantic information hidden in the textual context, and that the extracted feature vectors are high-dimensional and highly sparse. With the continuous development of deep learning, deep neural network models have proved their advantages in extracting features from unstructured data. Building on a summary of traditional text feature extraction and classification methods, this thesis studies in depth the use of deep neural network models for text classification.

The main research work of this thesis is as follows. The thesis studies and analyzes several key steps in text classification tasks. First, it studies word embedding. Word embedding uses a neural network to map each word to a low-dimensional real-valued vector, which avoids the lack of semantic information in traditional word vectors. Next, it studies text feature extraction and classification methods. Traditional methods of text feature extraction and classification often produce feature vectors of very high dimension and ignore the importance of word order. This thesis argues that, because LSTM is well suited to processing sequential data, applying it to text feature extraction and classification can solve these problems.

On the basis of an LSTM-based feature extraction and classification model, this thesis proposes a strategy that combines the Attention mechanism with a Bi-LSTM model to solve the text classification problem and further improve classification performance. The Bi-LSTM model with Attention obtains an attention probability distribution by computing the correlation between each intermediate state and the final state. The purpose is to assign a different weight to the state at each time step and to retain the useful information. This reduces information redundancy as far as possible, and by optimizing the text feature vectors it further improves the accuracy of text classification.
Keywords/Search Tags: Text Classification, Word Embedding, Text Feature Extraction, LSTM, Attention Mechanism
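The attention step described in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the thesis's implementation. The Bi-LSTM states are taken as given, and each state is assumed to be the concatenation of the forward and backward hidden vectors at one time step. The "correlation" between an intermediate state and the final state is assumed to be a dot product. The output-layer weights are hypothetical placeholders, not trained parameters. With scores s_t = h_t · h_final, the attention weights are alpha_t = exp(s_t) / sum_k exp(s_k), and the text feature vector is c = sum_t alpha_t h_t.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product, used here as the (assumed) correlation score."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Vector) -> Vector:
    """Turn raw scores into a probability distribution."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(states: List[Vector], final: Vector) -> Tuple[Vector, Vector]:
    """Weight each Bi-LSTM state by its correlation with the final state.

    Assumption: states[t] is the concatenated [forward; backward] hidden
    vector at step t, and `final` is the vector treated as the final state.
    Returns (attention weights, attention-weighted text feature vector).
    """
    scores = [dot(h, final) for h in states]
    weights = softmax(scores)  # attention probability distribution over steps
    dim = len(final)
    feature = [sum(w * h[i] for w, h in zip(weights, states)) for i in range(dim)]
    return weights, feature


def classify(feature: Vector, weight_rows: List[Vector], bias: Vector) -> Vector:
    """Hypothetical linear + softmax output layer over the feature vector."""
    logits = [dot(row, feature) + b for row, b in zip(weight_rows, bias)]
    return softmax(logits)


if __name__ == "__main__":
    # Toy values only: three time steps with 2-dimensional states.
    states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    final = [1.0, 1.0]
    weights, feature = attention_pool(states, final)
    print("attention weights:", [round(w, 3) for w in weights])
    print("feature vector:", [round(x, 3) for x in feature])
    # Two hypothetical classes with made-up output weights.
    print("class probabilities:", classify(feature, [[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]))
```

With these toy states, the last step has the highest correlation with the final vector, so it receives the largest weight (about 0.58, versus about 0.21 for each of the other two steps). In a trained model, the states, the choice of final vector, and the output weights would all come from learned parameters. A small feed-forward scoring layer is a common alternative to the plain dot product assumed here.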