Font Size: a A A

Research On Visual Question Answering Based On Multi-Channel CNN-LSTM

Posted on:2019-11-23Degree:MasterType:Thesis
Country:ChinaCandidate:H W ZhangFull Text:PDF
GTID:2428330548474406Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
Visual Question Answering is a multi-domain task of machine learning.It is considered as one of the most shining star of tomorrow in artificial intelligence.Visual question answering system is require to respond to a pair of arbitrary image and natural language question input,extract and then combine image feature and question feature,finally give the corresponding answer by reasoning.CNN is the most often used method to extract regional feature from image.However,CNN cannot supply distant dependency information.LSTM is often used to extract sequence feature of text,but LSTM cannot obtain regional dependency information.To combine the strength of these models,focusing on visual question answering task,we propose and implement a multi-channel convolutional neural network-long short memory(CNN-LSTM)model in this paper.The hybrid model can make use of convolutional neural network's ability of extract regional feature,as well as recurrent neural network's strength of processing a sequence.To prove the effectiveness of our model,we applied it to text classification task first.A neural network model,consisting of word embedding layer,multi-channel CNN-LSTM model,and fully connected layer,are used to solve Twitter sentiment analysis task of SemEval.After data retrieval,pre-processing,neural network training,and experiment with analysis,the proposed model turns out to be workable.For visual question answering task,image feature are represent by processed through a deep CNN network,and question feature are extract by multi-channel CNN-LSTM model.Features are then combined and used to predict answer.By comparing results of baseline model and model using multi-channel CNN-LSTM,we find that the new model can acquire both a higher accuracy and faster training process.This is also evidence for proving multi-channel CNN-LSTM model can extract needed feature effectively,and is ready for generalized usage even in different domain.
Keywords/Search Tags:Visual question answering, Sentiment analysis, Convolutional neural network, Long short-term memory
PDF Full Text Request
Related items