Research On Academic Figure Captioning Based On Deep Learning

Posted on:2022-04-21

Degree:Master

Type:Thesis

Country:China

Candidate:M Li

GTID:2518306740483164

Subject:Software engineering

Abstract/Summary:

Academic figure captioning is a cutting-edge multimodal data processing task in the field of artificial intelligence. Its goal is to generate descriptions for structured or semi-structured academic figures, with applications in many fields such as human-computer interaction, image understanding, virtual assistants, and support for people with disabilities. Compared with natural image captioning, academic figure captioning poses additional challenges. Existing image datasets do not cover academic figures, so a large-scale figure dataset is needed to support model training. In addition, the features of academic figures and natural images are completely different, so new methods are needed to establish the connection between images and text. In response to this situation, this thesis proposes the following methods and strategies:

1. A crawler system based on a microservice architecture is designed to crawl the figures and descriptions of open-access papers. The crawler's tasks are split and deployed separately, and communication between tasks is handled by a message queue. The crawl results serve as source data: the descriptions are cleaned by manual annotation, and abnormal regions of the figures are detected and removed with a horizontal projection algorithm.

2. A training method for academic figure captioning models based on weakly supervised learning is proposed. First, a description filtering method removes uninformative descriptions by computing their information content from term frequency and inverse document frequency. Second, a topic model is used to classify the figures and extract their abstract semantics. Finally, a CNN model for figure captioning is proposed, which uses the image features and the abstract semantic features to generate descriptions.

3. An academic figure captioning model is proposed. The model takes a figure and its corresponding context as input and generates context features from sentence embedding representations. Combining image and context attention mechanisms, the model generates the description with an LSTM. To deal with the out-of-vocabulary problem, at each time step of description generation the context attention is used to find a suitable replacement word in the context text.
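
The description filtering step in item 2 can be illustrated with a minimal sketch. The abstract only states that information content is computed from term frequency and inverse document frequency, so everything concrete here is an assumption for illustration: the whitespace tokenizer, the scoring rule (sum of term count times unsmoothed IDF), the threshold value, and the function names `information_scores` and `filter_descriptions`. It is not the thesis's actual implementation.

```python
import math
from collections import Counter


def tokenize(text):
    """Hypothetical tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def information_scores(descriptions):
    """Score each description by the sum of (term count * IDF).

    IDF is log(N / df), so a term that occurs in every description
    (for example the word "figure") contributes nothing.
    """
    docs = [tokenize(d) for d in descriptions]
    n_docs = len(docs)

    # Document frequency: number of descriptions containing each term.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    scores = []
    for tokens in docs:
        counts = Counter(tokens)
        score = sum(c * math.log(n_docs / df[t]) for t, c in counts.items())
        scores.append(score)
    return scores


def filter_descriptions(descriptions, threshold):
    """Keep descriptions whose score reaches the (assumed) threshold."""
    scores = information_scores(descriptions)
    return [d for d, s in zip(descriptions, scores) if s >= threshold]


if __name__ == "__main__":
    captions = [
        "Figure 1",
        "Figure 2 accuracy of the CNN model on the validation set",
        "Figure 3",
    ]
    for caption, score in zip(captions, information_scores(captions)):
        print(f"{score:6.2f}  {caption}")
    print(filter_descriptions(captions, threshold=2.0))
```

Under this scoring rule, captions that consist only of a label such as "Figure 1" score low because their few terms carry little weight, while longer captions with corpus-rare terms score higher. In practice the threshold would have to be tuned on the crawled dataset.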

Keywords/Search Tags:

deep learning, image captioning, attention mechanism, sentence embedding, multimodal fusion

Related items

1	Research On Image Captioning Method Based On Deep Learning
2	Research On Visual Captioning Based On Deep Learning
3	Image Captioning Based On Deep Recurrent Convolution Network And Spatio-temporal Information Fusion
4	Deep Multimodal Attention Learning For Image Captioning
5	Research On Video Captioning Based On Deep Learning
6	Research On Image Caption Algorithm Based On Fusion Of Multi-attention Mechanism
7	Research On Social Image Captioning Based On Deep Learning
8	Research On Image Captioning Algorithms Based On Deep Learning
9	Research And Implementation Of Image Captioning Technology Based On Deep Learning
10	Research On Image Captioning Algorithm Based On Deep Learning