
Research On Academic Figure Captioning Based On Deep Learning

Posted on: 2022-04-21
Degree: Master
Type: Thesis
Country: China
Candidate: M Li
Full Text: PDF
GTID: 2518306740483164
Subject: Software engineering
Abstract/Summary:
Academic figure captioning is a cutting-edge multimodal data processing task in the field of artificial intelligence. Its goal is to generate descriptions for structured or semi-structured academic figures, and it can be applied in many areas such as human-computer interaction, image understanding, virtual assistants, and support for people with disabilities. Compared with natural image captioning, academic figure captioning poses additional challenges. Existing image datasets do not include academic figures, so a large-scale figure dataset is needed to support model training. In addition, the features of academic figures differ substantially from those of natural images, so new methods are needed to establish the connection between images and text. In response, this thesis proposes the following methods and strategies:

1. A crawler system based on a microservice architecture was designed to crawl the figures and descriptions of open-access papers. The crawler's tasks were split and deployed separately, and communication between tasks was implemented with a message queue. The crawl results serve as the source data: the descriptions are cleaned by manual annotation, and abnormal parts of the figures are detected and removed with a horizontal projection algorithm.

2. A training method for academic figure captioning models based on weakly supervised learning was proposed. First, a description filtering method removes uninformative descriptions by computing their information quantity from term frequency and inverse document frequency. Second, a topic model is used to classify the figures and extract their abstract semantics. Finally, a CNN model for figure captioning is proposed, which uses the image features and the abstract semantic features to generate descriptions.

3. An academic figure captioning model was proposed. The model takes a figure and its corresponding context as input and generates context features from sentence embedding representations. Combining image attention and context attention, the model generates the description with an LSTM. To deal with the out-of-vocabulary problem, at each time step of generating a description word, the context attention is used to find an appropriate replacement word in the context text.
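The out-of-vocabulary replacement step described above can be illustrated with a minimal sketch. It is not the thesis implementation: the function name `replace_oov`, the `<unk>` placeholder, and the assumption that the context attention yields one weight per context word at each decoding step are all hypothetical choices made for illustration.

```python
from typing import List, Sequence

UNK = "<unk>"  # hypothetical placeholder the decoder emits for out-of-vocabulary words


def replace_oov(
    tokens: Sequence[str],
    attention: Sequence[Sequence[float]],
    context_words: Sequence[str],
    unk: str = UNK,
) -> List[str]:
    """Replace each unknown token with the most-attended context word.

    tokens        : words produced by the decoder, one per time step
    attention     : attention[t][i] is the (assumed) context-attention weight
                    on context_words[i] at time step t
    context_words : words of the text surrounding the figure
    """
    result = []
    for token, weights in zip(tokens, attention):
        if token == unk and context_words:
            # Pick the context word with the highest attention at this step.
            best = max(range(len(context_words)), key=lambda i: weights[i])
            result.append(context_words[best])
        else:
            # In-vocabulary words are kept unchanged.
            result.append(token)
    return result


# Toy example with made-up attention weights.
context = ["ResNet", "is", "evaluated", "on", "CIFAR"]
generated = ["accuracy", "of", "<unk>", "on", "<unk>"]
weights = [
    [0.2, 0.2, 0.2, 0.2, 0.2],
    [0.2, 0.2, 0.2, 0.2, 0.2],
    [0.7, 0.05, 0.05, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.6, 0.1],
    [0.1, 0.05, 0.05, 0.1, 0.7],
]
caption = replace_oov(generated, weights, context)
print(" ".join(caption))
```

In this sketch the replacement is a simple argmax over the attention weights. The abstract does not specify how the replacement word is chosen, so a real model might score candidates differently.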
Keywords/Search Tags:deep learning, image captioning, attention mechanism, sentence embedding, multimodal fusion