
Research And Method Of Text-Image Summarization Based On Multimodal Neural Network

Posted on: 2022-06-22  Degree: Master  Type: Thesis
Country: China  Candidate: L He  Full Text: PDF
GTID: 2518306341453704  Subject: Computer Science and Technology
Abstract/Summary:
Most existing research on automatic summarization focuses on either the text modality or the image modality alone. With the rapid growth of multimedia data on the Internet, multimodal summarization has gradually attracted widespread attention. Prior experiments have shown that, compared with text-only summarization, multimodal summarization can improve the quality of generated summaries by exploiting image feature information from the visual modality, and that multimodal output significantly improves users' satisfaction with the summary. In recent years, researchers have begun to study multimodal news summarization that produces multimodal output, a task known as Multimodal Summarization with Multimodal Output (MSMO). Researchers from the Chinese Academy of Sciences have released a corresponding MSMO dataset. The strongest prior results are based on the pointer-generator network: by introducing image attention and multimodal attention mechanisms, applying data extension, and adding an image loss, they achieved the previous best performance on the MSMO dataset.

Unlike previous data extension methods that rely on a single rule, this thesis proposes a data extension method based on a statistical model that accounts for text-image relevance and image importance, effectively expanding the image annotation data in the MSMO training set. Experiments indicate that image position is an important feature for the image summarization task, which demonstrates the effectiveness of the data extension method.

This thesis also proposes a novel framework for the multimodal summarization with multimodal output task, built on the text-based Sequence-to-Sequence (Seq2seq) framework. It decouples the traditional Seq2seq architecture and connects the encoder and decoder with a multimodal interaction layer that learns the relevance between image and text information. The framework is highly flexible: it can inherit the structure and parameters of existing text Seq2seq models such as pretrained language models, and it supports different image encoders and decoding methods. The experiments use state-of-the-art generative pretrained language models and pretrained vision models, and the proposed approach achieves the best results on the text summary ROUGE metrics and the image accuracy metric.
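As a rough illustration of the kind of bridging module the abstract describes, the sketch below shows a multimodal interaction layer placed between a text Seq2seq encoder and its decoder, fusing image features into the encoder states via cross-attention. This is not the thesis's actual code; the module name, dimensions, and wiring are assumptions for illustration only.

```python
# Minimal sketch (assumed, not from the thesis): a cross-attention layer that
# connects a text encoder to a text decoder while injecting image features.
import torch
import torch.nn as nn

class MultimodalInteractionLayer(nn.Module):
    def __init__(self, d_model: int, n_heads: int = 8):
        super().__init__()
        # Text token states attend to image region features (cross-modal attention).
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, text_states: torch.Tensor, image_feats: torch.Tensor) -> torch.Tensor:
        # text_states: (batch, text_len, d_model) from any pretrained text encoder
        # image_feats: (batch, n_regions, d_model) from any image encoder
        fused, _ = self.cross_attn(query=text_states, key=image_feats, value=image_feats)
        # Residual connection keeps the original text representation, so the
        # pretrained encoder and decoder weights can be reused unchanged.
        return self.norm(text_states + fused)

# Hypothetical usage: encoder -> interaction layer -> decoder
# enc_out = text_encoder(input_ids)                    # (B, T, d_model)
# fused = MultimodalInteractionLayer(d_model=768)(enc_out, image_feats)
# summary_ids = text_decoder(decoder_input_ids, encoder_hidden_states=fused)
```

The residual design is one plausible way to realize the flexibility the abstract claims: the layer can be inserted between any compatible text encoder and decoder without retraining them from scratch.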
Keywords/Search Tags: text-image summarization, multimodal embedding, dual-stream attention, deep learning, sequence-to-sequence model