
Research on a Deep Model for Video Caption Generation Based on a Video Temporal Attention Hierarchical Fusion Mechanism

Posted on: 2022-12-04
Degree: Master
Type: Thesis
Country: China
Candidate: L Y Zhou
Full Text: PDF
GTID: 2518306743973999
Subject: Cyberspace security
Abstract/Summary:
Video caption generation is a video content understanding task: given a video clip, a computer automatically generates a natural language caption describing the dynamic information in the video scene or summarizing the video's content. With the development of modern information technology and the advent of the 5G era, video data is growing explosively, and the Internet may now be filled with large amounts of illegal and non-compliant video content. At the same time, there are more than ten million hearing-impaired people in China who communicate through sign language, and it is extremely difficult for hearing people to understand the sign language movements of deaf people. It is therefore especially important to combine computers with artificial intelligence technology to automatically understand the content of videos and the expressions of signers. This technology can not only save the labor and time cost of video review and filtering at major video sites, but also facilitate communication between deaf and hearing people. In this paper, we combine techniques from computer vision and natural language processing to develop a deep model for video caption generation. The specific research work and innovations are as follows.

1) A novel deep model for video caption generation is proposed. The spatial information of the video is extracted by a spatial embedding module; a bidirectional gated recurrent module and a deep residual stacked gated recurrent layer further encode the spatio-temporal features of the video; and a decoder effectively generates caption text from these spatio-temporal features. For the sign language video content understanding task, the model is combined with a Transformer language model to perform continuous sign language translation, converting recognized gloss sequences into natural language descriptions.

2) Further, in order to better measure the relationship between video visual features and caption text semantics and to close the modal gap between them, a video temporal attention hierarchical fusion mechanism is proposed. Based on this mechanism, a deep video caption generation model that fuses low-level spatially embedded features carrying content-associated semantics with high-level abstract features can generate the final caption text more accurately (see the sketch after this abstract).

3) The above deep model for video caption generation is evaluated in a series of comparative experiments on Tslrt, the first large continuous Chinese sign language translation and recognition video dataset with complex everyday contexts, and on Audit V, a self-built video audit dataset. To further validate the effectiveness of the proposed model, experiments are conducted on another large continuous Chinese sign language recognition dataset, Chinese-CSL, and compared with other published methods; the results show that the proposed method achieves the best results.
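The abstract does not give implementation details, so the following is a minimal PyTorch sketch of what an encoder of this kind (spatial embedding, bidirectional gated recurrent module, deep residual stacked gated recurrent layer, and temporal attention fusing low-level and high-level features) might look like. All module names, dimensions (feat_dim=2048, hidden=512), and the attention scoring functions are illustrative assumptions, not the thesis's actual design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualStackedGRU(nn.Module):
    """Stack of GRU layers with a residual connection around each layer
    (an assumed reading of 'deep residual stacked gated recurrent layer')."""
    def __init__(self, hidden, num_layers=2):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.GRU(hidden, hidden, batch_first=True) for _ in range(num_layers)
        )

    def forward(self, x):
        for gru in self.layers:
            out, _ = gru(x)
            x = x + out  # residual connection across each stacked layer
        return x

class HierarchicalFusionEncoder(nn.Module):
    """Encoder sketch: spatial embedding -> BiGRU -> residual stacked GRU,
    with temporal attention fusing low- and high-level features."""
    def __init__(self, feat_dim=2048, hidden=512):
        super().__init__()
        self.spatial_embed = nn.Linear(feat_dim, hidden)  # spatial embedding module
        self.bigru = nn.GRU(hidden, hidden // 2, batch_first=True,
                            bidirectional=True)           # bidirectional gated recurrent module
        self.stacked = ResidualStackedGRU(hidden, num_layers=2)
        # separate temporal attention heads for each feature level (assumption)
        self.attn_low = nn.Linear(hidden, 1)
        self.attn_high = nn.Linear(hidden, 1)
        self.fuse = nn.Linear(2 * hidden, hidden)

    def forward(self, frame_feats):                       # (B, T, feat_dim) CNN frame features
        low = torch.tanh(self.spatial_embed(frame_feats)) # low-level, content-associated features
        mid, _ = self.bigru(low)
        high = self.stacked(mid)                          # high-level abstract features

        def attend(seq, scorer):
            w = F.softmax(scorer(seq), dim=1)             # temporal attention weights (B, T, 1)
            return (w * seq).sum(dim=1)                   # weighted sum over time (B, hidden)

        ctx_low = attend(low, self.attn_low)
        ctx_high = attend(high, self.attn_high)
        fused = torch.tanh(self.fuse(torch.cat([ctx_low, ctx_high], dim=-1)))
        return fused, high  # fused context plus per-step features for the decoder
```

Under these assumptions, a decoder (for example a GRU or Transformer decoder, as the abstract mentions for the gloss-to-text translation step) would condition on `fused` and attend over `high` to emit caption tokens step by step.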
Keywords/Search Tags:Video content understanding, Deep learning, Attention mechanism, Caption generation