Research on Key Techniques of Video Semantic Understanding Based on Dynamic Scene Understanding

Posted on: 2021-01-03
Degree: Master
Type: Thesis
Country: China
Candidate: C Yang
GTID: 2428330620464181
Subject: Engineering
Abstract/Summary:
In recent years, with the rapid development of Internet technology and the accompanying improvement of software and hardware, society has gradually entered the era of big data. The amount of multimedia data, mainly video, grows day by day and pervades daily life, and video has gradually become people's main source of information. On the one hand, these rich information resources bring great convenience and satisfy people's spiritual needs. On the other hand, the sheer volume of video makes it difficult to obtain accurate information and poses great challenges to content supervision. Using computers to understand video accurately, and thereby to provide an effective reference for classification, retrieval and other tasks, remains a very challenging problem. Among the various approaches to video understanding, extracting and analyzing video data at the semantic level is an effective one. This thesis studies the description of video in natural language. From the perspective of semantic understanding, it treats image semantic understanding and video semantic understanding separately. The main work is as follows:

1. A two-layer LSTM network is proposed and designed to address two problems of image captioning with the traditional Encoder-Decoder framework. One is that the effect of improving the Encoder is neglected; the other is that the correlation between images and their text descriptions is ignored. In this thesis, multiscale image pyramids are used to strengthen the Encoder's ability to extract semantic information, an inner LSTM is used to filter the features extracted by the convolutional network, and information gain is used to narrow the gap between the image and text feature distributions, which addresses the misalignment between image features and text descriptions found in other methods. Several comparative experiments show that the proposed method performs well on image captioning.

2. A recurrent graph convolutional network combined with a control gate structure is proposed and designed. It addresses two problems: the lack of semantic information in the features used by traditional models, and the misalignment of sequence features commonly found in sequence-to-sequence tasks. The model takes as input scene graphs, a semantic description model for images, and adds a self-loop structure with a control gate to the traditional graph convolutional network. The control gate is used for feature selection, and the loop structure is used to share information and weights. With the control gate and the loop structure, the model can enhance or attenuate the flow of feature information through the network, so that the correlation between the video sequence and the text sequence is continuously strengthened. Experiments verify the effectiveness of the proposed algorithm.

3. Based on the above work, this thesis designs and implements a prototype of a web-based visual description system. Through a simple interface, users can apply the visual description model implemented in this thesis on the website.
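The abstract does not state the update equations of the recurrent graph convolutional network, so the following is only an illustrative sketch of the general idea in item 2: a graph convolution applied repeatedly with shared weights, where a sigmoid gate decides per feature how much of the new message replaces the node's previous state. The function names, the row normalization, the tanh nonlinearity and the gate formula are all assumptions made for this sketch and are not taken from the thesis.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def normalize_adjacency(adj):
    """Add a self-loop to every node and row-normalize: D^-1 (A + I)."""
    n = len(adj)
    with_loops = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
                  for i in range(n)]
    return [[v / sum(row) for v in row] for row in with_loops]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_step(a_norm, h, w_msg, w_gate):
    """One gated update (assumed form): h' = g * tanh(A h W_msg) + (1 - g) * h."""
    msg = matmul(matmul(a_norm, h), w_msg)   # aggregate neighbor features
    gate_in = matmul(h, w_gate)              # gate computed from the current state
    new_h = []
    for h_row, m_row, g_row in zip(h, msg, gate_in):
        new_row = []
        for h_val, m_val, g_val in zip(h_row, m_row, g_row):
            g = sigmoid(g_val)               # g near 1 passes the message, near 0 keeps h
            new_row.append(g * math.tanh(m_val) + (1.0 - g) * h_val)
        new_h.append(new_row)
    return new_h

def run_recurrent_gcn(adj, h, w_msg, w_gate, steps=3):
    """Apply the same gated step repeatedly, sharing weights across steps."""
    a_norm = normalize_adjacency(adj)
    for _ in range(steps):
        h = gated_step(a_norm, h, w_msg, w_gate)
    return h

# Hypothetical 3-node scene graph (e.g. person - rides - horse) with 2-d node features.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
h0 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
half_open = [[0.0, 0.0], [0.0, 0.0]]         # gate input 0, so g = 0.5 everywhere
features = run_recurrent_gcn(adj, h0, identity, half_open, steps=3)
```

In this sketch the gate plays the role the abstract assigns to the control gate (feature selection), and reusing `w_msg` and `w_gate` at every step stands in for the weight sharing of the loop structure. In a trained model both matrices would be learned rather than fixed as they are here.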
Keywords/Search Tags: deep learning, image semantic understanding, video semantic understanding, recurrent graph convolutional network