| Video captioning aims to generate a natural-language description of the main content of a given video. It has recently become a research hotspot because of its potential applications in many fields. This thesis studies the use of semantic information in video captioning models from three perspectives: visual-attribute semantic enhancement, attribute semantic expansion, and attribute semantic transitivity; a prototype system is also developed based on this research. The main contributions are as follows. (1) A video captioning method based on visual-attribute semantic enhancement is proposed. To address the incorrect description words that arise when visual features participate too little in decoding, an interactive fusion mechanism for visual and text features is introduced, and the fused features are combined with attribute semantic features to increase the share of visual information in the decoder input. Recurrent dropout is also introduced to alleviate overfitting. Experimental results show that this method guides the model to generate more accurate words, reduces overfitting, and greatly improves model performance. (2) A video captioning method based on attribute semantic expansion is proposed. Existing models use a narrow range of attribute semantics, which leads to insufficient coverage of video attributes. To widen this range, external knowledge is introduced through knowledge graphs, and words highly relevant to the existing attribute semantic words are selected. In addition, higher-quality attribute semantic features are obtained by improving the visual feature fusion method. Experimental results show that the proposed method lets the model draw on a wider range of attribute semantic information during decoding and greatly improves both the diversity of the generated descriptions and model performance. (3) A video captioning prototype system based on the attribute semantic transitivity method is constructed. This method exploits the transitivity of relevance between attribute semantic words: taking the highly relevant words of the existing attribute semantic words as a starting point, it retrieves their own highly relevant words, further expanding attribute semantic coverage. Experimental results show that this approach introduces more attribute semantic words and further improves model performance. The prototype system lets users upload videos and displays the analysis results after parsing them, which verifies the effectiveness and practicability of the method. In conclusion, the thesis explores how increasing the share of visual information in the decoder input and introducing more attribute semantic words affect video captioning models. Extensive experiments on common datasets verify that the proposed methods improve model performance and the diversity of the generated descriptions. |
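
The attribute expansion in contributions (2) and (3) can be illustrated with a minimal sketch. It is not the thesis implementation: the relatedness table `RELATED`, its scores, the function name `expand_attributes`, and the `threshold` and `hops` parameters are all hypothetical, chosen only to show the idea. One hop corresponds to selecting words highly relevant to the existing attribute words; a second hop applies the same selection to the newly added words, which is one plausible reading of exploiting transitivity.

```python
from typing import Dict, List, Tuple

# Hypothetical toy relatedness table: word -> [(related word, relevance score)].
# In the thesis this information comes from a knowledge graph; the entries and
# scores below are invented purely for illustration.
RELATED: Dict[str, List[Tuple[str, float]]] = {
    "dog": [("puppy", 0.9), ("pet", 0.8), ("bone", 0.4)],
    "puppy": [("leash", 0.7), ("dog", 0.9)],
    "pet": [("cat", 0.75), ("owner", 0.6)],
    "run": [("jog", 0.85), ("race", 0.7)],
    "jog": [("park", 0.65)],
}


def expand_attributes(
    seeds: List[str],
    related: Dict[str, List[Tuple[str, float]]],
    threshold: float = 0.7,
    hops: int = 2,
) -> List[str]:
    """Expand seed attribute words with highly relevant neighbours.

    hops=1 adds words directly related to the seeds; hops=2 also adds words
    related to those newly added words (the assumed "transitive" step).
    """
    expanded = list(seeds)   # keeps insertion order: seeds first
    seen = set(seeds)        # avoids adding the same word twice
    frontier = list(seeds)   # words whose neighbours are examined next
    for _ in range(hops):
        next_frontier = []
        for word in frontier:
            for neighbour, score in related.get(word, []):
                if score >= threshold and neighbour not in seen:
                    seen.add(neighbour)
                    expanded.append(neighbour)
                    next_frontier.append(neighbour)
        frontier = next_frontier
    return expanded


one_hop = expand_attributes(["dog", "run"], RELATED, hops=1)
two_hop = expand_attributes(["dog", "run"], RELATED, hops=2)
print(one_hop)
print(two_hop)
```

In this toy setting the second hop adds only words reachable through an already accepted neighbour, so a fixed threshold keeps the vocabulary from growing without bound. How relevance is actually scored and thresholded in the thesis is not stated in the abstract.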