The development of Internet technology has made video easy to distribute, and the emergence of platforms such as TikTok and Bilibili has caused the number of videos to grow dramatically. Faced with this huge volume of structurally complex video data, efficiently retrieving the content a user needs has become a central and difficult research problem, and traditional text-based video retrieval can no longer keep pace with demand. Content-based video retrieval has therefore emerged and is now widely used. This paper improves the video feature extraction stage of such methods, with the following main contributions.

First, video carries richer content than text or images: a video can be understood as a collection of frames arranged in a temporal order, which makes video retrieval correspondingly harder than image retrieval, yet most current video retrieval methods do not learn the temporal relationships between frames. To address this, this paper extracts video features with a combination of a convolutional neural network and a bidirectional LSTM. A conventional one-way LSTM can extract temporal information, but only from past frames; a bidirectional LSTM sees both past and future context and therefore yields more complete features. With this network, both the spatial features of each frame and the temporal information between frames are captured, so the video content is expressed more fully.

Second, different frames contribute unequally to expressing a video's content, but most current retrieval methods do not distinguish between them, which introduces substantial redundancy into the extracted features and in turn hurts both the efficiency and the accuracy of retrieval. This paper therefore uses an attention-based video retrieval method built on the ResNet50 network, embedding SE modules to weight the different feature channels: content that helps express the video receives more weight, while content that contributes little is down-weighted. This reduces redundancy in the feature information and further improves the expressive power of the video features.

Finally, experiments on public datasets show that the proposed method improves on previous video retrieval methods, and a video retrieval system based on it is designed and implemented, demonstrating the feasibility and practicality of this research.
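The CNN-plus-bidirectional-LSTM pipeline described above can be sketched as follows. This is a minimal illustration in PyTorch, not the paper's actual architecture: the tiny convolutional stack stands in for the real backbone, and the dimensions (`feat_dim`, `hidden_dim`, frame count, resolution) are placeholder choices.

```python
import torch
import torch.nn as nn

class CNNBiLSTM(nn.Module):
    """Per-frame spatial features via a small CNN (a stand-in for the
    paper's backbone), then a bidirectional LSTM over the frame sequence."""

    def __init__(self, feat_dim=128, hidden_dim=64):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # -> (B*T, feat_dim, 1, 1)
        )
        # bidirectional=True concatenates forward and backward hidden
        # states, so the per-step output dimension is 2 * hidden_dim
        self.bilstm = nn.LSTM(feat_dim, hidden_dim,
                              batch_first=True, bidirectional=True)

    def forward(self, clips):  # clips: (B, T, 3, H, W)
        b, t = clips.shape[:2]
        feats = self.cnn(clips.flatten(0, 1)).flatten(1)  # (B*T, feat_dim)
        out, _ = self.bilstm(feats.view(b, t, -1))        # (B, T, 2*hidden_dim)
        return out.mean(dim=1)  # temporal pooling -> one video descriptor

desc = CNNBiLSTM()(torch.randn(2, 8, 3, 64, 64))  # 2 clips of 8 frames
print(desc.shape)  # torch.Size([2, 128])
```

The bidirectional LSTM is what lets the descriptor for each frame depend on both earlier and later frames, which is the "more complete feature information in the past and future" that the one-way LSTM lacks.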
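The channel-weighting idea behind the SE (Squeeze-and-Excitation) module can be illustrated in isolation. The sketch below assumes PyTorch; the `reduction` ratio and input sizes are illustrative defaults, and in the method described above such a block would be embedded inside the ResNet50 stages rather than used standalone.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation: global-average-pool each channel ("squeeze"),
    learn per-channel weights in (0, 1) with a small bottleneck MLP
    ("excitation"), then rescale the feature map channel by channel."""

    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):  # x: (B, C, H, W)
        w = self.fc(x.mean(dim=(2, 3)))   # squeeze -> (B, C) channel weights
        return x * w[:, :, None, None]    # reweight: useful channels kept,
                                          # uninformative channels suppressed

x = torch.randn(2, 64, 7, 7)
y = SEBlock(64)(x)
print(y.shape)  # torch.Size([2, 64, 7, 7])
```

Because the sigmoid outputs lie in (0, 1), the block can only attenuate channels relative to one another, which is exactly the redundancy-reducing reweighting the abstract describes.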