Design And Implementation Of Video Detection And Retrieval System

Posted on: 2022-10-20
Degree: Master
Type: Thesis
Country: China
Candidate: R T He
Full Text: PDF
GTID: 2518306605989619
Subject: Master of Engineering
Abstract/Summary:
With the rapid development of the self-media industry and 5G communication technology, making videos has become simpler, and videos are transmitted faster and more widely. As smartphones have rapidly become widespread, a growing number of online video users are also video producers, so the sources of video have become more varied and the volume of video data has become extremely large. This explosion makes it difficult for users to choose a video and to determine whether it contains content of interest. How to efficiently obtain the required information from videos therefore needs to be studied.

Through an analysis of various methods of video information detection, this thesis finds that methods based on manually designed shallow feature extraction share a common problem: shallow features lack information such as space and color, and cannot express the high-level semantic information of the detected object. The CNN (Convolutional Neural Network) model has made breakthroughs in large-scale image recognition and speech recognition. The high-level features it abstracts from images generalize relatively well, which enables high-precision recognition and classification. Using CNN models to detect video content and obtain the useful information is therefore expected to be more accurate and efficient.

After comparing several CNN models, this thesis designs and implements a video detection and retrieval system that can detect objects, faces, text, scenes and other content in a video, and can search video content based on pictures provided by the user. The system first uses the FFmpeg audio and video tool to obtain the video frames and audio of a video, and uses the PySceneDetect video segmentation tool, which implements a content-based method, to extract key frames. After several key frames of a video have been extracted, the ResNet50 model extracts their features and an index containing all the extracted key frame features is generated, that is, the key frame feature set. Of the object detection models used in video detection, SSD-Mobilenet has an average accuracy of 94%, and SSD-Inception reaches 97%. The face detection model MTCNN detects faces in the video, the text detection model CTPN detects text in the video, and the audio scene detection model SoundNet detects the scene of the video. Video retrieval works mainly as follows: the user inputs a picture, ResNet50 extracts the picture's features, and these are matched against the vector index of key frame features built with Faiss, thereby retrieving video content.

In implementing the video detection and retrieval system, this thesis gives the detailed design and implementation of multiple modules, including system status monitoring, video detection, model management, user management, and video retrieval, and carries out functional and performance tests of the system. The test results show that the system can successfully detect video content such as objects, faces, text, and scenes, and can retrieve video content based on pictures provided by users. The functions of each module of the system work normally, and the performance requirements meet expectations. The video detection in this thesis uses already-trained models; optimizing these models is the next step and the direction of future work.
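The retrieval step described above (extract a feature vector from the query picture, then match it against the indexed key frame features) can be illustrated with a minimal sketch. This is not the thesis's implementation: it assumes toy 4-dimensional vectors in place of real ResNet50 features, and it replaces the Faiss index with a plain exhaustive squared-L2 search in pure Python. The names `KeyFrameIndex`, `l2_normalize`, and `build_demo_index`, and the video identifiers, are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


class KeyFrameIndex:
    """Toy stand-in for a vector index over key frame features.

    Assumption: exhaustive search over unit-normalized vectors. A real
    system would delegate this to an index library such as Faiss.
    """

    def __init__(self):
        self._vectors = []
        self._labels = []

    def add(self, video_id, frame_no, feature):
        # Store the normalized feature with the key frame it came from.
        self._vectors.append(l2_normalize(feature))
        self._labels.append((video_id, frame_no))

    def search(self, query, k=3):
        # Return the k closest key frames as (squared distance, label).
        q = l2_normalize(query)
        scored = []
        for label, vec in zip(self._labels, self._vectors):
            dist = sum((a - b) ** 2 for a, b in zip(q, vec))
            scored.append((dist, label))
        scored.sort(key=lambda item: item[0])
        return scored[:k]


def build_demo_index():
    # Hypothetical features; real ones would come from a CNN such as ResNet50.
    index = KeyFrameIndex()
    index.add("video_a", 0, [0.9, 0.1, 0.0, 0.0])
    index.add("video_a", 120, [0.1, 0.9, 0.0, 0.0])
    index.add("video_b", 45, [0.0, 0.0, 1.0, 0.2])
    return index


if __name__ == "__main__":
    index = build_demo_index()
    for dist, (video_id, frame_no) in index.search([0.8, 0.2, 0.0, 0.0], k=2):
        print(f"{video_id} frame {frame_no}: squared L2 distance {dist:.4f}")
```

Each hit carries the video identifier and frame number of the matched key frame, which is what lets a picture query lead back to a position in a video. Exhaustive search is only reasonable for small collections; a dedicated index library is the usual choice as the key frame feature set grows.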
Keywords/Search Tags: CNN, video detection, key frame extraction, Faiss retrieval