
Research On Scene Structured Description Method Based On Deep Learning

Posted on: 2021-03-15
Degree: Master
Type: Thesis
Country: China
Candidate: N Fu
GTID: 2428330614465691
Subject: Electronic and communication engineering
Abstract/Summary:
Scene structured description is an important technology that has emerged from the development of artificial intelligence and big data, and deep-learning-based structured scene description has significant research and application value. The technology automatically describes the key information in a video scene in the form of tags, so that a computer can understand the video content and store meaningful information. At present, research on structured scene description remains mainly at the level of natural language processing, where human-computer information interaction is achieved by machine translation of video into language. However, because video is unstructured, its content is complex, and its scenes are uncertain, structured scene description is not easy to implement. From the perspective of computer vision, this thesis takes three entry points: the video scene, the objects to be described in the scene, and the attribute relationships between those objects. On this basis it builds a structured scene description method composed of three aspects, namely scene classification, target detection and recognition, and object spatial relationships, and uses it to express the unstructured information in video in structured form. The research work of the thesis is divided into three parts:

(1) A scene classification algorithm based on transfer learning and salient region extraction is proposed. First, a deep convolutional neural network model for scene classification is established, and parameters are transferred from a network model already trained on a large scene image dataset. Next, scene images of different kinds captured from video frames are combined into a new small-sample dataset; the training images are preprocessed, and salient regions are extracted with sliding windows. Finally, the model is trained by minimizing the loss function of a custom softmax classifier to perform scene classification. Experimental results show that the algorithm effectively mitigates the overfitting caused by insufficient training samples and achieves good classification accuracy in environments with a single scene composition factor and few interference factors; in scenes with many interference factors, it improves classification accuracy by about 7%.

(2) A traffic scene target detection and recognition algorithm based on a lightweight network is proposed. Building on scene classification, traffic scenes are taken as the main research scenario, and to meet the needs of structured scene description, three requirements are set: multi-target detection, real-time performance, and a lightweight network. First, starting from the YOLOv3 network model, the backbone network is replaced, the multi-scale fusion network is adjusted, and a new loss function is designed, yielding the lightweight YOLOv3-MobileNetV2 model. Second, the objects to be described in traffic scenes are determined, relevant images are collected and labeled, and the YOLOv3-MobileNetV2 model is trained by minimizing the loss function to detect and recognize targets in traffic scenes. Experiments show that the algorithm can identify the described objects in traffic scenes and, while meeting the lightweight-model requirement, achieves real-time multi-target detection and recognition.

(3) A structured description method for traffic scenes based on object spatial relationships is proposed. With the scene classified and the objects to be described determined, the structured description of the traffic scene is completed by studying the spatial positional relationships between the described objects. First, a fully convolutional network is trained to estimate the depth of objects in video images; the extracted depth map is converted into a three-dimensional point cloud and combined with the object regions detected and refined by the YOLOv3-MobileNetV2 algorithm to obtain the three-dimensional position of each target. Second, through conversion between the spatial coordinate system and the pixel coordinate system, the spatial positional relationships between objects are used to describe the scene structure. For these relationships, a logical description method is introduced: the spatial positional relationships between the described objects in the traffic scene are expressed in a logical language, and the structured description of the scene is realized through them. Experiments show that combining depth information with the two-dimensional target positions detected by YOLOv3-MobileNetV2 locates targets in three-dimensional space, which greatly facilitates describing the relative spatial positions of objects. At the same time, the structure of a traffic scene can be described through the spatial positional relationships between its objects.
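Part (1) extracts salient regions with sliding windows but does not say how a window is scored. The sketch below, assuming NumPy, shows only the sliding-window mechanics; the saliency score (mean absolute deviation of a window from the global image mean) and the function name `sliding_window_regions` are hypothetical stand-ins, not the thesis's method.

```python
import numpy as np

def sliding_window_regions(image, win=64, stride=32, top_k=3):
    """Return the top_k windows (x, y, w, h) ranked by a contrast score.

    ASSUMPTION: the score used here (mean absolute deviation from the global
    mean intensity) is a hypothetical saliency proxy chosen for illustration.
    """
    img = np.asarray(image, dtype=np.float64)
    if img.ndim == 3:
        img = img.mean(axis=2)  # collapse colour channels to grayscale
    h, w = img.shape
    global_mean = img.mean()
    scored = []
    # Slide a win x win window over the image with the given stride.
    for y in range(0, h - win + 1, stride):
        for x in range(0, w - win + 1, stride):
            patch = img[y:y + win, x:x + win]
            score = np.abs(patch - global_mean).mean()
            scored.append((score, (x, y, win, win)))
    # Highest-scoring windows first; these would be cropped as training inputs.
    scored.sort(key=lambda s: s[0], reverse=True)
    return [box for _, box in scored[:top_k]]
```

Any other saliency measure can be substituted in the scoring line without changing the window traversal.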
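Part (1) also trains a softmax classifier on top of a network whose parameters were transferred from a model pretrained on a large scene dataset. As a rough sketch of that idea, the code below keeps the backbone features fixed and fits only a linear softmax head by gradient descent on the standard softmax cross-entropy loss. The thesis uses a custom softmax loss whose details are not given, so the plain cross-entropy here and the helper name `train_softmax_head` are assumptions.

```python
import numpy as np

def log_softmax(logits):
    # Subtract the row-wise max before exponentiating for numerical stability.
    z = logits - logits.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))

def softmax_cross_entropy(logits, labels):
    """Mean negative log-probability of the true class (standard, not custom)."""
    lp = log_softmax(np.asarray(logits, dtype=np.float64))
    return -lp[np.arange(len(labels)), labels].mean()

def train_softmax_head(features, labels, num_classes, lr=0.5, epochs=200, seed=0):
    """Fit W, b of a linear softmax classifier on frozen (transferred) features."""
    rng = np.random.default_rng(seed)
    n, d = features.shape
    W = rng.normal(scale=0.01, size=(d, num_classes))
    b = np.zeros(num_classes)
    onehot = np.eye(num_classes)[labels]
    history = []
    for _ in range(epochs):
        logits = features @ W + b
        history.append(softmax_cross_entropy(logits, labels))
        # Gradient of the mean cross-entropy w.r.t. the logits is (p - y) / n.
        grad_logits = (np.exp(log_softmax(logits)) - onehot) / n
        W -= lr * features.T @ grad_logits
        b -= lr * grad_logits.sum(axis=0)
    return W, b, history
```

Training only the head is one common way to limit overfitting on a small dataset; fine-tuning some backbone layers as well is an alternative the abstract does not rule out.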
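Part (2) makes YOLOv3 lightweight by swapping in a MobileNetV2 backbone. A large share of MobileNetV2's savings comes from depthwise separable convolutions. The arithmetic below compares the weight count of a standard k x k convolution with a depthwise separable one (biases ignored); their ratio is 1/C_out + 1/k^2. It illustrates that general principle only and is not a parameter count of the thesis's model.

```python
def standard_conv_params(k, c_in, c_out):
    # One k x k kernel for every (input channel, output channel) pair.
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise

def reduction_ratio(k, c_in, c_out):
    # Equals 1 / c_out + 1 / k**2 algebraically.
    return depthwise_separable_params(k, c_in, c_out) / standard_conv_params(k, c_in, c_out)

if __name__ == "__main__":
    print(standard_conv_params(3, 256, 256))        # 589824
    print(depthwise_separable_params(3, 256, 256))  # 67840
```

For a 3 x 3 layer with 256 input and output channels, the separable version needs roughly 11.5% of the weights.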
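Part (3) converts a depth map into a 3D point cloud and combines it with detection boxes to locate each target. Under a standard pinhole camera model with intrinsics fx, fy, cx, cy (assumed known from calibration; the abstract does not give them), a pixel (u, v) with depth Z maps to X = (u - cx) Z / fx and Y = (v - cy) Z / fy. The sketch below applies this and takes the median point inside a box as the object's position; the median and the helper names are illustrative choices, not the thesis's stated procedure.

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project an H x W depth map to an H x W x 3 array of camera-frame points."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # u = column index, v = row index
    z = depth.astype(np.float64)
    x = (u - cx) * z / fx  # invert u = fx * X / Z + cx
    y = (v - cy) * z / fy  # invert v = fy * Y / Z + cy
    return np.stack([x, y, z], axis=-1)

def object_position(points, box):
    """Median 3D point inside a box (x_min, y_min, x_max, y_max), max exclusive.

    ASSUMPTION: the median is a hypothetical robust estimate of object position.
    """
    x0, y0, x1, y1 = box
    region = points[y0:y1, x0:x1].reshape(-1, 3)
    region = region[region[:, 2] > 0]  # drop pixels without a valid depth
    return np.median(region, axis=0)
```

Boxes from the detector would be passed in pixel coordinates, which ties the 2D detection to the 3D point cloud.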
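Part (3) finally expresses spatial relationships between objects in a logical language. The abstract does not list the predicates, so `left_of`, `in_front_of`, `near`, and the thresholds below are hypothetical. The sketch turns 3D object positions (camera frame: x to the right, z pointing away from the camera) into sorted logical facts.

```python
import math
from itertools import permutations

def spatial_facts(objects, margin=0.5, near_dist=3.0):
    """Derive logical facts from a dict name -> (x, y, z) in camera coordinates.

    ASSUMPTION: predicate names, margin, and near_dist are illustrative only.
    """
    facts = set()
    for a, b in permutations(objects, 2):
        xa, _, za = objects[a]
        xb, _, zb = objects[b]
        if xa < xb - margin:
            facts.add(f"left_of({a}, {b})")
        if za < zb - margin:  # smaller z means closer to the camera
            facts.add(f"in_front_of({a}, {b})")
        # near() is symmetric, so emit it once per unordered pair.
        if a < b and math.dist(objects[a], objects[b]) < near_dist:
            facts.add(f"near({a}, {b})")
    return sorted(facts)

if __name__ == "__main__":
    scene = {"car_1": (-2.0, 0.0, 10.0), "person_1": (1.0, 0.0, 6.0)}
    for fact in spatial_facts(scene):
        print(fact)
```

Facts of this form could be stored as tags or queried by a rule engine, which matches the structured, machine-readable output the abstract aims for.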
Keywords/Search Tags: Scene structured description, Computer vision, Scene classification, Target detection and recognition, Object spatial relationship