Research On Several Problems Of Video Semantic Analysis

Posted on: 2019-07-06; Degree: Doctor; Type: Dissertation
Country: China; Candidate: R Liang; Full Text: PDF
GTID: 1318330569487467; Subject: Software engineering
Abstract/Summary:
The key to video semantic analysis is bridging the “semantic gap” between low-level visual features and high-level semantics. Although many theories and algorithms have been proposed in this field, the complexity of video data leaves several problems poorly solved: data redundancy, low recognition rates for semantic objects and behaviors, low computational efficiency, and low accuracy of semantic description. The main results of this dissertation on these problems are as follows.

(1) Two video dimensionality reduction methods are proposed to address video redundancy. The first is based on the CNN features of frames: it performs shot segmentation using three thresholds, extracts the key frame of each shot from a fully connected graph, and then represents the video by the features of these key frames. The second method first locates the moving object from the optical flow field, then introduces a time dimension for each pixel, constructs a bilateral space by interpolation, segments the moving object automatically by graph cut, and finally maps the labeled segmentation back to the original space. Both methods preserve the key information of the video while greatly reducing data size and computational cost.

(2) A detection algorithm based on an entropy-weighted HOG feature and an SVM is proposed for pedestrian detection at night. Entropy weights are used to improve the HOG feature, and the SVM is reduced to retain only a few support vectors, which speeds up classification. Separate SVMs are trained according to the height of the ROI so that pedestrians at different distances in the video can be detected. A head detection step is also added to raise the detection rate. Experimental results show that the algorithm improves both the accuracy and the efficiency of pedestrian detection.

(3) For semantic behavior recognition, a framework is designed based on a Two-Stream CNN model. Knowledge-enhanced motion vectors, obtained through an optical-flow CNN, are used to train the temporal CNN and obtain temporal features, which reduces computational complexity. Key frames produced by the shot segmentation and key frame extraction algorithm are used to train the spatial CNN and obtain spatial features. A distinguishable representation module is added to provide additional label information to the later hidden layers; this mitigates the slow convergence caused by vanishing gradients in the early hidden layers, which could otherwise cause the network to converge to a local minimum. The temporal and spatial features are then combined to train the recognition model. Experimental results show that the framework improves both accuracy and efficiency.

(4) For video semantic description, a video natural language description method based on multi-feature fusion and LSTM is designed. Several kinds of features are extracted, including spatial, temporal and motion features. Features are fused at both the early and the late stages of model training, by concatenation (splicing), weighting and other methods, and multiple video description models are trained. Experimental results across these models show that the method improves the accuracy of video natural language description.
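The first dimensionality reduction method in (1) can be illustrated with a minimal sketch. It is not the dissertation's implementation: the abstract does not specify the three thresholds or how the fully connected graph is scored, so this sketch assumes a single cut threshold on the cosine distance between consecutive frame features, and picks as key frame the node with the smallest total distance to all other frames in its shot. The function names, the `cut_threshold` value and the toy two-dimensional "features" are all hypothetical; in practice the rows would be CNN feature vectors.

```python
import numpy as np


def cosine_distance_matrix(features):
    """Pairwise cosine distances between the rows of `features`."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-12, None)
    return 1.0 - unit @ unit.T


def split_shots(features, cut_threshold=0.5):
    """Start a new shot wherever consecutive frames differ by more than
    `cut_threshold` (an assumed single threshold, not the three thresholds
    mentioned in the abstract)."""
    dist = cosine_distance_matrix(features)
    shots, start = [], 0
    for i in range(1, len(features)):
        if dist[i - 1, i] > cut_threshold:
            shots.append(list(range(start, i)))
            start = i
    shots.append(list(range(start, len(features))))
    return shots


def key_frame(features, shot):
    """Treat the shot as a fully connected graph weighted by cosine distance
    and return the frame with the smallest total distance to the others."""
    dist = cosine_distance_matrix(features[shot])
    return shot[int(np.argmin(dist.sum(axis=1)))]


# Toy per-frame features standing in for CNN descriptors (hypothetical data).
features = np.array([
    [1.0, 0.0], [0.9, 0.1], [1.0, 0.05],   # frames of a first scene
    [0.0, 1.0], [0.1, 0.9], [0.05, 1.0],   # frames of a second scene
])
shots = split_shots(features)
keys = [key_frame(features, shot) for shot in shots]
print(shots, keys)
```

The video is then represented by `features[keys]` only, which is where the reduction in data size comes from: one feature vector per shot instead of one per frame.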
Keywords/Search Tags: Video Semantic Analysis, Shot Boundary Detection and Segmentation, Video Segmentation, Target and Action Recognition, Deep Learning