Action Recognition Based On Multi-model Voting With Cross Layer Fusion

Posted on: 2019-02-07
Degree: Master
Type: Thesis
Country: China
Candidate: Y Yan
GTID: 2348330548462304
Subject: Computer application technology
Abstract/Summary:
In recent years, human action recognition in videos has been an active field in computer vision. With the rapid development of intelligent video surveillance and robot applications, research on human action recognition in video faces more and more challenges. To address the loss of action feature information in convolutional neural networks and the overfitting of models, this thesis proposes an average multi-model voting method for action recognition, built on the imagenet-caffe-ref network model and combining horizontal flipping with a cross-layer fusion model.

First, the video data is preprocessed before it enters the convolutional neural network. The idea of rank pooling is used to encode the motion information in the video frames into an approximate dynamic image. To avoid excessive compression of the video data, each approximate dynamic image is also flipped horizontally, which doubles the amount of training data.

Second, all approximate dynamic images are fed, by random sampling, into the convolutional neural network with cross-layer fusion for feature extraction and training. Because feature information is lost as feature maps pass through the network, after considering the alternatives we fuse the second-layer output features with the fifth-layer output features to preserve the integrity of the feature information during transmission. In addition, to increase the feature information available to the fully connected layers and further relieve overfitting, all features are flipped horizontally before they enter the fully connected layers.

Finally, average multi-model voting is used for action recognition. Networks with the same architecture are trained with different parameter settings, which yields different models. Following the idea of ensembling, the models are fused and the final output for each action class is computed as the average of the votes of the individual models. Each video is thus evaluated by several trained models, which makes the final recognition result more accurate and reliable.

In the experiments, the UCF101 dataset is used to verify the cross-layer fusion model and the multi-model voting system. First, the recognition performance of different fusion weight parameters is tested on the simple cross-layer fusion model, and the best-performing weights are selected as its basic parameters. The framework of the simple cross-layer fusion model is then improved by adding two operations: preprocessing flipping and feature flipping. The improved model is trained several times with different parameters, which yields several different network models whose recognition rates are compared to verify the effect. For the multi-model voting system, a non-fusion model framework is designed and combined with the cross-layer fusion models, and the recognition rate is obtained by averaging the votes of the models.

Compared with existing action recognition methods, our method yields more reliable recognition results and a more robust model. The experiments show that the proposed method achieves better accuracy: preprocessing flipping increases the amount of training data, cross-layer fusion preserves the integrity of feature transmission, and feature flipping increases the amount of useful feature information. For human action recognition, multi-model fusion gives more reliable recognition and classification, and the recognition rate is greatly improved.
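The preprocessing and voting steps described above can be illustrated with a minimal NumPy sketch. It is not the thesis implementation. The abstract does not state the rank pooling coefficients, so the sketch assumes one commonly used simplified weighting for approximate rank pooling, alpha_t = 2t - T - 1 for frame t of T. The function names, array shapes, and the use of per-class score matrices are all hypothetical choices made for illustration.

```python
import numpy as np


def approximate_dynamic_image(frames):
    """Collapse a clip of shape (T, H, W, C) into one (H, W, C) image."""
    frames = np.asarray(frames, dtype=np.float64)
    num_frames = frames.shape[0]
    t = np.arange(1, num_frames + 1)
    # Assumed coefficients alpha_t = 2t - T - 1: later frames get larger
    # weights, so the weighted sum encodes temporal order.
    alpha = (2 * t - num_frames - 1).astype(np.float64)
    # Weighted sum over the time axis.
    return np.tensordot(alpha, frames, axes=(0, 0))


def horizontal_flip(image):
    """Mirror an (H, W, C) image left to right by reversing the width axis."""
    return image[:, ::-1]


def augment_with_flip(images):
    """Return the originals plus their mirrored copies (twice the data)."""
    return list(images) + [horizontal_flip(img) for img in images]


def average_vote(model_scores):
    """Average per-class scores over models and pick the best class.

    model_scores: list of M arrays, each of shape (N, K), holding the
    class scores of one model for N videos and K action classes.
    """
    mean_scores = np.stack(model_scores, axis=0).mean(axis=0)
    return mean_scores.argmax(axis=1), mean_scores


# Usage with random stand-in data (no real video or trained model involved).
rng = np.random.default_rng(0)
clip = rng.random((8, 4, 6, 3))          # 8 frames of 4x6 RGB
adi = approximate_dynamic_image(clip)    # one (4, 6, 3) image
training_images = augment_with_flip([adi])
scores_a = rng.random((5, 101))          # stand-in for model A on 5 videos
scores_b = rng.random((5, 101))          # stand-in for model B
labels, mean_scores = average_vote([scores_a, scores_b])
```

Averaging scores before taking the argmax lets a confident model outweigh an uncertain one, which is one plausible reading of "average of multi-model voting". A majority vote over per-model labels would be a different scheme.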
Keywords/Search Tags: action recognition, approximate dynamic image, horizontal flip, cross-layer fusion, average multi-model voting