| With the advent of the big data era, deep learning has been applied in many fields. In tasks such as object recognition, speech recognition, and natural language processing, its accuracy and speed surpass those of traditional methods. Applying computer vision to real examples of everyday dynamic scenes offers an opportunity to meet the needs of daily life. This paper addresses the recognition of daily-life actions in the home, combining and improving on methods from recent years to make neural network training faster and more effective. For the dataset, Charades, a database of daily actions, is used to shift the bias from Internet images toward real scenes. For data processing, image cropping and left-right flipping are used for data augmentation. For the network, a novel Two-Stream 3D Convolutional Network based on I3D is used. In this network, we transform a 2D convolutional network into a 3D convolutional network by endowing it with an additional temporal dimension, bootstrap the 3D filters from the 2D filters, extend the receptive field into the time domain, and experiment with a reasonable temporal stride. We train two 3D networks separately, one on the temporal stream and one on the spatial stream, and average their classification scores; we also compare the result with Two-Stream Convolutional Networks and 3D Convolutional Neural Networks, which currently achieve state-of-the-art results. For parameter initialization, inflated parameters from a 2D ImageNet model are used, and Kinetics pre-training is applied in different ways so that the effects can be compared. During training, batch normalization (BN) is used to speed up training and to improve classification accuracy after convergence. Finally, we train the network on input images of different resolutions to find a suitable resolution for action recognition. The experimental results show that the method used in this paper is fast and effective. |
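
The bootstrapping of 3D filters from 2D filters mentioned in the abstract can be illustrated with a minimal sketch. It assumes the inflation convention described for I3D: the 2D kernel is repeated along a new time axis and rescaled by the temporal depth, so that a clip made of one repeated frame produces the same response as the original 2D filter on that frame. The function names, the toy kernel, and the patch values are hypothetical and are not taken from this paper's implementation.

```python
# Illustrative sketch only: function names and toy values are assumptions,
# not this paper's code. Pure Python, no external libraries.

def inflate_2d_filter(kernel_2d, time_depth):
    """Repeat a 2D kernel `time_depth` times along a new leading time axis,
    scaling every weight by 1/time_depth so the summed response is preserved."""
    if time_depth < 1:
        raise ValueError("time_depth must be at least 1")
    return [
        [[w / time_depth for w in row] for row in kernel_2d]
        for _ in range(time_depth)
    ]


def response_2d(patch, kernel_2d):
    """Response of a 2D filter on one same-sized image patch (a dot product)."""
    return sum(
        p * w
        for patch_row, kernel_row in zip(patch, kernel_2d)
        for p, w in zip(patch_row, kernel_row)
    )


def response_3d(clip, kernel_3d):
    """Response of a 3D filter on one same-sized clip (frames x rows x cols)."""
    return sum(response_2d(frame, k2d) for frame, k2d in zip(clip, kernel_3d))


# Toy 3x3 kernel standing in for a pretrained 2D ImageNet filter (assumed values).
kernel_2d = [[1.0, 0.0, -1.0],
             [2.0, 0.0, -2.0],
             [1.0, 0.0, -1.0]]
patch = [[3.0, 1.0, 0.0],
         [4.0, 1.0, 2.0],
         [5.0, 9.0, 2.0]]

time_depth = 4
kernel_3d = inflate_2d_filter(kernel_2d, time_depth)
static_clip = [patch] * time_depth  # a "video" made of one repeated frame

r2d = response_2d(patch, kernel_2d)
r3d = response_3d(static_clip, kernel_3d)
```

On the static clip, `r3d` matches `r2d`, which is the property that lets the 3D network start from the behavior of the 2D ImageNet model instead of from random weights.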