Research On The Key Technology Of Deep Learning Based Action Recognition And Tourism Scene Classification

Posted on: 2018-05-27
Degree: Doctor
Type: Dissertation
Country: China
Candidate: T Q Qi
Full Text: PDF
GTID: 1318330566454660
Subject: Computer application technology
Abstract/Summary:
The rapid development of the internet and multimedia technology has significantly influenced every aspect of human life and produces a large amount of visual data every day. This explosive growth of visual data has boosted performance on many research topics in computer vision, including object recognition, action recognition, scene classification and understanding, and image segmentation. Each topic has its own specific applications; action recognition, for example, has a wide range of applications, including human-computer interaction, surveillance, and key-frame extraction. Each topic also involves specific high-level cues, such as human body pose, human body parts, objects, and context or scene for image-based action recognition. How to use these high-level cues effectively in model design is therefore an important research problem in computer vision.

This dissertation focuses on two research topics in computer vision: image-based action recognition and tourism scene classification. For both tasks, we propose methods that exploit high-level cues. The first work performs image-based action recognition using action knowledge prototypes; the second addresses action recognition with a hint-based neural network; and the third addresses tourism scene classification with a multi-stage transfer learning model. The first two works handle image-based action recognition by considering high-level cues, including human pose and human body parts. The third uses a high-level cue, the category hierarchical structure, to design the scene model. The second and third works use convolutional neural networks, while the first applies multi-stage convolution and pooling operations, which can be viewed as a simplified neural network. The main contributions of this dissertation are as follows:

1. We propose an image-based action recognition method using action knowledge prototypes. We use a high-level cue, human pose, to encode actions in still images. Given an image, we first extract global features through multiple stages of convolution, pooling, and nonlinear operations, and then use these global features to encode the action image within the bag-of-features (BoF) framework. Finally, given a test image, we predict its label with a trained multi-class linear SVM classifier. The results demonstrate the effectiveness of the proposed multi-stage feature extraction framework.

2. Although human action recognition from still images has wide applications in computer vision, it remains a very challenging problem. Unlike video-based methods, image-based action representation and recognition cannot access the motion cues of actions, which greatly increases the difficulty of dealing with pose variations and cluttered backgrounds. Motivated by the recent success of convolutional neural networks (CNNs) in learning discriminative features from objects in the presence of variations and backgrounds, we investigate the potential of CNNs for image-based action recognition. We propose a new action recognition method that implicitly integrates pose inference into the CNN framework: we use a CNN originally trained for object recognition as a base network and transfer it to action recognition by jointly training the base network with a pose hint task. This joint training scheme guides the network towards pose inference while suppressing unrelated knowledge inherited from the base network. To further improve performance, the training data is augmented by enriching the pose-related samples. Experimental results on three benchmark datasets demonstrate the effectiveness of our method.

3. Scene classification is an important problem in computer vision and attracts many researchers. To the best of our knowledge, however, tourism scene classification has not yet received attention in computer vision. In this work, we build a new scenic-spot-centric database called tourism scene, which consists of 25 tourism scenic areas with 750 tourism scene categories and about 440 thousand labeled images. For tourism scene classification, we propose a multi-stage transfer learning model that uses the category hierarchical structure, with convolutional neural networks (e.g., AlexNet) as the basic building block. To demonstrate the effectiveness of the proposed model, we also propose a baseline model and a single-stage transfer learning model. The results show that our proposed method reaches a new level of performance.
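The joint training with a pose hint task described in the second contribution can be illustrated with a minimal sketch of a combined objective. This is not the dissertation's implementation: the abstract does not specify the form of the hint loss or how the two tasks are weighted. The sketch assumes that both the action task and the pose hint task are classification problems scored with softmax cross-entropy, and that they are combined through a hypothetical weight `hint_weight`; all function names are illustrative.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target_index):
    """Negative log-probability of the target class."""
    probs = softmax(logits)
    return -math.log(probs[target_index])


def joint_loss(action_logits, action_label, pose_logits, pose_label,
               hint_weight=0.5):
    """Main action loss plus a weighted auxiliary pose-hint loss.

    Assumption: the hint task is a classification over discrete pose
    classes. `hint_weight` is a hypothetical trade-off parameter, not a
    value taken from the dissertation.
    """
    action_term = cross_entropy(action_logits, action_label)
    hint_term = cross_entropy(pose_logits, pose_label)
    return action_term + hint_weight * hint_term


# Example: one sample with 3 action classes and 4 assumed pose classes.
loss = joint_loss([2.0, 0.5, -1.0], 0, [0.1, 1.2, 0.3, -0.4], 1)
```

With `hint_weight=0` the objective reduces to the plain action classification loss; larger values push the shared features more strongly towards pose inference.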
Keywords/Search Tags:action recognition, tourism scene, neural network, hint, transfer learning