Research On Action Perception Method For Service Robot Based On Local Spatiotemporal Features

Posted on: 2020-05-05
Degree: Doctor
Type: Dissertation
Country: China
Candidate: G L Zhang
Full Text: PDF
GTID: 1368330623456317
Subject: Control Science and Engineering
Abstract/Summary:
Understanding and perceiving the actions of the people it serves is a core problem of human-robot interaction that a home service robot urgently needs to solve, and it is also an important embodiment of the robot's autonomy and intelligence. With progress in research on three-dimensional spatiotemporal modeling, more and more researchers have turned their attention to human action recognition based on local spatiotemporal features. Human visual perception is a process in which visual stimulus information is transmitted in two directions through the cooperation of multiple levels. A robot cannot achieve reliable human-like recognition from a single image attribute alone; the description mechanisms and fusion strategies of the various features must be further optimized. Human action perception for robots is therefore a complex computer cognition technology built on three-dimensional spatiotemporal modeling and multi-feature fusion, and accurate action perception is a prerequisite for autonomous robotic mission planning. This dissertation addresses the action perception problem of home service robots and carries out research centered on local spatiotemporal features. First, using a space-time interest point (STIP) sampling strategy, an action recognition method is proposed in which the support vector machine (SVM) is tuned by adaptive mutation particle swarm optimization. Second, using a multi-scale dense image sampling strategy, the shortcomings of trajectory features in filtering out background noise, describing human body structure, and encoding interactive semantic information are discussed in depth, and corresponding improvements are proposed. Finally, the overall framework of a multi-robot service system is designed based on human action perception, and its effectiveness and feasibility are verified by real experiments. The main work of this dissertation is as follows:

(1) To improve the ability of the algorithm to recognize human actions in video sequences, an action recognition framework based on local spatiotemporal features is established. First, STIPs are detected by the Harris3D detector, described by feature descriptors composed of histograms of oriented gradients (HOG) and histograms of optical flow (HOF), and then encoded by Fisher vectors (FV). Because the SVM model for action classification generalizes poorly under the traditional parameter setting strategy, particle swarm optimization is applied to optimize the parameters of each action classifier. Since population diversity changes from generation to generation, a particle aggregation degree model is constructed and used to dynamically adjust the mutation probability of each generation of particles. Finally, the proposed method is verified on the KTH and HMDB51 datasets. The results show that the adaptive mutation particle swarm optimization (AMPSO) algorithm converges well and that the overall recognition framework has high practicability and accuracy.

(2) To solve the problem that background trajectories cannot be effectively filtered out of dense trajectory features, a foreground trajectory extraction method based on multi-scale hybrid masks is proposed. First, the motion boundary image of each frame is calculated from the optical flow to derive initial masks. According to the characteristics of action videos, image priors and a synchronous updating mechanism based on cellular automata are exploited to generate an optimized weak saliency map, which is integrated with the strong saliency map obtained via the multiple kernel boosting (MKB) algorithm. Then, multi-scale hybrid masks are obtained through a collaborative optimization strategy and mask intersection. Finally, the foreground trajectories are optimized by an effective compensation scheme. Experimental results on the benchmark datasets demonstrate that the proposed method can extract foreground trajectories closely related to the moving subject and significantly improve the discriminative performance of the original trajectory features.

(3) To address the insufficient description of human body structure by dense trajectory features, and to exploit the different sensitivities of handcrafted features to specific actions, an extensible and general weighted score-level feature fusion method is proposed that applies Dempster-Shafer (DS) evidence theory within the bag-of-visual-words (BoVW) pipeline. First, partially distinctive samples in the training set are selected to construct a validation set. Then, local spatiotemporal features and pose features are extracted from these samples to obtain evidence information. DS evidence theory and the proposed survival-of-the-fittest rule are employed to combine the evidence and calculate the optimal weight vector of every feature type for each action class. Finally, the action labels are inferred via a weighted summation strategy. The experimental results demonstrate that the proposed feature fusion method adequately exploits the complementarity between multiple features and improves the action recognition accuracy of the algorithm.

(4) To solve the problem that local spatiotemporal features in the BoVW framework cannot effectively encode interactive semantic information, a spatiotemporal semantic feature (ST-SF) is proposed and transformed, by information entropy theory, into an auxiliary criterion for making the correct decision. First, a text-based relevance analysis (TRA) method is presented to estimate the textual labels of the objects most relevant to actions, which are used to train more targeted object detectors. Second, false detections are corrected using inter-frame cooperativity and dynamic programming to construct valid tubes. Then, ST-SF is used to encode the interactive semantic information. Finally, the concept and calculation of feature entropy are defined based on the spatial distribution of ST-SFs on the training set, and a two-stage action classification framework using the resulting decision gains is constructed. Tests on three publicly available datasets demonstrate that the proposed method achieves a significant performance improvement over existing algorithms and effectively encodes interactive semantic information to achieve robust action recognition in realistic scenes.

(5) A monocular vision sensor is used to capture action video clips in a realistic indoor scene. The action recognition algorithms above are applied to a real service robot system, giving it robust action perception and allowing it to perform some interactive tasks autonomously. First, a multi-robot collaborative service framework for completing interactive tasks is constructed from a sensor network, a transfer robot, and an intelligent grasping robot. Second, for the daily needs of wheelchair-bed users, a daily action dataset containing 13 action categories is established and used to train the required action classifier. Finally, an interactive interface for the multi-robot service system is designed for robot action perception and human-computer interaction tasks. The feasibility of the proposed framework and the practicability and accuracy of the action recognition algorithm are verified by the related experiments.
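
The adaptive mutation idea in item (1) can be illustrated with a minimal sketch. The abstract does not give the aggregation degree model, so the version below is an assumption: aggregation is taken as one minus the mean particle distance to the swarm centroid, normalised by the diagonal of the search box, and the mutation probability grows with it. The function name `ampso`, the inertia and acceleration constants, and the toy objective standing in for an SVM cross-validation error over (log10 C, log10 gamma) are all hypothetical.

```python
import math
import random


def ampso(objective, bounds, n_particles=20, n_iter=100, p_max=0.3, seed=0):
    """Minimise `objective` over the box `bounds` with PSO plus adaptive mutation."""
    rng = random.Random(seed)
    dim = len(bounds)
    diag = math.sqrt(sum((hi - lo) ** 2 for lo, hi in bounds))

    def random_point():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    pos = [random_point() for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iter):
        # Assumed aggregation degree: 1 minus the mean distance to the swarm
        # centroid, normalised by the diagonal of the search box.
        centroid = [sum(p[d] for p in pos) / n_particles for d in range(dim)]
        spread = sum(math.dist(p, centroid) for p in pos) / n_particles
        aggregation = max(0.0, 1.0 - spread / diag)
        p_mut = p_max * aggregation  # tighter swarm -> higher mutation probability

        for i in range(n_particles):
            if rng.random() < p_mut:
                # Mutation: re-seed the particle; its personal best is kept.
                pos[i] = random_point()
                vel[i] = [0.0] * dim
            else:
                # Standard PSO velocity and position update, clamped to the box.
                for d, (lo, hi) in enumerate(bounds):
                    r1, r2 = rng.random(), rng.random()
                    vel[i][d] = (0.7 * vel[i][d]
                                 + 1.5 * r1 * (pbest[i][d] - pos[i][d])
                                 + 1.5 * r2 * (gbest[d] - pos[i][d]))
                    pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def toy_cv_error(x):
    """Hypothetical stand-in for a cross-validation error surface."""
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2


best, best_val = ampso(toy_cv_error, [(-3.0, 3.0), (-5.0, 1.0)])
```

In a real setting the objective would train and validate an SVM for each candidate parameter pair, which is far more expensive than this toy function; the swarm logic itself would be unchanged.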
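
The evidence combination step in item (3) rests on Dempster's rule of combination. The sketch below shows only that standard rule for two sources; it does not reproduce the dissertation's weighting scheme or its survival-of-the-fittest rule, which the abstract does not specify. The action labels and mass values are hypothetical.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two mass functions (dict: frozenset -> mass) with Dempster's rule."""
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            # Agreeing evidence supports the intersection of the two focal sets.
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            # Disjoint focal sets contribute to the conflict mass K.
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; the rule is undefined")
    # Normalise by 1 - K so the combined masses sum to one.
    return {k: v / (1.0 - conflict) for k, v in combined.items()}


# Hypothetical evidence from a trajectory-based and a pose-based classifier.
frame = frozenset({"walk", "run", "wave"})  # frame of discernment (ignorance)
m_traj = {frozenset({"walk"}): 0.6, frozenset({"run"}): 0.3, frame: 0.1}
m_pose = {frozenset({"walk"}): 0.5, frozenset({"wave"}): 0.3, frame: 0.2}

fused = dempster_combine(m_traj, m_pose)
```

Mass assigned to the whole frame expresses a source's ignorance, which is what lets a weak feature defer to a stronger one instead of contradicting it.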
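
The feature entropy in item (4) is defined in the dissertation itself and is not given in the abstract. As a rough illustration of the underlying idea only, the sketch below computes plain Shannon entropy over how often a semantic cue occurs in each action class: a cue concentrated in one class has low entropy and is a strong auxiliary criterion, while a cue spread evenly across classes has high entropy and is not. The counts are hypothetical.

```python
import math


def shannon_entropy(counts):
    """Shannon entropy in bits of a distribution given as non-negative counts."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]  # zero counts contribute nothing
    return -sum(p * math.log2(p) for p in probs)


# Hypothetical occurrences of two semantic cues across four action classes.
discriminative_cue = [18, 1, 1, 0]   # seen almost only with one action
ambiguous_cue = [5, 5, 5, 5]         # seen equally often with every action

h_low = shannon_entropy(discriminative_cue)
h_high = shannon_entropy(ambiguous_cue)
```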
Keywords/Search Tags: service robot, human action recognition, local spatiotemporal feature, bag of visual words, DS evidence theory, information entropy