Human pose estimation and action recognition are active research fields in computer vision, with significant application value in elderly care, medical rehabilitation, animation and game production, sports medicine research, security monitoring, and human-machine collaboration in factories. Skeleton joint coordinates are a high-level representation of human motion that, compared with color and depth images, is far less sensitive to complex backgrounds, lighting, and body shape. Skeleton-based human pose estimation and action recognition have therefore received increasing attention. Kinect, a sensor from Microsoft, makes it possible to estimate human pose with a low-cost depth camera. However, estimating human pose from a single view suffers from self-occlusion and mutual occlusion, which lead to poor-quality skeleton data and false detections. In addition, most current action recognition research focuses on offline recognition, whose slow inference speed cannot meet the needs of real-time applications. To solve these problems, we construct a distributed three-dimensional vision sensor network, fuse the multi-view skeleton joint coordinate data with the ICF algorithm, and classify the fused data with an SVM model trained using the one-versus-all strategy. For the offline action recognition task, we construct joint set distance features, geometric features, and motion features, and train a classification model on this multi-channel feature representation using a one-dimensional temporal convolutional network. For the real-time action recognition system, actions are recognized using a sampling strategy based on a memory group together with the trained CNN classifier, and the algorithm is deployed on a server as a service for invocation. The specific research contents and innovations are as follows.

Firstly, we construct a distributed three-dimensional vision sensor network, which solves the problem of occlusion in human pose estimation and increases action recognition accuracy. The network is composed of four Kinect sensors and covers a field of view of about 180 degrees. We use the ICF algorithm to fuse the human skeleton joint coordinates from the different views and obtain more accurate pose estimates. We use the dynamic time warping algorithm to encode the raw data and model the temporal information with a Fourier temporal pyramid feature representation, and then classify the data with a linear support vector machine. We compare the results obtained from each of the four single views with those obtained from the fused data, which validates the effectiveness of the algorithm.

Secondly, we design an action classification model based on a feature-representation-driven method and a one-dimensional convolutional network, which increases offline action classification accuracy. To better represent human action data, we use joint set distance features to provide invariance to distance and perspective, geometric features to provide geometric information between different joints in the same frame, and motion features at different scales to provide information on the rate of joint movement between frames. Considering the strength of one-dimensional temporal convolutional networks on time series problems, we use such a network to model the different feature series and fuse the embedded features to classify the data. We run experiments on the JHMDB and UTKinect benchmarks. The results show that the proposed method has advantages in both recognition accuracy and inference speed.

Thirdly, the sampling strategy based on the memory group, combined with the trained CNN classifier and a real-time recognition strategy, is used for online action recognition. Considering the limited computing power of the data collection nodes, and in order to broaden the range of applications of online action recognition algorithms, we propose a client/server architecture in the RESTful style. The online action recognition algorithm is deployed on a remote high-performance server as a general service for invocation. We run experiments on the UTKinect benchmark and on a dataset collected in a laboratory environment, which validates the effectiveness of the algorithm.
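
To illustrate the joint set distance and motion features mentioned above, the following is a minimal sketch and not the implementation used in this work. It assumes that a frame is a list of (x, y, z) joint coordinates, that the distance feature is the Euclidean distance between every pair of joints, and that the motion feature is the per-joint displacement between frames a fixed stride apart. The function names and the strides (1 and 2) are hypothetical choices made only for this example.

```python
from itertools import combinations
from math import dist

# Assumed data layout (illustrative only):
#   frame = [(x, y, z), ...]   one tuple per skeleton joint
#   clip  = [frame, ...]       frames in temporal order


def joint_distance_feature(frame):
    """Euclidean distance between every unordered pair of joints in one frame."""
    return [dist(a, b) for a, b in combinations(frame, 2)]


def motion_feature(clip, stride):
    """Per-joint displacement between frames that are `stride` frames apart."""
    return [
        [tuple(c1 - c0 for c0, c1 in zip(j0, j1)) for j0, j1 in zip(f0, f1)]
        for f0, f1 in zip(clip, clip[stride:])
    ]


def clip_features(clip, strides=(1, 2)):
    """Collect the distance channel and one motion channel per stride."""
    return {
        "distance": [joint_distance_feature(f) for f in clip],
        "motion": {s: motion_feature(clip, s) for s in strides},
    }


if __name__ == "__main__":
    # Three joints translated along x: distances stay fixed, motion does not.
    f0 = [(0, 0, 0), (3, 4, 0), (0, 0, 1)]
    f1 = [(x + 1, y, z) for x, y, z in f0]
    f2 = [(x + 3, y, z) for x, y, z in f0]
    features = clip_features([f0, f1, f2])
    print(features["distance"][0])
    print(features["motion"][1])
```

Because pairwise joint distances do not change when the whole skeleton is translated or rotated, a feature of this kind does not depend on where the subject stands relative to the camera. Using more than one stride gives the classifier displacement information at more than one temporal scale.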