| In recent years, video-based and skeleton-sequence-based methods have both become popular approaches to action recognition, each with its own advantages and disadvantages. First, the data differ in form: video-based action recognition uses video data, while skeleton-based action recognition uses skeleton information obtained from sensors. Second, they differ in robustness: video data is affected by illumination, occlusion, clothing changes, and environmental changes, so it is relatively less robust, whereas skeleton data is largely unaffected by these factors and is therefore more robust. Third, they differ in processing difficulty: the former requires complex data processing, while the latter can obtain skeleton information directly from the sensor. Overall, skeleton-based action recognition is more robust than video-based recognition. Moreover, with the rapid development of pose estimation algorithms and advances in camera acquisition technology, skeleton-based action recognition has become one of the most promising research directions. However, in existing action recognition tasks, a change of scene (in skeleton-based human action recognition, usually a change of camera angle or of subject) requires the samples to be re-labeled, which raises the cost. Consequently, the model may perform poorly on test samples whose viewpoint or subject differs from the training data, and skeleton-based action recognition methods struggle to reach satisfactory accuracy on such test sets. Second, some action categories lack data samples in some scenes, because little data exists or because collection is difficult or costly. As a result, the graph convolutional network cannot establish an effective recognition ability for the related
scene data of these action types, which lowers recognition accuracy. This paper addresses the above problems, and its main contributions are as follows: (1) Using existing spatio-temporal graph convolutional network models and data, this paper re-partitions the datasets and conducts extensive experiments on the spatio-temporal graph convolutional network, examining its recognition accuracy on cross-view and cross-subject test samples under different settings. The results provide an important data reference for the graph convolutional deep domain adaptation method proposed in later chapters. (2) This paper studies the transfer performance of spatio-temporal graph convolutional networks. Experimental analysis shows that, when the graph convolutional network performs cross-scene recognition tasks, the shallow-layer features are general features such as node coordinates and node movement trajectories, while the deeper-layer features are specific to individual scenes. From this, the transferability of the different layers of the spatio-temporal graph convolutional network is inferred, which provides an important basis for implementing the graph convolutional deep domain adaptation method. (3) This paper presents a deep domain adaptation method for graph convolution. Based on the analysis of the transfer performance of spatio-temporal graph convolutional networks, a feature domain adaptation layer that computes an inter-domain confusion loss is added after the graph convolutional layers that carry scene-specific attributes in cross-scene recognition tasks. The maximum mean discrepancy (MMD) between the features of the source-domain data and the target-domain data serves as the inter-domain confusion loss. The confusion loss between the source and target domains and the classification loss on the source domain are minimized jointly, so that unlabeled samples in the target domain can be classified effectively. Finally, for the second
problem, a zero-shot domain adaptation method for graph convolution is proposed, which gives the spatio-temporal graph convolutional network a preliminary ability to recognize relevant test samples for which no training data are available. In conclusion, the deep domain adaptation method improves the cross-view and cross-subject recognition accuracy of the spatio-temporal graph convolutional model and further improves the performance of such algorithms. This paper targets the over-reliance on samples and the insufficient recognition accuracy of skeleton-based action recognition in cross-domain tasks, and studies more effective methods for improving network recognition performance. |
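
The joint objective described in item (3) can be illustrated with a minimal sketch. It assumes the simplest, linear-kernel form of the mean-based confusion loss (commonly called maximum mean discrepancy), that is, the squared distance between the mean source-domain feature vector and the mean target-domain feature vector. The abstract does not state which kernel is used, and the function names `mean_vector`, `linear_mmd`, `total_loss` and the weight `lam` are hypothetical, introduced here only for illustration. Plain Python lists stand in for the feature tensors a real network would produce.

```python
from typing import List, Sequence

Vector = Sequence[float]


def mean_vector(features: Sequence[Vector]) -> List[float]:
    """Average a batch of feature vectors dimension by dimension."""
    n = len(features)
    return [sum(column) / n for column in zip(*features)]


def linear_mmd(source: Sequence[Vector], target: Sequence[Vector]) -> float:
    """Squared Euclidean distance between the source and target feature means.

    Assumption: linear kernel. The source text does not specify the kernel.
    """
    mu_s = mean_vector(source)
    mu_t = mean_vector(target)
    return sum((a - b) ** 2 for a, b in zip(mu_s, mu_t))


def total_loss(cls_loss_source: float,
               source: Sequence[Vector],
               target: Sequence[Vector],
               lam: float = 0.25) -> float:
    """Source-domain classification loss plus the weighted confusion loss.

    `lam` is a hypothetical trade-off weight, not a value from the source.
    """
    return cls_loss_source + lam * linear_mmd(source, target)


# Toy features taken after the (hypothetical) adaptation layer.
source_feats = [[0.0, 0.0], [2.0, 2.0]]   # labeled source-domain batch
target_feats = [[1.0, 1.0], [3.0, 3.0]]   # unlabeled target-domain batch

confusion = linear_mmd(source_feats, target_feats)
objective = total_loss(0.5, source_feats, target_feats, lam=0.25)
```

Minimizing both terms together pushes the adaptation layer toward features whose source and target statistics match while keeping the source-domain classifier accurate, which is the mechanism the abstract relies on to classify unlabeled target-domain samples.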