| Combining artificial intelligence with the analysis of human actions has made human behavior intelligence an increasing focus of attention, and it has been widely applied in human-machine interaction and intelligent monitoring, as well as in military training and sports. Deep learning handles diverse input features well and has demonstrated excellent performance in human action detection. It learns features layer by layer from input images, and the availability of large datasets and computational power has made it a key tool for human action recognition. Action detection nevertheless still faces great challenges: the spatial environment in which an action occurs, including light intensity, complex backgrounds, and differing viewing angles, affects detection, while actions start at different times and small movements are hard to capture. It is therefore of considerable research value to study a model better suited to human behavior detection, in order to improve the detection of small human movements. This thesis investigates human behavior recognition, analyzes the difficulties of spatial complexity and temporal variability in behavior detection, proposes a Vector Tracking algorithm (VT algorithm), and encapsulates that algorithm in a new Cross-dimensional Feature Fusion Multilayer Perceptron (CFF-MLP) detection model. The main work of this thesis is as follows. First, the Vector Tracking algorithm is proposed. It is an optical-flow-based algorithm that facilitates the effective capture of continuous behavioral features. The algorithm extracts time-series features from the angles and distances between the coordinates of skeleton key points, and addresses the aperture problem inherent in optical flow. Time series are also introduced into the algorithm to capture the continuous changes of human actions across video frames, transforming the human key points into a matrix of continuous changes over K video frames. Second, a new behavior detection model, CFF-MLP, is proposed. It consists of three parts: the VT Stage, the ASMLP Stage, and the CFF-based Stage. CFF-MLP adopts a cross-dimensional feature fusion design that hierarchically acquires behavioral features and local dependencies at different levels and enhances the generalization capability of the model. Within the model, axial features are used to capture information and establish local receptive field dependencies, and features are extracted by channel projection to construct features at each scale in a bottom-up hierarchy with lateral connections. Finally, the effectiveness of the proposed model is verified by experimental analysis, and its usefulness and advantages are confirmed through comparison with other methods. |
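
As a rough illustration of how skeleton key points over K frames might be turned into a matrix of continuous changes, the sketch below records, for each key point and each pair of consecutive frames, the distance moved and the direction of motion. It is a minimal sketch under stated assumptions, not the thesis's VT algorithm: the function names `displacement_features` and `motion_matrix`, the choice of per-key-point displacement as the feature, and the toy coordinates are all hypothetical, and the optical-flow component is not modeled.

```python
import math

def displacement_features(prev_pts, curr_pts):
    """Return [dist_0, angle_0, dist_1, angle_1, ...] for one pair of frames.

    Assumption: each key point is an (x, y) tuple and both frames list
    the same key points in the same order.
    """
    row = []
    for (x0, y0), (x1, y1) in zip(prev_pts, curr_pts):
        dx, dy = x1 - x0, y1 - y0
        row.append(math.hypot(dx, dy))  # how far the key point moved
        row.append(math.atan2(dy, dx))  # direction of motion, in radians
    return row

def motion_matrix(frames):
    """Stack per-pair features over K frames into a (K-1) x (2*J) matrix."""
    return [displacement_features(a, b) for a, b in zip(frames, frames[1:])]

# Toy input (hypothetical): K = 3 frames, J = 2 key points.
frames = [
    [(0.0, 0.0), (1.0, 1.0)],
    [(3.0, 4.0), (1.0, 2.0)],
    [(3.0, 4.0), (0.0, 2.0)],
]
matrix = motion_matrix(frames)
for row in matrix:
    print([round(v, 4) for v in row])
```

Each row of `matrix` describes one frame-to-frame transition, so a stationary key point shows up as a zero distance and a small movement shows up as a small but nonzero one. A matrix of this kind is the sort of input a downstream classifier could consume.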