Population aging has become a worldwide issue, and more and more elderly people are living alone. Falls can cause serious injury and have become a major cause of injury among the elderly. With the rapid development of smart home applications, computer technology, sensor technology, and image processing technology, automatic fall detection has become an emerging technique worth exploring. Vision-based human fall detection helps safeguard the elderly, reduces the burden on their families, and does not interfere with their daily lives. It can effectively reduce the risk of delayed treatment and death, lower the labor cost of home care for the elderly, and improve their quality of life.

In this thesis, a human fall detection algorithm based on multimodal feature fusion is proposed. The major work is as follows:

Firstly, the thesis discusses the background and significance of human fall detection and reviews the current status of fall detection technology. The existing problems are analyzed from two aspects: feature extraction and classification. The main work and the framework of the thesis are then outlined.

Secondly, a new human fall detection dataset, SDUFall, is recorded and released using a Kinect camera. The dataset contains color image sequences, depth image sequences, and the three-dimensional coordinates of human skeleton joints.

Thirdly, feature extraction from depth images and skeleton joint coordinates is studied for human fall detection. Curvature scale space (CSS) features and morphological characteristics of the human body are extracted from the depth images, and for each action the trajectories of the skeleton joints are provided by the Kinect SDK. The CSS feature is robust to translation, rotation, scaling, and local deformation.
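The core idea behind the CSS feature is to smooth the body contour with Gaussian kernels of increasing scale and record where the curvature changes sign; the resulting (position, scale) pairs form the CSS image of the shape. A minimal NumPy sketch of that idea follows; the function names and toy contours are illustrative and not taken from the thesis:

```python
import numpy as np

def smooth_closed(u, sigma):
    """Circularly smooth a 1-D signal with a Gaussian kernel (closed contour)."""
    r = int(3 * sigma)
    t = np.arange(-r, r + 1)
    g = np.exp(-t**2 / (2 * sigma**2))
    g /= g.sum()
    padded = np.concatenate([u[-r:], u, u[:r]])  # wrap-around padding
    return np.convolve(padded, g, mode='same')[r:-r]

def curvature_zero_crossings(x, y, sigma):
    """Indices where the curvature of the contour, smoothed at scale sigma,
    changes sign. Collecting these over increasing sigma yields the CSS image."""
    xs, ys = smooth_closed(x, sigma), smooth_closed(y, sigma)
    dx, dy = np.gradient(xs), np.gradient(ys)
    ddx, ddy = np.gradient(dx), np.gradient(dy)
    kappa = dx * ddy - dy * ddx  # numerator of signed curvature (sign only)
    s = np.sign(kappa)
    return np.nonzero(s * np.roll(s, 1) < 0)[0]

# Toy check: a circle is convex (no curvature sign changes),
# while a non-convex "peanut" contour has concave arcs.
t = np.linspace(0, 2 * np.pi, 200, endpoint=False)
circle_zc = curvature_zero_crossings(np.cos(t), np.sin(t), 2.0)
r = 1 + 0.5 * np.cos(2 * t)
peanut_zc = curvature_zero_crossings(r * np.cos(t), r * np.sin(t), 1.0)
```

Tracking how these zero crossings appear and vanish as sigma grows is what makes the representation stable under translation, rotation, scaling, and local deformation.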
By filling the extracted shape silhouette, a foreground human region is obtained whose morphological properties can uniquely characterize the depicted action. The skeleton trajectory feature supplies the temporal information that is missing from the other two features.

Fourthly, a multimodal feature selection and fusion algorithm based on group-sparse non-negative supervised canonical correlation analysis (GNCCA) is proposed. GNCCA integrates an arbitrary number of views of high-dimensional data to provide representations more amenable to action classification. It guarantees positive correlations of the selected features, maximizes the separation between different classes, and suppresses noise. A group-sparse constraint allows simultaneous between- and within-view feature selection. In particular, GNCCA is designed to emphasize the correlations between feature views and class labels, so that the selected features yield better class separability. First, the per-frame morphological features from the depth images and the per-timestamp trajectory features are combined by feature-level fusion; then the result is combined with the CSS feature by decision-level fusion to obtain the final classification results.

Fifthly, an Improved Variable-length Particle Swarm Optimization (IVPSO) algorithm is proposed to automatically select the optimal structure of the Extreme Learning Machine (ELM) classifier (the number of hidden neurons together with the corresponding input weights and hidden biases), maximizing the accuracy on validation data while minimizing the norm of the output weights. IVPSO-ELM alleviates issues such as local minima, slow learning rates, and the choice of stopping criterion.

Finally, conclusions are presented together with recommendations for future work.
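For reference, the ELM classifier whose structure IVPSO tunes works as follows: the hidden-layer input weights and biases are random and fixed, and only the output weights are solved in closed form via a pseudo-inverse. The sketch below is a minimal generic ELM in NumPy, not the thesis implementation; the function names and toy data are illustrative:

```python
import numpy as np

def train_elm(X, T, n_hidden, rng=None):
    """Train a basic single-hidden-layer ELM.

    X: (n_samples, n_features); T: (n_samples, n_classes) one-hot targets.
    Input weights W and biases b are random and never updated; the output
    weights beta are the minimum-norm least-squares solution via pinv(H).
    """
    rng = np.random.default_rng(rng)
    W = rng.uniform(-1, 1, size=(X.shape[1], n_hidden))  # random input weights
    b = rng.uniform(-1, 1, size=n_hidden)                # random hidden biases
    H = np.tanh(X @ W + b)                               # hidden activations
    beta = np.linalg.pinv(H) @ T                         # closed-form solve
    return W, b, beta

def predict_elm(X, W, b, beta):
    return np.argmax(np.tanh(X @ W + b) @ beta, axis=1)

# Toy two-class example: points separated along both axes.
X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
T = np.eye(2)[[0, 0, 1, 1]]                              # one-hot labels
W, b, beta = train_elm(X, T, n_hidden=20, rng=0)
pred = predict_elm(X, W, b, beta)
```

Because training reduces to one pseudo-inverse rather than iterative gradient descent, the open choices are exactly the ones IVPSO searches over: the number of hidden neurons and the random input weights and biases.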