With the continued aging of the global population, falls among the elderly are becoming more frequent and pose a serious threat to their health and safety. Addressing indoor falls has therefore become an urgent issue for elderly health management and social care. Indoor fall detection currently relies mainly on wearable sensor devices and computer vision methods. Both approaches have limitations: detection accuracy depends on sensor placement and on how the device is worn, environmental factors constrain the effectiveness of computer vision algorithms, data privacy and security are at risk, and applicability to specific populations is limited. To address these limitations to some extent, this paper uses sound signals and deep learning to propose a residual-network-based model for identifying indoor fall sound events. The main work and conclusions of this paper are as follows:

(1) This paper first surveys the domestic and international literature on using sound signals to identify falls, covering the development, current status, and identification methods of indoor fall detection. Then, based on an analysis of the time-frequency spectrograms of fall sound signals, the Mel frequency cepstral coefficients (MFCC) together with their first-order and second-order differential coefficients are selected as the acoustic features. Supplementing the static MFCCs with these dynamic, time-dimension features can improve the robustness and noise resistance of the features to some extent. Finally, based on comparative experiments across different models, the residual network, which performed best, is selected as the baseline model for indoor fall sound event detection.

(2) Experiments show that the traditional residual network performs well but still has deficiencies in fall detection. This paper therefore improves the residual network with a feature pyramid network (FPN), dynamic region-aware convolution (DRConv), and the SimAM attention module. A training strategy consisting of learning rate decay and pretraining the improved residual network on the ImageNet dataset is used to improve the feature extraction capability and computational efficiency of the network. Experiments were conducted on the Google AudioSet dataset and the A3 FALL dataset (which includes human fall sounds and 13 types of object falling sounds). The results show that the proposed improved model performs best. On the A3 FALL dataset, the average F1-score reached 96.3%, and for human fall sound events the F1-score reached 96.5%, outperforming the other traditional network models. Finally, on the Google AudioSet dataset the proposed method outperforms the methods of Piczak, Sang, and Su, with F1-scores 5.2%, 7.4%, and 1.3% higher, respectively, further verifying the performance advantages and generalization ability of the proposed model.

The research method of this paper offers a reference for the recognition of indoor fall sound events. By using sound signals for indoor fall detection, it provides an alternative approach and technical means for indoor fall recognition, with a degree of generality and research value.
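As an illustration of the feature design in (1), the following minimal sketch shows how first-order and second-order differential coefficients can be derived from an MFCC matrix and stacked with it. It assumes the commonly used regression-based delta formula with edge padding; the paper does not state its exact delta computation or window width, so the function names `delta` and `stack_features` and the `width` parameter are hypothetical choices made for this example.

```python
from typing import List

Matrix = List[List[float]]  # layout: [num_frames][num_coeffs]


def delta(frames: Matrix, width: int = 2) -> Matrix:
    """Regression-based delta along the time axis (assumed formula).

    d_t = sum_{n=1..width} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..width} n^2)
    Frames beyond the edges are replaced by the first/last frame.
    """
    num_frames = len(frames)
    denom = 2 * sum(n * n for n in range(1, width + 1))
    out: Matrix = []
    for t in range(num_frames):
        row = []
        for k in range(len(frames[t])):
            acc = 0.0
            for n in range(1, width + 1):
                nxt = frames[min(t + n, num_frames - 1)][k]  # pad at the end
                prv = frames[max(t - n, 0)][k]               # pad at the start
                acc += n * (nxt - prv)
            row.append(acc / denom)
        out.append(row)
    return out


def stack_features(mfcc: Matrix, width: int = 2) -> Matrix:
    """Concatenate static MFCCs with their first- and second-order deltas."""
    d1 = delta(mfcc, width)
    d2 = delta(d1, width)  # second-order = delta of the first-order delta
    return [m + a + b for m, a, b in zip(mfcc, d1, d2)]


if __name__ == "__main__":
    # Toy input: 5 frames with a single coefficient that rises linearly.
    toy = [[0.0], [1.0], [2.0], [3.0], [4.0]]
    for frame in stack_features(toy, width=1):
        print(frame)
```

With 13 MFCCs per frame, for example, stacking in this way would yield 39 values per frame; the number of coefficients actually used in the paper is not given in this abstract.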
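For readers less familiar with the metric reported in (2), the sketch below computes a per-class F1-score from true-positive, false-positive, and false-negative counts, and an unweighted average over classes. Whether the paper's "average F1-score" is this unweighted (macro) average is an assumption, and the class names and counts in the usage example are made up for illustration; they are not results from the paper.

```python
from typing import Dict, Tuple


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2 * precision * recall / (precision + recall)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


def macro_f1(counts: Dict[str, Tuple[int, int, int]]) -> float:
    """Unweighted mean of per-class F1 (assumed averaging scheme).

    counts maps a class name to its (tp, fp, fn) tuple.
    """
    scores = [f1_score(tp, fp, fn) for tp, fp, fn in counts.values()]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical counts for two sound classes, for illustration only.
    example = {
        "human_fall": (8, 2, 2),
        "object_fall": (9, 1, 3),
    }
    for name, (tp, fp, fn) in example.items():
        print(name, round(f1_score(tp, fp, fn), 3))
    print("macro F1:", round(macro_f1(example), 3))
```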