Sound signals are an important source of information in daily life. Sound event detection aims to identify and classify the sound events in an audio recording and thereby infer which events are occurring. In real-world scenes, different sound events usually overlap in time, so polyphonic sound event detection, which handles such overlapping events, better matches practical conditions and has broad application prospects in security monitoring, intelligent transportation, smart homes and smart-city construction. At present, polyphonic sound event detection algorithms are mainly classification algorithms based on multilayer neural networks, the most widely used class of classifiers today. However, these networks face several problems when dealing with overlapping sound events. First, the number of overlapping events and the degree of overlap are unknown, which leads to poor modelling of the spatial location of the sound events. Second, a single acoustic feature carries limited acoustic information and can hardly describe all the characteristics of a sound event. Third, a large amount of pose information is lost once the features pass through a convolutional neural network, so the model cannot accurately locate the onset and offset times of sound events; meanwhile, the commonly used convolutional recurrent neural network (CRNN) has a complex structure and many parameters, which delays recognition results and degrades real-time performance. This paper addresses these problems as follows.

(1) To handle polyphonic sound events, a capsule network (CapsNet) is used to improve the convolutional neural network model. The scalar neurons of the convolutional neural network are replaced by capsule vectors as the basic representation unit, and these capsule vectors model acoustic events at multiple scales, retaining more information about the location of acoustic events. To reduce model complexity and parameter count, a depthwise separable capsule network (DSC-CapsNet) detection model is constructed, which greatly simplifies the model while maintaining performance. Experiments on a public dataset show that the proposed DSC-CapsNet detection system improves performance by about 12% over the baseline system.

(2) To address the limited information carried by a single feature, this paper aggregates different kinds of acoustic features, uses the fused features as the model input, and evaluates through extensive experiments the performance gains brought by different feature combinations. Different acoustic features describe sound events from different perspectives, which makes the events more distinguishable and thus improves the recognition performance of the model. A baseline model based on a convolutional recurrent neural network is also constructed for comparison.

(3) To address the loss of feature information in convolutional neural networks and their inability to incorporate contextual information into detection, a CapsNet-RNN polyphonic sound event detection model is built. The capsule vectors of the capsule network overcome the loss of pose information in convolutional neural networks, and a recurrent neural network then introduces contextual information into the detection task, yielding more accurate predictions of the onset and offset times of sound events. Experimental results show that this model improves performance by about 15% over the baseline model.

Finally, to test the proposed algorithms in a practical scenario, a real-time abnormal sound monitoring system is built on campus and the two proposed polyphonic sound event detection algorithms are deployed in it. Practical problems such as the insufficient power supply of outdoor equipment are solved, and experiments conducted in this practical scenario verify the performance of the proposed algorithms.
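The two building blocks behind the DSC-CapsNet model can be illustrated with a minimal sketch. This is not the paper's implementation, and its details are assumptions. The first function is the standard "squash" non-linearity from the original CapsNet of Sabour et al. (2017). It keeps a capsule vector's direction but maps its length into [0, 1), so the length can be read as how likely an entity (here, a sound event) is present. The other two functions count the weights of a standard k×k convolution and of a depthwise separable one. The depthwise separable version is a per-channel k×k depthwise convolution followed by a 1×1 pointwise convolution, and this decomposition is where its parameter savings come from. The function names and the 3×3, 64-to-128-channel sizes are hypothetical choices for illustration.

```python
import numpy as np


def squash(s: np.ndarray, axis: int = -1, eps: float = 1e-9) -> np.ndarray:
    """CapsNet squash non-linearity (assumed here; the paper's exact form is not given).

    Preserves the direction of each capsule vector but rescales its length to
    ||s||^2 / (1 + ||s||^2), which lies in [0, 1): long vectors approach 1,
    short vectors approach 0.
    """
    sq_norm = np.sum(s * s, axis=axis, keepdims=True)
    scale = sq_norm / (1.0 + sq_norm)
    # eps avoids division by zero for an all-zero capsule
    return scale * s / np.sqrt(sq_norm + eps)


def conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a standard k x k convolution (bias omitted). Hypothetical helper."""
    return k * k * c_in * c_out


def dsconv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a depthwise (k x k per input channel) plus pointwise (1 x 1) convolution.

    Bias omitted. Hypothetical helper for illustration.
    """
    return k * k * c_in + c_in * c_out


if __name__ == "__main__":
    # Two example capsules: one long ([3, 4], length 5), one short ([0.3, 0.4], length 0.5)
    v = squash(np.array([[3.0, 4.0], [0.3, 0.4]]))
    print(np.linalg.norm(v, axis=-1))  # approximately [0.9615, 0.2]
    # Illustrative layer sizes, not taken from the paper
    print(conv_params(3, 64, 128), dsconv_params(3, 64, 128))  # 73728 8768
```

With these illustrative sizes, the depthwise separable layer needs 8,768 weights instead of 73,728, a ratio of 1/C_out + 1/k² (about 8.4 times fewer). Gains of this kind are what allow a depthwise separable design to simplify a model while keeping its representational structure.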