| Sound event detection has a wide range of applications, such as health monitoring, urban sound analysis, and biomonitoring. Traditional sound event detection research relies on small datasets, so the resulting models generalize poorly. In 2017, Google released Audio Set, a large audio dataset, and research has since shifted from traditional fully labeled sound event detection to weakly labeled sound event detection. Although weakly labeled sound event detection overcomes the data scarcity of the traditional setting, it brings new problems and difficulties. One of the biggest is inferring the start and end times of sound events from data that carries only category labels, and the performance of current sound event detection algorithms is not yet good enough. It is therefore a valuable research topic to extract more information about sound events from weakly labeled data, using Audio Set effectively to train a highly generalizable sound event detection model. Improving existing sound event detection algorithms is the focus of this thesis. To improve the performance of weakly labeled sound event detection algorithms, this thesis concentrates on the core component of the algorithm: the pooling function. Most commonly used pooling functions apply a single decision strategy, which may lead to biased decisions. This thesis improves on the pooling functions commonly used in sound event detection, proposes an adaptive pooling function, and evaluates it in comparison experiments. The adaptive pooling function outperforms the other pooling functions on all sound event classification metrics. In sound event localization, it achieves a lower Error Rate than the other pooling functions but a lower F1 score than the power pooling function. It also performs best on multiple audio categories and obtains the best metric scores under different signal-to-noise ratios. The results
show that the adaptive pooling function improves the noise immunity of the system and achieves higher evaluation scores on multiple audio categories. In this thesis, the sound event detection models are built with LibROSA, Keras, and other libraries, and a sound event detection system is built with the Flask micro-framework. First, the models are trained on Audio Set and evaluated on the DCASE 2017 Task 4 dataset to compare and analyze their performance differences; the gated convolutional recurrent neural network shows relatively stronger noise robustness and relatively higher accuracy. Next, the adaptive pooling function is designed and analyzed from a theoretical point of view. The models required for the experiments are then built to compare the noise resistance and other performance aspects of the adaptive pooling function against various other pooling functions, and the applicable range of the adaptive pooling function is summarized. Following that, the sound event detection system is implemented and tested using the Flask framework, and the implementation and testing process is described. Finally, the work of this thesis is summarized and future directions for improvement are proposed. |
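
As a rough illustration of what a pooling function does in weakly labeled sound event detection, the sketch below aggregates frame-level class probabilities into a single clip-level probability per class. It covers only baseline strategies as they are commonly formulated in the literature (max, average, linear softmax, and power pooling); it does not reproduce the adaptive pooling function proposed in this thesis, whose definition is not given in the abstract. The function names, the `frame_probs` array, and the fixed exponent `n` are assumptions made for this example; in practice the power pooling exponent is typically a learned parameter.

```python
import numpy as np


def max_pooling(frame_probs):
    """Clip-level probability = the most confident frame (per class)."""
    return frame_probs.max(axis=0)


def average_pooling(frame_probs):
    """Clip-level probability = mean over all frames (per class)."""
    return frame_probs.mean(axis=0)


def linear_softmax_pooling(frame_probs, eps=1e-8):
    """Each frame is weighted by its own probability."""
    return (frame_probs ** 2).sum(axis=0) / (frame_probs.sum(axis=0) + eps)


def power_pooling(frame_probs, n=1.0, eps=1e-8):
    """Each frame is weighted by its probability raised to the power n.

    n = 0 reduces to average pooling, n = 1 to linear softmax pooling,
    and large n approaches max pooling. Here n is a fixed argument
    (an assumption for this sketch) rather than a trained parameter.
    """
    weights = frame_probs ** n
    return (frame_probs * weights).sum(axis=0) / (weights.sum(axis=0) + eps)


# Hypothetical frame-level output: 4 frames x 2 classes.
# Class 0 is active in one frame only; class 1 is active throughout.
frame_probs = np.array([
    [0.1, 0.8],
    [0.9, 0.7],
    [0.2, 0.9],
    [0.1, 0.8],
])

for name, fn in [
    ("max", max_pooling),
    ("average", average_pooling),
    ("linear softmax", linear_softmax_pooling),
    ("power (n=2)", lambda p: power_pooling(p, n=2.0)),
]:
    print(f"{name:>15}: {fn(frame_probs)}")
```

For the short event in class 0, average pooling yields a much lower clip-level score than max pooling, while the weighted variants fall in between. This shows how committing to a single fixed strategy can skew the clip-level decision toward either short or long events.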