In recent years, rapid progress across machine learning has raised the demands that modern society places on artificial intelligence algorithms. Voice recognition is an important machine learning technology and plays a key role throughout industry. Building on data augmentation and model fusion, this paper proposes an algorithm for detecting human voice and music in spectrogram analysis tasks of different forms. For data processing, frequency-domain and time-domain information in the spectrogram converted from audio is randomly set to zero; this augments the data, improves the model's ability to generalize to unseen data, and helps prevent overfitting. For the network structure, a convolutional neural network is combined with a gated recurrent unit (GRU) to build a base network suited to processing the local features of spectrograms. Because the network includes a recurrent component, the model can also capture the temporal information in audio data and make better judgments. The model is further optimized through the choice of activation function and optimizer. Together with the established evaluation criteria and loss function, these choices aim to address common machine learning problems such as overfitting and vanishing gradients. The model was evaluated, and finally a complete system supporting spectrogram generation, display, and recognition was built, realizing an audio event detection task that classifies human voice and music in different scenarios.
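The masking step described above can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the function name `mask_spectrogram`, the parameters `max_freq_width` and `max_time_width`, their default values, and the choice of exactly one mask per axis are all assumptions made for illustration, since the abstract does not state them.

```python
import numpy as np


def mask_spectrogram(spec, max_freq_width=8, max_time_width=16, rng=None):
    """Zero one random frequency band and one random time span.

    `spec` is assumed to be a 2-D array shaped (freq_bins, time_frames).
    The mask widths are hypothetical defaults, not values from the paper.
    The input array is left unchanged; a masked copy is returned.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = spec.copy()
    n_freq, n_time = out.shape

    # Frequency mask: zero a band of consecutive frequency bins.
    f_width = int(rng.integers(0, min(max_freq_width, n_freq) + 1))
    f_start = int(rng.integers(0, n_freq - f_width + 1))
    out[f_start:f_start + f_width, :] = 0.0

    # Time mask: zero a span of consecutive time frames.
    t_width = int(rng.integers(0, min(max_time_width, n_time) + 1))
    t_start = int(rng.integers(0, n_time - t_width + 1))
    out[:, t_start:t_start + t_width] = 0.0

    return out


# Usage: mask a dummy 64 x 128 spectrogram with a fixed seed.
spec = np.ones((64, 128))
augmented = mask_spectrogram(spec, rng=np.random.default_rng(0))
```

Applying a fresh random mask each time a training example is drawn means the network rarely sees the same spectrogram twice, which is the mechanism by which this kind of augmentation is expected to reduce overfitting.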