With the continuous evolution of machine learning, audio classification has become a research hotspot with broad applications across many fields. Zero-shot learning offers a way around the scarcity of target audio data, which currently hinders research in audio processing. Its objective is to recognize audio classes that were never encountered during training, thereby reducing the dependence of audio classification methods on labeled datasets and allowing them to adapt to a wider range of application scenarios. In light of this, this paper presents a comprehensive study of zero-shot audio signal classification; the corresponding model comprises audio feature extraction, auditory descriptor generation, and zero-shot learning, and aims to predict the categories of unseen test samples by learning from samples of seen audio categories. The primary contributions of this paper are as follows:

(1) This paper presents a novel approach based on spectrograms and synthesized classifiers, addressing the limited representational capability of audio features and the insufficient discriminative information learned in zero-shot audio classification. The method converts audio signals into spectrograms, feeds them into a pre-trained model to obtain corresponding feature representations, and applies a synthesized-classifier method to perform zero-shot audio classification.

(2) This paper proposes a zero-shot audio classification model based on artificial auditory descriptors, addressing the high redundancy and modal mismatch of semantic auditory descriptors. These descriptors, generated through manual auditory annotation, form an artificial auditory confusion matrix that represents inter-category differences. This approach avoids the drawback of semantic auditory descriptors carrying a large amount of non-audio-related information, thereby enhancing model performance.

(3) This paper introduces a zero-shot audio signal classification model based on generative learning, targeting the diversity of unseen audio class samples. In the zero-shot learning task, a generator network produces samples of unseen categories, so that the classifier can be trained on the generated data and acquire the ability to classify the new categories, improving audio classification performance.

Finally, we report experimental results for these models on the ESC-50 dataset, and for some of the models on a pre-screened subset of Audio Set, together with an analysis of the results. The results demonstrate that the proposed methods improve the zero-shot classification performance of audio signals to varying degrees.
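The synthesized-classifier idea behind contribution (1) can be sketched as follows. This is a minimal illustration, not the paper's actual formulation: the dimensions, the softmax-similarity weighting, and all variable names are assumptions. The core step is building classifier weights for unseen classes as descriptor-similarity-weighted blends of the classifiers learned on seen classes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: 5 seen classes, 2 unseen classes,
# 8-dim auditory descriptors, 16-dim audio (spectrogram) features.
n_seen, n_unseen, d_sem, d_feat = 5, 2, 8, 16

seen_descr = rng.normal(size=(n_seen, d_sem))      # descriptors of seen classes
unseen_descr = rng.normal(size=(n_unseen, d_sem))  # descriptors of unseen classes
seen_weights = rng.normal(size=(n_seen, d_feat))   # classifiers trained on seen classes

def synthesize_classifiers(unseen_descr, seen_descr, seen_weights, tau=1.0):
    """Blend seen-class classifiers into unseen-class ones, weighted by
    softmax similarity between class descriptors."""
    sim = unseen_descr @ seen_descr.T / tau               # (n_unseen, n_seen)
    alpha = np.exp(sim - sim.max(axis=1, keepdims=True))  # stable softmax
    alpha /= alpha.sum(axis=1, keepdims=True)
    return alpha @ seen_weights                           # (n_unseen, d_feat)

unseen_weights = synthesize_classifiers(unseen_descr, seen_descr, seen_weights)

# Classify one test feature vector (e.g. a pooled spectrogram embedding)
# among the unseen classes only, via dot-product scores.
x = rng.normal(size=d_feat)
pred = int(np.argmax(unseen_weights @ x))
print(unseen_weights.shape, pred)
```

The sketch uses random vectors in place of real pre-trained-model features and real auditory descriptors; in the actual pipeline the feature extractor and the descriptor source would supply these inputs.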