Anti-noise Power Normalized Cepstral Coefficients For Two-level Robust Environmental Sounds Recognition In Real Noisy Conditions

Posted on: 2014-02-01
Degree: Master
Type: Thesis
Country: China
Candidate: X Yan
Full Text: PDF
GTID: 2308330461972546
Subject: Computer software and theory
Abstract/Summary:
Environmental problems directly influence people's daily lives, and environmental sounds carry a large amount of information about the living environment. Real-life scenes contain many kinds of noise. This paper proposes a new two-level robust environmental sound recognition technique, based on a novel anti-noise feature, to improve recognition accuracy in real non-stationary noisy scenes. First, we record background noises in real-life scenes and collect clean environmental sounds from the Freesound audio database. Then, starting from bird sound recognition in real-world noisy scenes, we propose an environmental sound recognition technique based on the novel Anti-noise Power Normalized Cepstral Coefficients (APNCC) extraction and a two-level recognition architecture. Finally, this technique is generalized to environmental sound recognition.

To improve environmental sound recognition accuracy, we propose the following main methods:

1) The two-level environmental sound recognition architecture. At the first level, after the clean environmental sounds are preprocessed, four segment features are extracted: the volume dynamic range (VDR), non-silence ratio (NSR), non-pitch ratio (NPR) and smooth pitch ratio (SPR). These 4 features are combined into a fused segment feature, and all sound segments in the training set are clustered using the fused segment feature and the K-means classifier. At the second level, the frame features APNCC, PNCC (Power Normalized Cepstral Coefficients) and MFCC are extracted, and each of the 3 features is used to train an SVM classifier within each cluster. For test-set sound segments under different SNRs, the test stage consists of two classification steps: K-means classification based on the fused segment feature, followed by SVM classification based on the frame features.

2) The APNCC extraction, which contains two-stage denoising. First, to deal with the complex and diverse background noises of real-life scenes, a highly non-stationary noise estimation algorithm is applied to estimate the noise power spectrum. Second, to reduce noise while leaving less residual colored noise, we apply multi-band spectral subtraction. Finally, the PNCC extraction process is applied to the estimated clean environmental sounds to extract APNCC. The APNCC extraction process therefore includes two denoising stages: multi-band spectral subtraction based on the noise power spectrum estimate, and medium-duration GT energy bias removal.

In this paper, environmental sounds include bird sounds, weather sounds, mammal sounds and insect sounds. The 70 subclasses of these 4 environmental sound classes are first clustered by the fused segment feature using the K-means classifier. Comparison experiments in different scenes under different SNRs are then conducted using the SVM classifier combined with different frame features, namely APNCC, PNCC and MFCC. The experimental results show that the two-level recognition architecture with APNCC extraction outperforms the other features in average environmental sound recognition accuracy and noise robustness, especially in real-life scenes with SNRs lower than 30 dB.
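
The multi-band spectral subtraction step in item 2 can be illustrated with a minimal sketch for a single frame. The abstract does not give the exact rule, so this is only an assumption-based illustration: the piecewise over-subtraction factor uses values commonly quoted for multi-band spectral subtraction, the per-band weights default to 1, and the function names, the number of bands and the spectral floor are hypothetical choices. The noise power spectrum is taken as already estimated; the non-stationary noise estimator itself is not reproduced here.

```python
import math


def oversubtraction_factor(snr_db):
    """Over-subtraction factor for one band (assumed piecewise-linear rule)."""
    if snr_db < -5.0:
        return 4.75
    if snr_db <= 20.0:
        return 4.0 - 0.15 * snr_db
    return 1.0


def multiband_spectral_subtraction(noisy_power, noise_power, n_bands=4,
                                   floor=0.002, band_weights=None):
    """Subtract a noise power estimate from one frame, band by band.

    noisy_power, noise_power: per-bin power spectra of equal length.
    n_bands, floor, band_weights: hypothetical tuning parameters.
    Returns the estimated clean power spectrum as a list of floats.
    """
    if len(noisy_power) != len(noise_power):
        raise ValueError("spectra must have the same number of bins")
    n_bins = len(noisy_power)
    if band_weights is None:
        band_weights = [1.0] * n_bands
    # Split the bins into n_bands contiguous, roughly equal-width bands.
    edges = [i * n_bins // n_bands for i in range(n_bands + 1)]
    clean = [0.0] * n_bins
    for b in range(n_bands):
        lo, hi = edges[b], edges[b + 1]
        signal = sum(noisy_power[lo:hi])
        noise = sum(noise_power[lo:hi])
        # Band SNR in dB, with guards for empty or silent bands.
        if noise <= 0.0:
            snr_db = float("inf")
        elif signal <= 0.0:
            snr_db = float("-inf")
        else:
            snr_db = 10.0 * math.log10(signal / noise)
        alpha = oversubtraction_factor(snr_db)
        for k in range(lo, hi):
            estimate = noisy_power[k] - alpha * band_weights[b] * noise_power[k]
            # Keep a small fraction of the noisy power instead of going negative.
            clean[k] = max(estimate, floor * noisy_power[k])
    return clean


frame = multiband_spectral_subtraction([10.0, 10.0, 1.0, 1.0],
                                       [1.0, 1.0, 1.0, 1.0], n_bands=2)
print(frame)
```

In this toy frame the first band has a 10 dB SNR, so the noise is subtracted with a factor of 2.5, while the second band sits at 0 dB and is clamped to the spectral floor. Choosing the factor per band is what lets the subtraction follow colored noise rather than assuming a flat noise spectrum.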
Keywords/Search Tags: two-level robust environmental sounds recognition, Anti-noise Power Normalized Cepstral Coefficients (APNCC), non-stationary noise estimation, Multi-band Spectral Subtraction (MBSS), Mel-Frequency Cepstral Coefficients (MFCC)