Research On Acoustic Scene Clustering Based On Joint Learning Framework

Posted on: 2021-02-27
Degree: Master
Type: Thesis
Country: China
Candidate: Y H Zhang
GTID: 2428330611966423
Subject: Communication and Information System
Abstract/Summary:
Acoustic signals carry rich and varied environmental information, and because they can be captured without contact and at low cost, acoustic scene analysis has good application prospects in areas such as smart homes and human-computer interaction. This thesis takes complex audio as the object of analysis and studies an acoustic scene clustering (ASC) method based on a joint learning framework. The main work and contributions of this thesis are as follows:

(1) This thesis proposes an ASC method based on deep representation (DR). First, the Log Mel Spectrum (LMS) is extracted from the audio samples. The LMS is then fed to a Convolutional Autoencoder Network (CAN) to extract the deep representation. Next, the number of acoustic scene classes in the audio samples is estimated using a graph-based method. Finally, the Agglomerative Hierarchical Clustering (AHC) algorithm is used to merge the DRs of audio samples that belong to the same acoustic scene class. Experimental results show that, when evaluated on the DCASE-2017 and LITIS-Rouen databases, the proposed method obtains normalized mutual information (NMI) of 61.66% and 58.57%, and clustering accuracy (CA) of 52.83% and 50.25%, respectively. Both the NMI and CA scores are higher than those achieved by the other methods.

(2) In method (1), DR feature extraction and the clustering iteration are carried out separately rather than learned jointly. As a result, the learned DR features may not be well suited to the clustering iteration, and clustering performance still has room for improvement. To overcome this shortcoming, we propose an ASC method based on a joint learning framework composed of a CAN and a discriminative clustering network (DCN). First, we build a CAN and extract the DRs, which are used to initialize the cluster assignments via common clustering algorithms. Then, we build a DCN consisting of a fully connected layer with a softmax layer. We design a loss function that guides the iterative optimization of the joint learning framework (the CAN together with the DCN) so that reconstruction errors and clustering estimation errors are minimized simultaneously. The loss function consists of a reconstruction loss (for optimizing the CAN parameters) and a clustering loss (for optimizing the DCN parameters). Experimental results show that, when evaluated on the DCASE-2017 and LITIS-Rouen databases, the proposed method obtains NMI scores of 67.12% and 60.30%, and CA scores of 56.54% and 55.68%, respectively. The proposed method outperforms the other methods in terms of both NMI and CA.
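The joint objective described in contribution (2) can be illustrated with a small sketch. The abstract does not give the exact form of either loss term, so everything below is an assumption made for illustration: the reconstruction loss is taken to be mean squared error between an input and its autoencoder reconstruction, the clustering loss is taken to be cross-entropy between the softmax output and the current cluster assignment, and `gamma` is a hypothetical weight balancing the two.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reconstruction_loss(x, x_hat):
    """Mean squared error between input and reconstruction (assumed form)."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def clustering_loss(logits, assigned_cluster):
    """Cross-entropy of the softmax output against the current
    cluster assignment (assumed form, not taken from the thesis)."""
    probs = softmax(logits)
    return -math.log(probs[assigned_cluster])


def joint_loss(x, x_hat, logits, assigned_cluster, gamma=1.0):
    """Weighted sum of both terms; gamma is a hypothetical trade-off weight."""
    return (reconstruction_loss(x, x_hat)
            + gamma * clustering_loss(logits, assigned_cluster))


# Toy example: a 3-value "spectrum", its reconstruction, and 2 cluster logits.
loss = joint_loss([0.0, 1.0, 2.0], [0.0, 1.0, 1.0], [0.0, 0.0], 0, gamma=0.5)
print(round(loss, 4))
```

In a real training loop both terms would be computed per mini-batch and minimized together by gradient descent, which is what allows the representation and the cluster assignments to be learned jointly rather than in two separate stages.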
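The abstract reports NMI and CA without defining them. As a rough guide to what these scores measure, the sketch below uses common definitions, which are assumptions here and may differ from the thesis: NMI is the mutual information between true and predicted labels divided by the geometric mean of their entropies, and CA is the accuracy under the best one-to-one mapping from cluster indices to class labels. The mapping is found by brute force, which is only practical for a handful of clusters.

```python
import math
from collections import Counter
from itertools import permutations


def entropy(labels):
    """Shannon entropy (in nats) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def nmi(true_labels, pred_labels):
    """NMI = I(T;P) / sqrt(H(T) * H(P)); the normalization is an assumption."""
    n = len(true_labels)
    ct = Counter(true_labels)
    cp = Counter(pred_labels)
    joint = Counter(zip(true_labels, pred_labels))
    mi = sum((c / n) * math.log(c * n / (ct[t] * cp[p]))
             for (t, p), c in joint.items())
    denom = math.sqrt(entropy(true_labels) * entropy(pred_labels))
    # Degenerate case (a single class or a single cluster): return 0.0 here.
    return mi / denom if denom > 0 else 0.0


def clustering_accuracy(true_labels, pred_labels):
    """Accuracy under the best one-to-one cluster-to-class mapping."""
    clusters = sorted(set(pred_labels))
    classes = sorted(set(true_labels))
    # Pad with placeholders so every cluster can be mapped even when
    # there are more clusters than classes.
    classes += [None] * max(0, len(clusters) - len(classes))
    joint = Counter(zip(pred_labels, true_labels))
    best = 0
    for perm in permutations(classes, len(clusters)):
        hits = sum(joint[(k, c)] for k, c in zip(clusters, perm))
        best = max(best, hits)
    return best / len(true_labels)


true = [0, 0, 1, 1, 2, 2]
pred = [1, 1, 0, 0, 2, 0]  # cluster indices are arbitrary
print(nmi(true, pred), clustering_accuracy(true, pred))
```

For larger numbers of clusters, the brute-force search would normally be replaced by the Hungarian algorithm, for example `scipy.optimize.linear_sum_assignment`.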
Keywords/Search Tags: Acoustic scene clustering, Deep representation, Joint learning framework, Acoustic scene analysis