Research On Acoustic Scene Clustering Based On Joint Learning Framework

Posted on: 2021-02-27

Degree: Master

Type: Thesis

Country: China

Candidate: Y H Zhang

GTID: 2428330611966423

Subject: Communication and Information System

Abstract/Summary:

Acoustic signals carry rich and varied environmental information, and because they can be acquired without contact and at low cost, acoustic scene analysis has good application prospects in areas such as smart homes and human-computer interaction. In this thesis, we take complex audio as the object of analysis and discuss an acoustic scene clustering (ASC) method based on a joint learning framework. The main work and contributions of this thesis are as follows:

(1) This thesis proposes an ASC method based on deep representation (DR). First, the Log Mel Spectrum (LMS) is extracted from the audio samples. The LMS is then fed to a Convolutional Autoencoder Network (CAN) to extract the deep representation. Next, the number of acoustic scene classes in the audio samples is estimated using a graph-based method. Finally, the Agglomerative Hierarchical Clustering (AHC) algorithm is used to merge the DRs of audio samples that belong to the same acoustic scene class. Experimental results show that, when evaluated on the DCASE-2017 and LITIS-Rouen databases, the proposed method obtains normalized mutual information (NMI) of 61.66% and 58.57%, and clustering accuracy (CA) of 52.83% and 50.25%, respectively. Both the NMI and CA scores are higher than those achieved by the other methods compared.

(2) In method (1), DR extraction and clustering iteration are carried out separately rather than learned jointly. As a result, the learned DR features may not be well suited to clustering, and the clustering performance leaves room for improvement. To overcome this shortcoming, we propose an ASC method based on a joint learning framework composed of a CAN and a discriminative clustering network (DCN). First, we build a CAN and extract the DRs, which are used to initialize the cluster assignments via common clustering algorithms. Then, we build a DCN that consists of a fully connected layer and a softmax layer. We design a loss function that guides the iterative optimization of the joint learning framework formed by the CAN and the DCN, minimizing reconstruction errors and clustering estimation errors simultaneously. The proposed loss function consists of a reconstruction loss (for optimizing the CAN parameters) and a clustering loss (for optimizing the DCN parameters). Experimental results show that, when evaluated on the DCASE-2017 and LITIS-Rouen databases, the proposed method obtains NMI scores of 67.12% and 60.30%, and CA scores of 56.54% and 55.68%, respectively. The proposed method outperforms the other methods compared in terms of both NMI and CA.
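
The two evaluation metrics named in the abstract, NMI and CA, can be computed from ground-truth scene labels and predicted cluster indices. The following is a minimal sketch, not the evaluation code of the thesis. It assumes that NMI is normalised by the geometric mean of the two label entropies and that CA is taken under the best one-to-one mapping between clusters and classes, found here by brute force (practical only for a small number of clusters). The function names `nmi` and `clustering_accuracy` are hypothetical.

```python
from collections import Counter
from itertools import permutations
from math import log, sqrt


def entropy(labels):
    """Shannon entropy (in nats) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def nmi(true_labels, pred_labels):
    """Normalised mutual information, I(T; P) / sqrt(H(T) * H(P)).

    The geometric-mean normalisation is an assumption; other
    normalisations (arithmetic mean, max, min) are also in common use.
    """
    n = len(true_labels)
    h_t, h_p = entropy(true_labels), entropy(pred_labels)
    if h_t == 0 or h_p == 0:
        # Degenerate case: at least one labelling has a single group.
        return 1.0 if h_t == h_p else 0.0
    joint = Counter(zip(true_labels, pred_labels))
    t_counts = Counter(true_labels)
    p_counts = Counter(pred_labels)
    mi = sum(
        (c / n) * log(c * n / (t_counts[t] * p_counts[p]))
        for (t, p), c in joint.items()
    )
    return mi / sqrt(h_t * h_p)


def clustering_accuracy(true_labels, pred_labels):
    """Fraction of samples correctly assigned under the best
    one-to-one mapping between cluster indices and class labels."""
    pad = object()  # placeholder used when the group counts differ
    classes = list(dict.fromkeys(true_labels))
    clusters = list(dict.fromkeys(pred_labels))
    k = max(len(classes), len(clusters))
    classes += [pad] * (k - len(classes))
    clusters += [pad] * (k - len(clusters))
    joint = Counter(zip(true_labels, pred_labels))
    best = 0
    # Brute-force search over all class-to-cluster assignments.
    for perm in permutations(classes):
        hits = sum(joint.get((t, p), 0) for t, p in zip(perm, clusters))
        best = max(best, hits)
    return best / len(true_labels)


# Toy example with two scene classes and two predicted clusters.
truth = ["bus", "bus", "bus", "park", "park", "park"]
pred = [1, 1, 0, 0, 0, 0]
scores = (nmi(truth, pred), clustering_accuracy(truth, pred))
```

For larger numbers of clusters, the brute-force search would normally be replaced by an assignment solver such as the Hungarian algorithm; the definition of CA stays the same.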

Keywords/Search Tags:

Acoustic scene clustering, Deep representation, Joint learning framework, Acoustic scene analysis

Related items

1	Content Analysis For Natural Acoustic Scene Based On Deep Neural Network
2	Research On Acoustic Scene Detection Based On Deep Learning
3	Research On The Key Technology For Domestic Acoustic Scene Recognition
4	A Study On Acoustic Scene Classification By Ensembling Multiple Deep Models
5	Research On Acoustic Scene Classification Using Deep Learning
6	Research On Acoustic Scene Classification
7	Research On Temporal Relation-based Audio Semantic Representation Learning
8	Acoustic Scene Classification Based On Adversarial Domain Adaptation
9	Acoustic Scene Recognition Based On Sparse Representation And Deep Neural Networks
10	Acoustic Scene Classification Using Multi-Scale Deep Feature Aggregation