
Research On Mobile Phone Clustering Based On Deep Feature Of Speech

Posted on: 2019-05-13 | Degree: Master | Type: Thesis
Country: China | Candidate: X Zhang | Full Text: PDF
GTID: 2428330566986066 | Subject: Communication and Information System
Abstract/Summary:
With the popularity of portable recording devices (especially smartphones), the volume of recorded audio data has exploded. How to effectively identify the recording device is one of the hot topics in digital audio forensics. In this thesis, we investigate methods for mobile phone clustering based on deep features extracted from speech recordings. The main contributions are as follows.

(1) We propose a mobile phone clustering method based on the Deep Gaussian Supervector (DGS). First, Mel Frequency Cepstral Coefficients (MFCC) are extracted from each speech recording and fed into a Deep Neural Network (DNN) to extract the Bottleneck Feature (BF). Then, a Universal Background Model (UBM) is built from the BFs of all speech recordings, and one Gaussian Mixture Model (GMM) is adapted for each speech recording using the Maximum A Posteriori (MAP) algorithm. Next, the mean vectors of each GMM are concatenated into a Gaussian supervector, which serves as the deep feature of the speech recording, namely the DGS. Finally, a spectral clustering algorithm merges the speech recordings made by the same mobile phone into a single cluster. The MOBIPHONE corpus is used as experimental data for performance evaluation. Three metrics are used: the K value (the geometric mean of the average class purity and the average phone purity), Normalized Mutual Information (NMI), and Clustering Accuracy (CA). The structure settings of the DNN are examined experimentally, and the performance of different features is compared. The experimental results show that the K value, NMI, and CA obtained with the DGS are 93.81%, 95.11%, and 96.75%, respectively, which are higher than those of the other features. Hence, the proposed DGS feature is effective.

(2) In (1), it is assumed that the labels of the speech recordings are known in advance so that the DNN can be trained with supervision. In practice, this prior information is sometimes unavailable for mobile phone clustering. To overcome this shortcoming, we propose a mobile phone clustering method based on Deep Representation (DR). In this method, a Deep Autoencoder Network (DAN) rather than a DNN is adopted to extract the BF. The DAN can be built without any prior information about the mobile phones. Three corpora of speech recordings are then used as experimental data for setting the hidden-layer parameters of the DAN and for comparing the clustering performance of different features and different algorithms. Experimental results show that the DR is slightly worse than the DGS extracted in (1) in terms of clustering performance, but superior to the other features. Compared to the DGS, the DR has the advantage that it can be extracted without any prior information about the mobile phones. In addition, the proposed method is superior to the unsupervised method based on agglomerative hierarchical clustering, but slightly inferior to the supervised method based on the Support Vector Machine (SVM). Finally, we discuss the performance of the proposed method when the speech recordings are asymmetric, acquired by mobile phones of the same brands and models, or uttered by the same speaker. The experimental results show that the proposed method still performs well under these conditions.

In summary, we take speech recordings acquired by mobile phones as the object of analysis and use deep learning techniques to extract deep features that characterize the intrinsic properties of the phones. We then propose methods for mobile phone clustering based on these deep features of speech, and experimentally analyze their performance from multiple aspects. Finally, we compare the proposed methods with other methods to verify their effectiveness.
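The supervector step in (1) can be illustrated with a minimal sketch. It assumes a diagonal-covariance UBM, mean-only MAP adaptation, and a relevance factor of 16; none of these choices is stated in the abstract, and the function name `map_adapt_supervector` is hypothetical. The input `frames` stands in for the bottleneck features of one recording.

```python
import numpy as np

def map_adapt_supervector(frames, weights, means, variances, relevance=16.0):
    """Mean-only MAP adaptation of a diagonal-covariance UBM (an assumed variant).

    frames:    (T, D) feature vectors of one recording (e.g. bottleneck features)
    weights:   (C,)   UBM mixture weights
    means:     (C, D) UBM component means
    variances: (C, D) UBM diagonal variances
    Returns the adapted means concatenated into one vector of length C * D.
    """
    # Log-density of every frame under every diagonal Gaussian component.
    diff = frames[:, None, :] - means[None, :, :]                 # (T, C, D)
    log_gauss = -0.5 * (
        np.sum(diff ** 2 / variances, axis=2)
        + np.sum(np.log(2.0 * np.pi * variances), axis=1)
    )                                                             # (T, C)

    # Component posteriors per frame, normalised in a numerically stable way.
    log_post = np.log(weights) + log_gauss
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)                       # (T, C)

    # Zeroth- and first-order sufficient statistics.
    n = post.sum(axis=0)                                          # (C,)
    first = post.T @ frames                                       # (C, D)
    data_mean = first / np.maximum(n, 1e-10)[:, None]

    # Interpolate between the data mean and the UBM mean per component.
    alpha = (n / (n + relevance))[:, None]
    adapted = alpha * data_mean + (1.0 - alpha) * means
    return adapted.reshape(-1)
```

One supervector per recording could then be stacked into a matrix and passed to a spectral clustering implementation (for example scikit-learn's `SpectralClustering`); the affinity and the number of clusters would have to be chosen for the data at hand.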
Keywords/Search Tags:Mobile phone clustering, Deep Gaussian supervector, Deep representation, Spectral clustering, Digital speech forensics
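The K value used as a metric in the abstract is described only as the geometric mean of the average class purity and the average phone purity. The sketch below assumes the purity definitions common in the speaker clustering literature (sums of squared co-occurrence counts, normalised by cluster size or by phone size); the thesis may define them differently, and the function name `k_value` is hypothetical.

```python
from collections import Counter
from math import sqrt

def k_value(phone_labels, cluster_labels):
    """Geometric mean of average cluster purity and average phone purity.

    Assumed definitions, with n_ij = number of recordings of phone j in cluster i:
      average cluster purity = (1/N) * sum_i sum_j n_ij^2 / (size of cluster i)
      average phone purity   = (1/N) * sum_j sum_i n_ij^2 / (size of phone j)
    """
    if len(phone_labels) != len(cluster_labels):
        raise ValueError("label sequences must have the same length")
    n_total = len(phone_labels)
    if n_total == 0:
        raise ValueError("at least one recording is required")

    joint = Counter(zip(cluster_labels, phone_labels))   # n_ij
    cluster_sizes = Counter(cluster_labels)
    phone_sizes = Counter(phone_labels)

    acp = sum(n * n / cluster_sizes[c] for (c, _), n in joint.items()) / n_total
    app = sum(n * n / phone_sizes[p] for (_, p), n in joint.items()) / n_total
    return sqrt(acp * app)
```

Under these definitions a perfect clustering gives a K value of 1, and both splitting one phone across clusters and merging several phones into one cluster lower the score.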