Research On Digital Speech Forensics Based On Source Device Information

Posted on: 2017-02-24
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L Zou
GTID: 1318330536952906
Subject: Information and Communication Engineering
Abstract/Summary:
As an important component of digital multimedia forensics, digital audio blind forensics has gained more attention recently. In practice, digital audio forensics usually takes the form of digital speech forensics. Within the digital multimedia forensics community, digital image forensics that utilizes source device information has received considerable attention, whereas digital audio forensics based on source device information is still at an early stage and deserves more attention. This thesis focuses on digital speech blind forensics based on source device information. The work and contributions are as follows.

1) A new cell phone corpus comprising 15 cell phone units was collected and named SCUTPHONE. In addition, a corpus covering four types of recording devices and 22 device units was constructed. Both corpora are gender-balanced. There are 240 sentences for each device unit, and each sentence is approximately 3 seconds long. For the emerging cell phone identification problem in digital speech forensics, we propose a source cell phone identification scheme based on the Gaussian mixture model-universal background model (GMM-UBM). Mel-frequency cepstral coefficients (MFCCs) are used to characterize the recording device unit. On a cell phone corpus, the scheme achieves a better identification rate than the schemes based on VQ and linear SVM. In addition, we compare MFCCs with the more recently developed power-normalized cepstral coefficients (PNCCs), which incorporate noise reduction, and investigate how the various processing steps of PNCCs influence performance.

2) Studies on source recording device recognition have mainly focused on the identification problem; few have addressed the source recording device verification problem. Motivated by the powerful classification ability of Sparse Representation-based Classification (SRC), we propose a sparse representation based recording device verification framework. We first propose schemes based on an exemplar dictionary and on an unsupervised learned dictionary (here K-SVD). Then, exploiting the fact that discriminative dictionary learning considers representational and discriminative power simultaneously, we propose a new source device verification scheme based on discriminative K-SVD (D-KSVD). The whole process is divided into two stages. In the first stage, for each device, we train a discriminative dictionary to obtain the parameters (a dictionary and a linear classifier), which serve as the model for that device. In the second stage, given a speech recording and the claimed device, source device verification is conducted with respect to the model of the claimed device. Scoring metrics based on the classification output of the linear classifier are proposed. Evaluation experiments show that the discriminative dictionary based scheme outperforms the other two sparse representation based schemes and two baseline systems.

3) In digital speech forensics based on source device information, the recording devices are sometimes unavailable and only speech recording samples are available. Motivated by this fact, we define a new problem of practical significance in digital speech forensics, namely the source recording device matching problem. We also propose a new source recording device matching scheme based on sparse representation and the KISS metric. Given two speech recordings, we first extract Gaussian supervectors (GSVs) from them, and then use the discriminative power of sparse representation to extract further features from the GSVs. Similarity matching is then conducted based on the Regularized Smoothing KISS metric (RS-KISS metric), and the matching score is compared to a preset threshold to make the final decision. Evaluation experiments show the effectiveness of the proposed scheme.

4) We analyze the signal processing procedures that a typical recording device applies to produce a speech recording and find that the microphone, transmission circuit and A/D converter leave intrinsic noise traces in the recording, which we term recording device noise. A new device fingerprint utilizing recording device noise is proposed. More specifically, for each recording device under investigation, the recording device noise is obtained by averaging the noise spectra estimated from multiple speech recordings with a noise estimation algorithm. To estimate the device noise more fully, two noise estimation algorithms are used and the estimated noise spectra are combined. Evaluation experiments on two corpora demonstrate that the proposed feature outperforms several other features in the literature.

5) A method for recording source forensics was recently presented in the literature that first conducts blind channel estimation from the speech recording and then characterizes the recording device using the estimated channel information. We propose two improved blind channel estimation methods. First, motivated by the observation that logarithmic Mel-spectral coefficients (LMSCs) and MFCCs show a degree of complementarity in the channel recognition problem, we propose a blind channel estimation method based on joint spectrum clustering: two clean speech spectra are estimated from the two types of features respectively, and the final estimate of the clean speech spectrum is a trade-off between the two. Second, because the speech signal spectrum is more stochastic at high frequencies than at low frequencies, Multi-window Spectral Estimation (MWSE) can be used to estimate the spectrum at high frequencies, thereby reducing the spectral estimation variance. We therefore propose a two-band blind channel estimation method: the recorded speech is divided into two parts at a cutoff frequency; the low-frequency part is processed using the original blind channel estimation method, i.e., based on FFT and RASTA-MFCCs, whereas the high-frequency part is processed based on MWSE and MFCCs. The two estimated spectra corresponding to the low-frequency and high-frequency parts are then concatenated to create the final estimated channel magnitude spectrum. Experiments show that the two improved methods further enhance the channel estimation accuracy. Once the channel magnitude spectrum is estimated, a channel feature is constructed from the estimated channel magnitude spectrum and the original speech magnitude spectrum, and this channel feature is applied to recording source forensics. Device identification experiments on 37 recording device units show the effectiveness of the channel feature based on the new channel estimation method.
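The recording device noise fingerprint of contribution 4) can be illustrated with a minimal sketch. It assumes that per-recording noise spectra have already been produced by two noise estimators (not implemented here), and it assumes that "combined" means concatenation, which the abstract does not specify. The function names `average_spectrum` and `device_noise_fingerprint` and the toy numbers are hypothetical and are not taken from the thesis.

```python
from statistics import fmean


def average_spectrum(spectra):
    """Average equal-length noise spectra bin by bin across recordings."""
    if not spectra:
        raise ValueError("need at least one noise spectrum")
    n_bins = len(spectra[0])
    if any(len(s) != n_bins for s in spectra):
        raise ValueError("all spectra must have the same number of bins")
    return [fmean(s[k] for s in spectra) for k in range(n_bins)]


def device_noise_fingerprint(spectra_est_a, spectra_est_b):
    """Build a fingerprint from the averaged outputs of two noise estimators.

    Assumption: the two averaged spectra are concatenated; the abstract
    only says they are "combined".
    """
    return average_spectrum(spectra_est_a) + average_spectrum(spectra_est_b)


# Toy input: two recordings from one device, three frequency bins each,
# as estimated by two hypothetical noise estimators A and B.
est_a = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]
est_b = [[0.5, 0.5, 0.5], [1.5, 1.5, 1.5]]
fingerprint = device_noise_fingerprint(est_a, est_b)
print(fingerprint)
```

Averaging over many recordings is meant to suppress speech-dependent variation so that the device-dependent noise floor remains; a classifier would then be trained on such fingerprints, one per device unit.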
Keywords/Search Tags:Digital speech blind forensics, Source recording device forensics, Sparse representation, Discriminative dictionary learning, Recording device noise, Blind channel estimation