Research On Reference Speech Construction For Speech Quality Objective Evaluation Under Complex Environments

Posted on: 2017-04-15
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W L Zhou
GTID: 1318330536952931
Subject: Information and Communication Engineering
Abstract/Summary:
Objective evaluation of speech quality is an important part of communications quality of service (QoS), and it is drawing increasing attention because subjective speech quality tests are expensive, time-consuming and inflexible. Noise in complex environments affects listeners' auditory perception during speech communication and is therefore an important factor in how people judge speech quality. Consequently, effective objective speech quality evaluation algorithms for complex environments have become a research focus.

At present, the International Telecommunication Union (ITU) PESQ is the state-of-the-art intrusive objective method for estimating speech quality in complex environments. PESQ and other intrusive methods obtain the Mean Opinion Score (MOS) by measuring the distortion between the clean speech and the noisy speech. Their significant shortcoming is that they need the clean reference speech. In many real applications the clean reference signal is unavailable, so intrusive evaluation cannot be widely used. On the other hand, the state-of-the-art non-intrusive method P.563 and other non-intrusive schemes assess speech quality by exploring features that describe the subjective perceptual differences between clean and noisy speech. Although the reference speech is not needed, assumptions have to be made about the reference signal, so non-intrusive assessments are expected to underperform intrusive methods.

Motivated by this, a new non-intrusive speech quality assessment model for complex environments is proposed, based on quasi-clean speech construction and intrusive perceptual assessment. The model aims to predict MOS values that correlate more highly with subjective results. This study focuses on construction of the reference speech, also called quasi-clean speech. The main work of this thesis is several proposed quasi-clean speech construction methods based on noise tracking and estimation, sparse reconstruction of the speech signal, and source separation of speech and noise signals. The main contributions of this thesis are as follows:

(1) A new non-intrusive speech quality assessment model for complex environments is proposed, based on quasi-clean speech construction and intrusive perceptual assessment. The model has two steps. First, quasi-clean speech is constructed, with the aim of recovering the clean reference speech. Second, the quasi-clean speech is used as the reference for a perceptual model, which is a modified version of PESQ. The perceptual model obtains the MOS by measuring the distortion between the noisy degraded speech and the quasi-clean speech.

(2) An improved Minima Controlled Recursive Averaging (MCRA) method is proposed to obtain the quasi-clean speech. To solve the noise tracking delay problem of traditional MCRA, voice activity detection (VAD) is employed to distinguish speech from non-speech frames during noise spectrum estimation, and the local minimum is updated continuously. In addition, a non-speech prior and a frequency-dependent threshold are proposed for assessing the speech presence probability, in order to improve the accuracy of the noise estimate. Experimental results on TIMIT and NOISEX-92 show that the proposed method achieves a 0.08~0.18 reduction in log-likelihood ratio (LLR) and a 1.44 dB~2.46 dB improvement in segmental SNR compared with the traditional MCRA and MS methods. In speech quality evaluation, the proposed approach obtains a correlation coefficient of 0.739 (condition unaveraged) and 0.857 (condition averaged) on the NOIZEUS and ITU-T P.Supplement-23 complex environment databases, which is 87.8% and 95.1% of the performance of the intrusive standard ITU-T PESQ, and it outperforms the non-intrusive standard ITU-T P.563 by 5.4%~9.8%.

(3) The quasi-clean speech constructed in (2) suffers from cross-correlation error and magnitude estimation error. A new sparse-representation-based speech reconstruction algorithm (ASRDN) is presented to obtain the quasi-clean speech from the noisy degraded signal. First, an overcomplete dictionary of the clean speech power spectrum is learned with the K-singular value decomposition (K-SVD) algorithm. Then, in the sparse representation stage, the stopping residual error is set adaptively according to the estimated cross-correlation and the noise spectrum, which is adjusted by an a posteriori SNR weighting factor, and the orthogonal matching pursuit (OMP) approach is applied to reconstruct the clean speech spectrum from the noisy speech. The experimental results show that the proposed method achieves a 0.03~0.16 reduction in LLR and a 1.26 dB~3.19 dB improvement in segmental SNR compared with the improved MCRA and a similar comparative method. In speech quality evaluation, the proposed approach obtains a correlation coefficient of 0.768 (condition unaveraged) and 0.874 (condition averaged), which is 91.3% and 96.8% of the performance of PESQ, and it outperforms P.563, the improved MCRA and the similar comparative method by 3.9%~14.8%.

(4) The quasi-clean speech constructed in (3) suffers from high computational complexity and source confusion. A new Bayesian NMF quasi-clean speech construction method is proposed. First, the limitations of standard NMF for source separation of speech and noise signals are analyzed, and variational Bayesian NMF is applied to quasi-clean speech construction. Meanwhile, because the noise type in a real environment is unpredictable, a noise basis matrix trained offline does not necessarily correspond to the actual noise type, so an online noise basis matrix training method based on the noisy speech data is proposed. A universal Bayesian NMF noise model is trained offline beforehand; during source separation, the noise basis matrix is then trained on the online noisy speech data using variational Bayesian NMF. The experimental results show that the proposed method achieves a 0.11~0.19 reduction in LLR and a 1.46 dB~4.68 dB improvement in segmental SNR compared with ASRDN and a similar comparative method. In speech quality evaluation, the proposed approach obtains a correlation coefficient of 0.802 (condition unaveraged) and 0.892 (condition averaged), which is 95.3% and 98.9% of the performance of PESQ, and it outperforms P.563, ASRDN and the similar comparative method by 4.4%~19.1%.
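The two-step idea in contribution (1) can be illustrated with a minimal sketch. It is not the thesis's improved MCRA or its modified PESQ model. Step 1 here is a simplified minimum-tracking noise estimate followed by spectral subtraction, and step 2 uses a plain log-spectral distance as a stand-in for the perceptual model. All function names, parameter values and the synthetic test signal are assumptions made for illustration only.

```python
import numpy as np

def stft_frames(x, frame_len=256, hop=128):
    """Split x into Hann-windowed frames and return their FFT (frames x bins)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def track_noise_psd(power, alpha=0.9, search_len=40):
    """Simplified minimum tracking (assumed stand-in for MCRA):
    recursively smooth the power spectrum, then take the minimum
    over the last `search_len` frames in each frequency bin."""
    smoothed = np.empty_like(power)
    smoothed[0] = power[0]
    for t in range(1, len(power)):
        smoothed[t] = alpha * smoothed[t - 1] + (1.0 - alpha) * power[t]
    noise = np.empty_like(power)
    for t in range(len(power)):
        start = max(0, t - search_len + 1)
        noise[t] = smoothed[start:t + 1].min(axis=0)
    return noise

def quasi_clean_spectrum(noisy_spec, floor=0.05):
    """Step 1: build a quasi-clean reference by power spectral subtraction."""
    power = np.abs(noisy_spec) ** 2
    noise = track_noise_psd(power)
    # Gain lies in [floor, 1]; the floor limits musical-noise artifacts.
    gain_sq = np.maximum(1.0 - noise / np.maximum(power, 1e-12), floor ** 2)
    return np.sqrt(gain_sq) * noisy_spec

def log_spectral_distance(ref_spec, deg_spec, eps=1e-12):
    """Step 2 stand-in: mean per-frame RMS difference of log power spectra (dB).
    The thesis uses a modified PESQ model here instead."""
    ref_db = 10.0 * np.log10(np.abs(ref_spec) ** 2 + eps)
    deg_db = 10.0 * np.log10(np.abs(deg_spec) ** 2 + eps)
    return float(np.mean(np.sqrt(np.mean((ref_db - deg_db) ** 2, axis=1))))

# Synthetic demo: a 440 Hz tone in white noise, sampled at 8 kHz.
rng = np.random.default_rng(0)
t = np.arange(8000) / 8000.0
noisy = np.sin(2 * np.pi * 440.0 * t) + 0.3 * rng.standard_normal(t.size)

noisy_spec = stft_frames(noisy)
quasi_spec = quasi_clean_spectrum(noisy_spec)
print("distance between degraded and quasi-clean speech (dB):",
      log_spectral_distance(quasi_spec, noisy_spec))
```

In a full system the distance would be mapped to a MOS estimate and validated by its correlation with subjective scores, as the abstract reports for each construction method.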
Keywords/Search Tags: speech quality evaluation, non-intrusive, complex environments, reference speech construction, noise tracking, sparse representation, non-negative matrix factorization