
Study On The Applications Of Neural Networks In Objective Assessment Of Speech Quality

Posted on: 2008-10-02
Degree: Doctor
Type: Dissertation
Country: China
Candidate: T Y Yan
Full Text: PDF
GTID: 1118360242471006
Subject: Traffic Information Engineering & Control
Abstract/Summary:
Subjective measures of speech quality have many shortcomings: they are time-consuming, laborious, costly, unwieldy, unrepeatable, unstable, and listener-dependent. Convenient objective measures have been established to overcome these shortcomings. Because the regression step in traditional objective measures relies on simple mathematical formulas that can hardly capture perceptual properties, this work proposes neural network objective models for speech quality assessment that approximate the subjective auditory perception process. For a large database of Chinese vocabulary, neural network objective models are used to assess both input-to-output based and output-based speech quality. For a closed data set of limited Chinese words, speech intelligibility is assessed by transition probability measures. The structures and algorithms of the neural networks used for speech quality measurement are also studied.

Traditional measures for speech quality assessment (TM/SQA) depend heavily on feature parameters and distortion measures, are sensitive to increasing channel disturbance, and can hardly capture perceptual properties with the simple mathematical formulas of their regression judgment models. A radial basis function neural network speech quality assessment model (RBFNN/SQAM) that partly imitates speech perception is presented; it is a nonlinear transformation that effectively maps the feature parameter space onto auditory perception characteristics. RBFNN/SQAM has several merits: it depends less on feature parameters, it still gives good results as channel disturbance increases, and so on. The results of input-to-output based speech quality assessment show that RBFNN/SQAM performs much better than TM/SQA, and the objective mean opinion scores (MOS) of RBFNN/SQAM are highly correlated with the subjective MOS.
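The mapping from feature parameters to MOS described above can be illustrated with a small radial basis function network. This is only a minimal sketch under stated assumptions, not the dissertation's model: the Gaussian basis, the choice of every training vector as a center, the ridge term, the `width` value, and the toy feature vectors and scores are all hypothetical and chosen for illustration.

```python
import math

def gaussian_rbf(x, center, width):
    """Gaussian radial basis function of the distance between x and a center."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, center))
    return math.exp(-sq_dist / (2.0 * width ** 2))

def solve_linear(a, b):
    """Solve a * w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (m[r][n] - s) / m[r][r]
    return w

def fit_rbf(features, mos, width=0.8, ridge=1e-6):
    """Assumption: each training vector is a center; only output weights are fitted."""
    n = len(features)
    phi = [[gaussian_rbf(features[i], features[j], width) for j in range(n)]
           for i in range(n)]
    for i in range(n):
        phi[i][i] += ridge  # small ridge term keeps the system well conditioned
    weights = solve_linear(phi, mos)
    return features, weights, width

def predict_mos(model, x):
    """Objective MOS: weighted sum of the hidden-unit activations."""
    centers, weights, width = model
    return sum(w * gaussian_rbf(x, c, width) for w, c in zip(weights, centers))

# Hypothetical distortion features and subjective MOS, for illustration only.
toy_features = [[0.1, 0.2], [0.5, 0.4], [1.0, 0.9], [1.6, 1.5], [2.2, 2.0]]
toy_mos = [4.5, 3.9, 3.1, 2.2, 1.4]
toy_model = fit_rbf(toy_features, toy_mos)
```

Calling `predict_mos(toy_model, x)` on a new feature vector `x` returns an objective score; in practice the centers and widths would be selected during training rather than fixed as they are here.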
Because building an RBFNN/SQAM requires a long training time, a generalized congruence neural network speech quality assessment model (GCNN/SQAM) is proposed, which needs less training time and performs better. Besides sharing all the merits of RBFNN/SQAM, GCNN/SQAM has a simpler structure, better generalization, higher correlation, smaller standard deviation, and smaller absolute error, and it saves 1/3 of the training time. Overall, GCNN/SQAM has a great advantage over RBFNN/SQAM.

A novel recurrent generalized congruence neural network (RGCNN) with a simplified structure and a reduced algorithm is described, and an RGCNN speech quality assessment model (RGCNN/SQAM) is built on it. First, RGCNN is introduced in detail in terms of its network structure and weight-updating algorithm, and its characteristics and advantages are summarized by comparison with other recurrent neural networks (RNN). Second, identification simulations validate its effectiveness and fast convergence. To capture the dynamic, time-varying behavior of voice signals, RGCNN, which has dynamic properties, is employed as the objective speech quality assessment model. Finally, the input-to-output based objective speech quality predictions produced by RGCNN/SQAM for continuous sentence speech and digit-string speech correlate highly with the subjective MOS.

A new, robust output-based quality assessment using neural networks (NN/OBQA) is given. Input-to-output speech quality measures have two main problems: time synchronization between the input and output speech vectors is critical, and in many applications the original speech is not available. NN/OBQA is presented to address these problems.
In NN/OBQA, only the feature parameters of the degraded (output) speech signal are extracted, at the output end of the voice transmission system. The neural network then maps these feature parameters nonlinearly to subjective MOS, so the network output is an objective MOS prediction based only on the degraded speech signal. The proposed output-based measure, NN/OBQA, correlates very well with the actual subjective MOS.

Most speech quality measures assess MOS, and only a few assess intelligibility. Therefore, for the closed data set of limited Chinese words, intelligibility speech quality assessment is carried out by transition probability measures (TPM/ISQA). First, the principle of TPM/ISQA is described. Then, based on this principle, two methods of intelligibility assessment are presented: one based on a Euclidean distance transition probability measure (EDTPM/ISQA), and the other based on a linear correlation transition probability measure (LCTPM/ISQA). Next, the reference matrix is designed. The reference matrix is usually built from the source (clean) speech signals, giving a clean reference matrix (CRM); a novel alternative builds it from the degraded (noisy) speech signals, giving a noise reference matrix (NRM). Both proposed measures, EDTPM/ISQA and LCTPM/ISQA, assess intelligibility successfully with either CRM or NRM as the reference matrix. Compared with CRM, NRM improves the correlation coefficient between the objective intelligibility given by the two proposed measures and the subjective intelligibility.
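The abstract does not spell out the TPM/ISQA principle, so the following is only one plausible reading, offered as a hedged sketch. It assumes that each word in the closed set is represented by a feature vector, that similarity to every reference word is normalized into a row of "transition probabilities", and that the objective intelligibility is the mean probability of a word mapping to itself. The similarity formulas, the function names, and the toy vectors are all assumptions and are not taken from the dissertation.

```python
import math

def euclidean_similarity(u, v):
    """Assumed similarity for an EDTPM-style measure: decreases with distance."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
    return 1.0 / (1.0 + dist)

def correlation_similarity(u, v):
    """Assumed similarity for an LCTPM-style measure: Pearson r, clipped at 0."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    norm_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    norm_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return max(0.0, cov / (norm_u * norm_v))

def transition_matrix(test_words, reference_words, similarity):
    """Row i holds the assumed probabilities that test word i maps to each reference word."""
    matrix = []
    for test_vec in test_words:
        scores = [similarity(test_vec, ref_vec) for ref_vec in reference_words]
        total = sum(scores)
        if total == 0.0:
            # No similarity to any reference word: fall back to a uniform row.
            matrix.append([1.0 / len(scores)] * len(scores))
        else:
            matrix.append([s / total for s in scores])
    return matrix

def objective_intelligibility(matrix):
    """Mean diagonal entry: average probability that a word maps to itself."""
    return sum(matrix[i][i] for i in range(len(matrix))) / len(matrix)

# Hypothetical feature vectors for a three-word closed set.
reference = [[1.0, 0.0, 0.0, 0.5], [0.0, 1.0, 0.0, 0.5], [0.0, 0.0, 1.0, 0.5]]
mild = [[0.9, 0.1, 0.0, 0.5], [0.1, 0.9, 0.1, 0.5], [0.0, 0.1, 0.9, 0.5]]
heavy = [[0.5, 0.4, 0.3, 0.5], [0.4, 0.5, 0.4, 0.5], [0.3, 0.4, 0.5, 0.5]]

for name, sim in [("Euclidean", euclidean_similarity),
                  ("correlation", correlation_similarity)]:
    mild_score = objective_intelligibility(transition_matrix(mild, reference, sim))
    heavy_score = objective_intelligibility(transition_matrix(heavy, reference, sim))
    print(name, mild_score, heavy_score)
```

In this sketch the `reference` argument plays the role of the reference matrix: passing vectors from clean speech would correspond to a CRM, and passing vectors from degraded speech to an NRM.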
Keywords/Search Tags: Objective Assessment of Speech Quality, Generalized Congruence Neural Network, RBF, MOS, Intelligibility