
Study On Objective Speech Quality Assessment For Speech Communication

Posted on: 2008-09-11
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H W Chen
Full Text: PDF
GTID: 1118360215959149
Subject: Traffic Information Engineering & Control
Abstract/Summary:
Given the wide range of voice services provided by modern communication networks, speech communication has become one of the most prominent features of modern life. The rapid development of technologies and services has increased the need to evaluate and optimize the transmission characteristics of communication systems. Speech quality is one of the most important indices for evaluating the performance of speech communication systems and devices. Subjective evaluation, which measures speech quality with a listener panel, is the most reliable method of speech quality assessment, but it is expensive and time-consuming and therefore unsuitable for field applications. Objective measurement methods, which replace the listener panel with a computer, have become the focus of recent quality measurement research. Supported by military research projects, this dissertation investigates how to assess speech quality accurately from several aspects.

First, the concepts of speech quality assessment are introduced. Then the principles of objective speech quality measures and the state of the art of the research are surveyed. Finally, open problems in objective speech quality measurement are pointed out and the major contributions of this dissertation are summarized.

Chapter 2 proposes a new speech feature, the Mel Frequency Spectral Coefficient (MFSC), which carries more perceptual attributes than the Mel Frequency Cepstral Coefficient (MFCC). A Mel Spectral Distortion Measure (Mel-SD) based on MFSC features is then presented to assess speech quality. Experiments on predicting speech quality for jammed wireless communication systems show that Mel-SD is more accurate and more robust than PESQ and Mel-CD.
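The abstract does not give the Mel-SD formula. As a minimal sketch only, assuming Mel-SD averages a per-frame distance between time-aligned MFSC vectors of the reference and degraded signals, the idea could look like the following. The function names, the root-mean-square distance, and the plain average over frames are illustrative assumptions, not the dissertation's definition.

```python
import math

def frame_distance(ref, deg):
    """Root-mean-square difference between two feature vectors of equal length.

    Assumption: each vector holds the mel-band features (MFSC) of one frame.
    """
    if len(ref) != len(deg):
        raise ValueError("feature vectors must have the same dimension")
    return math.sqrt(sum((r - d) ** 2 for r, d in zip(ref, deg)) / len(ref))

def mel_spectral_distortion(ref_frames, deg_frames):
    """Mean per-frame distance over time-aligned frames (hypothetical Mel-SD)."""
    if not ref_frames or len(ref_frames) != len(deg_frames):
        raise ValueError("need the same non-zero number of frames in both signals")
    distances = [frame_distance(r, d) for r, d in zip(ref_frames, deg_frames)]
    return sum(distances) / len(distances)

# Usage with made-up feature values: two frames, three mel bands each.
reference = [[1.0, 2.0, 3.0], [2.0, 2.5, 1.0]]
degraded = [[1.1, 1.8, 3.2], [2.4, 2.0, 0.5]]
print(mel_spectral_distortion(reference, degraded))
```

A larger value means the degraded signal is further from the reference in the mel-band domain. A distortion value of this kind would still have to be mapped to a MOS estimate in a separate step.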
Further experiments show that Mel-SD is also robust to changes in the Mel filter bank and the compression factor.

Evaluating the relative importance of the individual dimensions of a speech feature vector is a hard task in speech signal processing. To address this problem and improve the performance of existing objective measures, Chapter 3 proposes a method based on Particle Swarm Optimization (PSO), which casts the relative importance of the dimensions as optimization problems solved by the real-valued and discrete binary versions of PSO. Experimental results show that this method successfully optimizes the dimension weights of the feature coefficients and finds the best subset of the whole feature space. The method yields the relative importance of each parameter dimension and improves the performance of Mel-CD and Mel-SD.

Common methods estimate MOS in two steps: 1) calculating the average distortion; 2) mapping the average distortion value to a MOS estimate by nonlinear regression analysis. Combining the two steps into one block using neural networks can adequately embody the perceptual properties of the human auditory system. To overcome the shortcomings of conventional feedforward neural networks, a new training algorithm called Bi-Phases Weights' Adjusting (BPWA) is proposed. BPWA adjusts the weights during both the forward and backward phases, and in the forward pass it always computes the minimum norm square solution as the weight vector between the hidden layer and the output layer. The new training algorithm converges faster while generalizing well. To improve the Generalized Congruent Neural Network (GCNN), a new generalized congruence function is defined, an adjustable parameter is added to the generalized congruence neurons, and a simplified GCNN structure is presented.
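The forward-pass step described for BPWA can be illustrated in isolation. The sketch below assumes that the "minimum norm square solution" refers to the minimum-norm least-squares solution of H·beta = t, where H holds the hidden-layer outputs and t the target MOS values. It further assumes H has full column rank, in which case the least-squares solution is unique and can be obtained from the normal equations. This is not the BPWA algorithm itself (the backward phase is omitted), and all names are hypothetical.

```python
import math

def hidden_outputs(X, W, b):
    """Sigmoid activations: X is N x d inputs, W is L x d weights, b is L biases."""
    return [[1.0 / (1.0 + math.exp(-(sum(wj * xj for wj, xj in zip(w, x)) + bi)))
             for w, bi in zip(W, b)]
            for x in X]

def solve_linear(A, y):
    """Solve A z = y by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [yi] for row, yi in zip(A, y)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system: full-column-rank assumption violated")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        z[i] = (M[i][n] - sum(M[i][j] * z[j] for j in range(i + 1, n))) / M[i][i]
    return z

def output_weights(H, t):
    """Least-squares output weights from the normal equations (H^T H) beta = H^T t."""
    L = len(H[0])
    HtH = [[sum(row[i] * row[j] for row in H) for j in range(L)] for i in range(L)]
    Htt = [sum(row[i] * ti for row, ti in zip(H, t)) for i in range(L)]
    return solve_linear(HtH, Htt)

# Usage with made-up numbers: three training samples, two hidden neurons.
H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
t = [1.0, 2.0, 3.0]
print(output_weights(H, t))
```

When H is rank-deficient, a pseudoinverse (for example via singular value decomposition) would be needed to pick the minimum-norm solution among the many least-squares solutions. The normal-equations shortcut is used here only to keep the sketch short.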
In Chapter 4, a single-hidden-layer neural network and the improved GCNN, both trained with BPWA, are used to model perception and estimate MOS values. Experimental results indicate that these methods are effective for objective speech quality measurement compared with conventional neural networks and training algorithms.

Temporal information plays a key role in the ability of the human auditory system to separate and understand sounds. Despite its importance, few researchers have considered and exploited this information. Taking into account the nonlinear characteristics of the auditory system and the representation of temporal information in speech, a novel output-based objective speech quality measure is proposed. First, the speech signal is transformed into cochleagrams by Lyon's passive long-wave cochlear model; second, the periodicities of the cochlear output are summarized with correlograms; finally, four statistical indices are extracted from the correlograms. The experimental results show that the three measures work well.

Assessing speech intelligibility by machine is a new research task. An Objective Intelligibility Measure based on an RBF Neural Network (OIM-RBFNN) and an Objective Intelligibility Measure based on Transition Probability Distance (OIM-TPD) are presented to predict speech intelligibility for a finite speech set. Two kinds of speech feature parameters are used in these two measures. Experimental results show that OIM-RBFNN and OIM-TPD, with mapping by neural networks, perform well when using MFCC features.
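The correlogram step of the output-based measure described in the abstract can be sketched as a short-time autocorrelation computed separately for each cochlear channel. The code below is an illustration under that assumption only: it does not implement Lyon's cochlear model (the channel signals are taken as given), and the function names and the choice of an unnormalized autocorrelation are hypothetical.

```python
import math

def autocorrelation(frame, max_lag):
    """Unnormalized short-time autocorrelation of one frame for lags 0..max_lag."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]

def correlogram_frame(channel_frames, max_lag):
    """One correlogram frame: a row of autocorrelation values per cochlear channel."""
    return [autocorrelation(frame, max_lag) for frame in channel_frames]

def dominant_lag(acf, min_lag=1):
    """Lag (>= min_lag) with the largest autocorrelation value."""
    return max(range(min_lag, len(acf)), key=lambda lag: acf[lag])

# Usage with a synthetic channel output: a sinusoid with a period of 8 samples.
channel = [math.sin(2.0 * math.pi * i / 8.0) for i in range(64)]
rows = correlogram_frame([channel], max_lag=12)
print(dominant_lag(rows[0], min_lag=2))
```

Peaks at non-zero lags mark periodicities in a channel, which is the kind of temporal structure a correlogram summarizes. Statistical indices could then be computed over the rows, although the abstract does not say which four indices the dissertation uses.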
Keywords/Search Tags: objective speech quality measure, neural network, Mel frequency spectral coefficient, intelligibility