
Research On Robust Feature Extracting And Visualization Of Speech Signal

Posted on: 2010-06-01    Degree: Doctor    Type: Dissertation
Country: China    Candidate: Z Y Han    Full Text: PDF
GTID: 1228330371450194    Subject: Detection Technology and Automation
Abstract/Summary:
Speech is the acoustic representation of language. It is the most natural, effective and convenient way for people to exchange information, and an essential vehicle for human thought. For people with hearing impairment, however, spoken communication becomes very difficult. Some deaf-mute people cannot speak because their auditory organs are damaged and cannot deliver speech information to the brain, even though their vocal organs are intact. In this case, they can learn to communicate with hearing people after a period of special training with a visual training system, and speech visualization technology that compensates for hearing loss is therefore emerging as an aid for the deaf-mute. This dissertation follows the idea of extracting speech features and then mapping them to images that carry the meaning of the utterance, so as to help deaf-mute users learn and "hear". Because feature extraction determines the performance of speech recognition and visualization systems, and current features, although very robust in quiet environments, degrade sharply under noise, the main purpose of this dissertation is to extract robust speech features in noisy conditions and to study speech visualization in depth.

The main contents and innovations of this dissertation are as follows:

(1) A novel speech endpoint detection algorithm is proposed to improve detection accuracy at low signal-to-noise ratio (SNR). Its core is the complementary use of the short-time energy-zero product and discrimination information: the short-time energy-zero-product algorithm makes an initial decision, and when a frame falls on the transition between noise and speech, it is re-checked with discrimination information based on sub-band energy distribution probabilities, which avoids false detections caused by sharp changes in noise amplitude. A new algorithm that dynamically updates the noise energy threshold is also proposed; it tracks changes in the noise energy with high accuracy. Simulation results show that the new method detects endpoints precisely and rapidly in severely changing noise environments, and it underpins the speech research that follows.
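A minimal sketch of the first-stage decision is given below, assuming generic frame lengths, a simple threshold multiplier and an exponential smoothing factor that are not taken from the dissertation; the sub-band discrimination-information re-check on transition frames is omitted for brevity.

```python
import numpy as np

def energy_zero_product_vad(x, fs, frame_ms=25, hop_ms=10,
                            alpha=0.95, k=3.0, init_noise_frames=10):
    """Label frames as speech (True) or noise (False) using the
    short-time energy-zero product with an adaptive noise threshold."""
    frame_len = int(fs * frame_ms / 1000)
    hop = int(fs * hop_ms / 1000)
    n_frames = 1 + max(0, (len(x) - frame_len) // hop)
    flags = []
    noise_level = None  # running estimate of the noise energy-zero product
    for i in range(n_frames):
        frame = x[i * hop: i * hop + frame_len]
        energy = np.sum(frame ** 2)                            # short-time energy
        zcr = np.mean(np.abs(np.diff(np.sign(frame)))) / 2.0   # zero-crossing rate
        ezp = energy * zcr                                     # energy-zero product
        if i < init_noise_frames:
            # Assume the first few frames are noise-only and initialize the floor.
            noise_level = ezp if noise_level is None else alpha * noise_level + (1 - alpha) * ezp
            flags.append(False)
            continue
        is_speech = ezp > k * noise_level
        if not is_speech:
            # Dynamically update the noise threshold only on noise frames.
            noise_level = alpha * noise_level + (1 - alpha) * ezp
        flags.append(is_speech)
    return np.array(flags)

# Example: 1 s of low-level noise followed by a synthetic tonal "speech" burst at 8 kHz.
fs = 8000
rng = np.random.default_rng(0)
noise = 0.01 * rng.standard_normal(fs)
burst = np.sin(2 * np.pi * 300 * np.arange(fs // 2) / fs) + 0.01 * rng.standard_normal(fs // 2)
flags = energy_zero_product_vad(np.concatenate([noise, burst]), fs)
```

In the full algorithm described above, frames lying on a noise-to-speech transition would additionally be re-checked with the sub-band energy distribution discrimination information before the final decision.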
(2) The learning performance of a wavelet neural network depends strongly on the number of hidden nodes, the initial weights (including thresholds), the scale and displacement factors, the learning rate and the momentum factor, which leads to weak global search capability, easy entrapment in local minima, slow convergence or even divergence. A genetic algorithm (GA), by contrast, is highly parallel, random and adaptive in its search, and has clear advantages on complex nonlinear problems. The two are therefore combined: the GA selects the initial values and the wavelet neural network completes the learning. Simulation results show that the new model effectively improves the speech recognition rate and shortens the recognition time, achieving gains in both accuracy and speed and laying the foundation for practical use of the algorithm.

(3) A novel feature extraction algorithm is proposed to improve the robustness of speech recognition and visualization systems in noisy environments. Its core is Multiple Signal Classification (MUSIC): the MUSIC spectrum is estimated from the speech signal with perceptual information incorporated directly into the spectrum estimation, which improves robustness and computational efficiency compared with the conventional Mel Frequency Cepstral Coefficient (MFCC) technique.

(4) Dynamic characteristics are part of the diversity of speech. Unlike a stationary random process, speech has temporal correlation, which reveals the close connection between preceding, following and adjacent segments of the signal. Difference and acceleration parameters cannot adequately capture these dynamic characteristics, whereas the modulation spectrum has good time-frequency concentration: it fully reflects the dynamic characteristics of speech and is less sensitive to the acoustic environment. Accordingly, the effective components of the modulation spectrum are extracted by exploiting the different ways in which interference and speech appear in it, and cepstral coefficients are then derived from them as feature parameters. Simulation results show that the new method is very robust.

(5) Signals within different critical bands excite the basilar membrane of the human ear at different locations, and the constant-Q property of the wavelet transform's analysis bands matches the signal-processing characteristics of human hearing. This dissertation therefore combines multi-level frequency band division by the wavelet packet transform with adaptive selection of the relevant bands according to the perceptual frequency bands of the human ear, and proposes a new feature extraction algorithm based on the wavelet packet transform. Simulation results show that the new method is very robust.

(6) To select complementary speech parameters from a large set of candidate features, a systematic and practical parameter selection method based on variance analysis of an orthogonal test design is proposed. First, the factors (speech parameters) and their levels are chosen; then, following the principles of mathematical statistics and orthogonality, a small set of representative points is selected from the huge space of possible experiments to construct an orthogonal table; finally, the experimental results are calculated and analyzed and the optimal set of parameter values is identified. Both the word error rate and the response time are reduced compared with the traditional parameter selection method.

(7) In view of the stronger visual identification and visual color memory abilities of the deaf-mute, two new speech visualization methods are proposed. The first combines Locally Linear Embedding (LLE) with a fuzzy kernel clustering algorithm: the improved LLE reduces the nonlinear dimensionality of the speech features, and fuzzy kernel clustering is then used for cluster analysis. That is, a Mercer kernel function maps the data from the original space into a high-dimensional eigenspace through a nonlinear mapping, and fuzzy clustering is performed in that eigenspace; after the kernel mapping, the inherent features of the speech are highlighted, improving the discrimination between different utterances. Simulation experiments show the feasibility and effectiveness of the method.
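The two-stage pipeline of this first visualization method can be illustrated with the following sketch: scikit-learn's standard LocallyLinearEmbedding stands in for the dissertation's improved LLE, and a small kernel fuzzy c-means routine clusters the embedded points in an RBF-kernel feature space. The synthetic feature matrix, neighbour count, cluster count and kernel width are illustrative assumptions, not values from the dissertation.

```python
import numpy as np
from sklearn.manifold import LocallyLinearEmbedding

def kernel_fuzzy_cmeans(X, n_clusters, m=2.0, gamma=1.0, n_iter=100, seed=0):
    """Fuzzy c-means with cluster centres kept implicitly in an RBF-kernel feature space."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Mercer (RBF) kernel matrix over the embedded points.
    sq = np.sum(X ** 2, axis=1)
    K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))
    # Random initial membership matrix U (n_clusters x n), columns sum to 1.
    U = rng.random((n_clusters, n))
    U /= U.sum(axis=0, keepdims=True)
    for _ in range(n_iter):
        W = U ** m                                   # fuzzified memberships
        s = W.sum(axis=1, keepdims=True)             # per-cluster weight sums
        A = (W @ K) / s                              # <phi(x_k), v_i> terms
        B = np.einsum('ij,ik,jk->i', W, W, K)[:, None] / s ** 2   # ||v_i||^2 terms
        D = np.clip(np.diag(K)[None, :] - 2 * A + B, 1e-12, None) # squared distances in feature space
        U_new = 1.0 / np.sum((D[:, None, :] / D[None, :, :]) ** (1.0 / (m - 1)), axis=1)
        if np.max(np.abs(U_new - U)) < 1e-6:
            U = U_new
            break
        U = U_new
    return U

# Stand-in for a (n_frames, n_dims) matrix of robust speech features.
rng = np.random.default_rng(0)
features = rng.standard_normal((200, 23))

lle = LocallyLinearEmbedding(n_neighbors=12, n_components=3)
embedded = lle.fit_transform(features)

U = kernel_fuzzy_cmeans(embedded, n_clusters=10, gamma=0.5)
labels = U.argmax(axis=0)   # hard cluster labels, e.g. for color/pattern lookup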
The second method is based on a position and pattern algorithm and creates readable patterns by integrating different speech features into a single picture. The speech signal is first preprocessed and its features are extracted; three formant features are mapped to the principal color information, intonation features are mapped to the pattern information, and the 23 features selected by the orthogonal test design are fed into neural network 2, whose outputs are mapped to the position information. The visualized speech was evaluated in a preliminary test and compared with the spectrogram; the results show that the visualization approach is an effective aid for deaf-mute learning and is very robust.
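The formant-to-color step might look like the following sketch, which linearly normalizes the first three formant frequencies into RGB channels. The frequency ranges and the direct formant-to-RGB assignment are assumptions made only for illustration, not the mapping actually used in the dissertation.

```python
import numpy as np

# Hypothetical formant ranges (Hz), used here only to normalize into [0, 1].
FORMANT_RANGES = [(200.0, 1000.0),   # F1
                  (600.0, 3000.0),   # F2
                  (1500.0, 4000.0)]  # F3

def formants_to_rgb(f1, f2, f3):
    """Map three formant frequencies to one RGB color in [0, 1]^3."""
    rgb = []
    for value, (lo, hi) in zip((f1, f2, f3), FORMANT_RANGES):
        rgb.append(float(np.clip((value - lo) / (hi - lo), 0.0, 1.0)))
    return tuple(rgb)

# Example: a vowel-like frame with F1 = 700 Hz, F2 = 1200 Hz, F3 = 2600 Hz.
print(formants_to_rgb(700.0, 1200.0, 2600.0))
```

In the full system, this color would be combined with the intonation-derived pattern and the network-derived position to compose one picture per utterance.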
Keywords/Search Tags:speech recognition, speech visualization, endpoint detection, wavelet transform, neural network, genetic algorithm, multiple signal classification (MUSIC), modulation spectrum, orthogonal test design, locally linear embedding (LLE)