
Research on Key Technologies of Speech Dynamic Feature Analysis and Speech Visualization

Posted on: 2011-01-26    Degree: Doctor    Type: Dissertation
Country: China    Candidate: L F Xue    Full Text: PDF
GTID: 1118360302477758    Subject: Detection Technology and Automation
Abstract/Summary:
Speech is the most convenient and natural means of communication between people. Some deaf people cannot speak not because their articulatory organs are damaged, but because their damaged auditory organs prevent speech information from reaching the brain. Since their articulatory organs are intact, such people can learn to communicate with hearing people after receiving special training through a vision-based training system.

Visual speech-training systems that help the deaf learn to speak have been studied at home and abroad since the mid-1960s. Most such systems, however, render an image from a single acoustic feature; their recognition rates are low, and their displays are too technical for deaf users to accept readily.

Based on the principles of speech production and perception, in particular on how the human brain transforms information when producing and perceiving speech, and drawing on current speech signal processing techniques including the wavelet transform, auditory models, artificial neural networks, and manifold learning, this thesis proposes a parametric description of speech as represented in the human perceptual system and a novel recognition method that displays speech as images. Compared with traditional methods, the new approach is easier to understand and simpler to compute. The thesis also attempts to show that the perception of speech (at least of vowels) amounts to a simple topological mapping: the resulting figures are easy to distinguish, so deaf users, exploiting the compensating power of vision, can learn to recognize speech after only simple training. The innovations of the thesis are as follows:

(a) The thesis reviews the state of the art in traditional speech recognition and in speech-training technology for the hearing impaired, and demonstrates the feasibility and applicability of mapping speech to images through a systematic study of speech production and perception. Existing speech spectrum displays are surveyed in depth, together with their principles, advantages, and disadvantages. Finally, building on traditional feature extraction methods such as LPCC, MFCC, and PLP, the thesis proposes the concept of, and a method for, automatic speech feature extraction based on artificial neural networks and manifold learning.

(b) The thesis describes a new speech visualization method that creates readable patterns by integrating a combined feature set into a single image. The system uses wavelet-based time-frequency analysis to simulate the band-pass filtering of the basilar membrane; a minimal sketch of such a filterbank appears after item (c) below. The auditory features are plotted on a screen, and deaf users can rely on their own perception to tell different speech sounds apart, training their oral ability effectively.

(c) The thesis describes a novel speech visualization method that creates a readable pattern with a temporal self-organizing map (TSOM). The TSOM extends the SOM with a time-enhancement mechanism, remedying the defect that the SOM provides only a spatial topographic map and ignores the temporal structure that is essential in speech signals. The representations of consecutive short-time spectra form a trajectory on the map, so changes over time can be read directly from the display; a sketch follows below.
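As a point of reference for the wavelet filterbank in (b), here is a minimal sketch using PyWavelets' continuous wavelet transform. The Morlet wavelet, the channel count, and the 100 Hz to 4 kHz frequency range are illustrative assumptions, not the thesis's exact configuration:

```python
# Minimal sketch: CWT scales act as a bank of band-pass filters,
# loosely analogous to the basilar membrane described in (b).
import numpy as np
import pywt

fs = 16000                                   # assumed sampling rate (Hz)
t = np.arange(0, 0.5, 1.0 / fs)
signal = np.sin(2 * np.pi * 440 * t)         # stand-in for a speech segment

# Choose scales so the wavelet centre frequencies span a speech-relevant band.
freqs_hz = np.geomspace(100, 4000, 64)       # 64 "channels", 100 Hz - 4 kHz
scales = pywt.frequency2scale('morl', freqs_hz / fs)

coeffs, freqs = pywt.cwt(signal, scales, 'morl', sampling_period=1.0 / fs)
energy = np.abs(coeffs) ** 2                 # channels x time "auditory image"
print(energy.shape)                          # (64, len(signal))
```

Plotting `energy` as an image yields the kind of time-frequency pattern the thesis displays to the user.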
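For the TSOM in (c), the following is a minimal sketch of a SOM whose input is leakily integrated over time, so that consecutive spectra trace a smooth trajectory on the map. The map size, the decay factor `alpha`, the fixed learning rate, and the random stand-in frames are all assumptions; the thesis's exact time-enhancement mechanism may differ.

```python
# Minimal sketch of a time-enhanced SOM: each frame is mixed with its
# predecessor before the winner is found, adding temporal context.
import numpy as np

rng = np.random.default_rng(0)
map_h, map_w, dim = 10, 10, 20               # 10x10 map, 20-dim spectral frames
weights = rng.random((map_h, map_w, dim))
grid = np.dstack(np.meshgrid(np.arange(map_h), np.arange(map_w), indexing='ij'))

def train(frames, epochs=20, lr=0.5, sigma=3.0, alpha=0.5):
    """alpha: leaky-integration factor carrying context between frames."""
    global weights
    for _ in range(epochs):
        context = np.zeros(dim)
        for x in frames:
            x_t = alpha * context + (1 - alpha) * x        # time-enhanced input
            context = x_t
            d = np.linalg.norm(weights - x_t, axis=2)
            bmu = np.unravel_index(np.argmin(d), d.shape)  # best-matching unit
            dist2 = np.sum((grid - np.array(bmu)) ** 2, axis=2)
            h = np.exp(-dist2 / (2 * sigma ** 2))          # neighbourhood weights
            weights += lr * h[..., None] * (x_t - weights)

def trajectory(frames, alpha=0.5):
    """Map a frame sequence to a path of map coordinates (the visual pattern)."""
    path, context = [], np.zeros(dim)
    for x in frames:
        x_t = alpha * context + (1 - alpha) * x
        context = x_t
        d = np.linalg.norm(weights - x_t, axis=2)
        path.append(np.unravel_index(np.argmin(d), d.shape))
    return path

frames = rng.random((50, dim))               # stand-in for short-time spectra
train(frames)
print(trajectory(frames)[:5])                # first few points of the trajectory
```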
(d) The thesis describes a novel speech visualization method that creates a readable pattern with temporal linear embedding (TLE). Its basis, locally linear embedding (LLE), is an unsupervised learning algorithm for feature extraction. If the variability of speech can be described by a small number of continuous features, the data can be imagined as lying on a low-dimensional manifold within the high-dimensional space of speech waveforms; the goal of feature extraction is then to reduce the dimensionality of the signal while preserving its informative signatures. The thesis presents results from analyzing and visualizing speech data with both PCA and LLE, and observes that the nonlinear embeddings of LLE separate certain phonemes better than the linear projections of PCA do; a sketch appears after item (e) below.

(e) The thesis describes a novel speech visualization method that creates a readable pattern from an auditory model. The method uses a gammatone auditory filterbank and the Meddis inner-hair-cell model to compute an auditory correlogram that expresses the activity of the auditory nerve. The amplitude in each frequency channel of the correlogram is then coded as a feature vector characterizing that band. The auditory model extracts the critical information in the speech signal and presents more frequency information than conventional acoustic processing techniques such as the spectrogram; a sketch follows below.
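For the dimensionality reduction underlying (d), here is a minimal sketch using scikit-learn's stock LLE estimator alongside a PCA baseline. The random feature matrix is a stand-in for the short-time spectral vectors the thesis uses, and the temporal extension that makes TLE is not part of the stock estimator:

```python
# Minimal sketch: nonlinear LLE embedding vs. linear PCA projection,
# the comparison discussed in (d).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import LocallyLinearEmbedding

rng = np.random.default_rng(0)
X = rng.random((500, 40))                    # 500 frames x 40-dim features

lle = LocallyLinearEmbedding(n_neighbors=12, n_components=2)
Y_lle = lle.fit_transform(X)                 # nonlinear 2-D embedding

Y_pca = PCA(n_components=2).fit_transform(X) # linear baseline for comparison
print(Y_lle.shape, Y_pca.shape)              # (500, 2) (500, 2)
```

Scatter-plotting `Y_lle` with phoneme labels is the kind of visualization in which the thesis reports better phoneme separation than with `Y_pca`.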
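Finally, for the auditory front end in (e), here is a minimal sketch of a fourth-order gammatone filterbank built by direct FIR convolution, with half-wave rectification as a crude stand-in for the Meddis inner-hair-cell model (the real Meddis model simulates neurotransmitter dynamics and is considerably more involved). The channel count and centre frequencies are illustrative assumptions; a full correlogram would additionally autocorrelate each channel over short windows:

```python
# Minimal sketch: gammatone filterbank + half-wave rectification,
# approximating the cochlear stage of the auditory model in (e).
import numpy as np

fs = 16000                                   # assumed sampling rate (Hz)

def erb(f):                                  # equivalent rectangular bandwidth
    return 24.7 * (4.37 * f / 1000.0 + 1.0)

def gammatone_ir(fc, fs, duration=0.025, order=4, b=1.019):
    """Impulse response of a gammatone filter centred at fc Hz."""
    t = np.arange(0, duration, 1.0 / fs)
    g = t ** (order - 1) * np.exp(-2 * np.pi * b * erb(fc) * t) \
        * np.cos(2 * np.pi * fc * t)
    return g / np.max(np.abs(g))             # rough normalisation

t = np.arange(0, 0.1, 1.0 / fs)
signal = np.sin(2 * np.pi * 500 * t)         # stand-in for speech input

centres = np.geomspace(100, 4000, 32)        # 32 channels, 100 Hz - 4 kHz
channels = np.array([
    np.maximum(np.convolve(signal, gammatone_ir(fc, fs), mode='same'), 0.0)
    for fc in centres                        # half-wave rectified channel output
])
print(channels.shape)                        # (32, len(signal))
```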
Keywords/Search Tags: Speech Visualization, Combined Feature, Spectrogram, Wavelet Transform, Auditory Model, Temporal Self-Organizing Map (TSOM), Temporal Linear Embedding (TLE)