
The Research On Restoration Of Throat Microphone Speech

Posted on: 2013-12-29
Degree: Master
Type: Thesis
Country: China
Candidate: D W Feng
Full Text: PDF
GTID: 2248330374481484
Subject: Signal and Information Processing
Abstract/Summary:
The throat microphone (TM) is a transducer that picks up sound from skin vibrations near the throat. TM speech is intelligible but sounds unnatural. The TM picks up speech transmitted from the pharynx region, together with the 'buzz tone' of the larynx. A study of the acoustic characteristics of various sound units in TM and normal microphone (NM) speech shows that the TM and NM signals differ in the vocal tract characteristics as well as in the characteristics of the excitation source for different sound units. Despite these acoustical differences, simultaneously recorded TM and NM speech of a speaker shares some common features (for example, pitch and formant locations). The main objective of this thesis is to improve the naturalness and perceptual quality of TM speech.
An Artificial Neural Network (ANN)-based voice conversion method is presented to improve TM speech quality. The ANN is used to obtain a smooth mapping of the TM spectrum onto the NM spectrum for each frame. The acoustic and spectral characteristics of TM speech were analyzed and the main differences between TM and NM speech were studied. These differences make it necessary to modify the characteristic parameters of the vocal tract function, and voice conversion methods were chosen to perform this modification.
A comparison of cepstral coefficients, line spectrum frequencies and the Mel-frequency cepstrum shows that the Mel-frequency cepstrum is the better representation of TM speech, because its design takes perceptual factors into account. A comparison of the voices converted with a Gaussian mixture model (GMM) and with the ANN shows that the GMM-based conversion gives the better quality.
One advantage of the throat microphone is that it provides a high Signal-to-Noise Ratio (SNR) over the speech frequency range in noisy environments.
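
The perceptual motivation behind the Mel-frequency cepstrum mentioned above can be illustrated with a minimal sketch. It assumes the widely used 2595·log10(1 + f/700) form of the mel scale; the abstract does not state which variant the thesis used, and the function names here are hypothetical.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    """Convert a frequency in Hz to mels (assumed 2595*log10(1 + f/700) variant)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_spaced_centers(f_low: float, f_high: float, n: int) -> list:
    """Return n filter center frequencies (Hz) equally spaced on the mel scale,
    strictly between f_low and f_high."""
    lo, hi = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (hi - lo) / (n + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n)]


if __name__ == "__main__":
    # Equal steps in mel become wider and wider steps in Hz, which mirrors
    # the coarser frequency resolution of hearing at high frequencies.
    for fc in mel_spaced_centers(0.0, 4000.0, 10):
        print(round(fc, 1))
```

Centers that are equally spaced in mel crowd together at low frequencies and spread out at high frequencies, which is the perceptual weighting the abstract refers to.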
This thesis explores the presence of speech, speaker and language characteristics in TM speech for developing speech systems. The complete conversion system consists of a training stage and a conversion stage. In the training stage, mappings are learned for the parts of the speech signal model, including the excitation and the vocal tract. Mapping models based on a neural network and on a Gaussian mixture model (GMM) were built. Acoustic features including cepstral coefficients, line spectrum pairs and the Mel-frequency cepstrum were used in the conversion work. Finally, the converted voice was evaluated, both subjectively and objectively. Because TM speech is relatively immune to noise, this study may promote the use of TM speech in strong-noise environments.
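
The abstract does not say which objective measure was used to evaluate the converted voice. As an illustration only, the sketch below computes mel-cepstral distortion (MCD), a measure commonly reported in voice conversion work. It assumes the converted and reference NM frames are already time-aligned and that the first coefficient of each frame is an energy term to be excluded; the function name and the toy data are hypothetical.

```python
import math


def mel_cepstral_distortion(ref_frames, conv_frames):
    """Average MCD in dB between two time-aligned sequences of mel-cepstral
    vectors. Coefficient 0 of each frame (assumed energy term) is skipped."""
    if not ref_frames or len(ref_frames) != len(conv_frames):
        raise ValueError("sequences must be non-empty and of equal length")
    scale = 10.0 / math.log(10.0)
    total = 0.0
    for ref, conv in zip(ref_frames, conv_frames):
        sq_err = sum((r - c) ** 2 for r, c in zip(ref[1:], conv[1:]))
        total += scale * math.sqrt(2.0 * sq_err)
    return total / len(ref_frames)


if __name__ == "__main__":
    # Toy data, not taken from the thesis: two frames of four coefficients.
    reference = [[1.0, 0.5, 0.2, 0.1], [1.1, 0.4, 0.3, 0.0]]
    converted = [[0.9, 0.6, 0.1, 0.1], [1.0, 0.4, 0.2, 0.1]]
    print(mel_cepstral_distortion(reference, converted))
```

A lower value means the converted spectrum is closer to the reference NM spectrum. A measure of this kind complements, but does not replace, the subjective listening evaluation.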
Keywords/Search Tags: Throat Microphone, Voice Conversion, Artificial Neural Network, Gaussian Mixture Model, Mel-frequency Cepstrum