
Vowel Onset Point Detection Using Source Energy And Vocal Spectrum

Posted on: 2017-02-22
Degree: Master
Type: Thesis
Country: China
Candidate: S S Jiang
Full Text: PDF
GTID: 2308330482995944
Subject: Electronic and communication engineering
Abstract/Summary:
A speech signal carries many kinds of information, such as phonetic content, speaker characteristics and emotion, and it is difficult to extract any one of them precisely. Representing and extracting these kinds of information is therefore a key part of speech analysis and processing. Vowels are an important class of phonemes: they are a major carrier of speech energy and also exhibit feature patterns relevant to many kinds of information. The vowel onset point (VOP) is the instant at which a vowel begins. In a consonant-vowel (CV) syllable, the VOP is also the segmentation point of the consonant-vowel transition, that is, the instant at which the consonant ends and the vowel begins. The VOP is a significant event that serves as an effective feature for speech recognition, speaker recognition and emotion recognition, so detecting it accurately is valuable.
Existing VOP detection methods can be roughly divided into those based on source information and those based on vocal tract information. The established methods, proposed by the group of S. R. M. Prasanna, use source energy, spectral peak energy, modulation spectrum energy, and a combination of these energies. At a resolution of ±40 ms their detection ratios are rather good, but when the resolution is tightened to ±30 ms they decline dramatically. To address this issue, this thesis proposes VOP detection methods using the temporal envelope, sparse linear prediction and maximum phase linear prediction. Detection performance is evaluated on the whole TEST part of the TIMIT corpus, a total of 336 sentences (168 speakers, two utterances per speaker).
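The detection ratio at a given resolution can be illustrated with a short sketch. The abstract does not state the exact scoring protocol, so the rule below is an assumption: a reference VOP counts as detected if any hypothesized VOP lies within the tolerance, with no one-to-one matching. The function name `detection_ratio` and the timestamps are hypothetical and serve only as an illustration.

```python
import numpy as np

def detection_ratio(reference_ms, detected_ms, tolerance_ms):
    """Percentage of reference VOPs that have a detection within +/- tolerance_ms.

    Assumed scoring rule (not taken from the thesis): nearest-detection
    matching, without enforcing a one-to-one assignment.
    """
    reference = np.asarray(reference_ms, dtype=float)
    detected = np.asarray(detected_ms, dtype=float)
    if reference.size == 0:
        raise ValueError("need at least one reference VOP")
    if detected.size == 0:
        return 0.0
    # Distance from each reference VOP to its nearest detected VOP.
    nearest = np.abs(reference[:, None] - detected[None, :]).min(axis=1)
    return 100.0 * float(np.mean(nearest <= tolerance_ms))

# Hypothetical timestamps in milliseconds, for illustration only.
reference = [120.0, 480.0, 910.0, 1300.0]
detected = [95.0, 515.0, 912.0]

for tolerance in (40, 30, 20):
    print(tolerance, detection_ratio(reference, detected, tolerance))
```

With these made-up timestamps the ratio falls as the tolerance shrinks, which is the effect the abstract describes when moving from ±40 ms to ±30 ms.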
The experimental results show that at a resolution of ±30 ms, the detection ratios of the sparse linear prediction and maximum phase linear prediction methods are 66.64% and 64.19%, which are 3.44 and 0.99 percentage points higher than that of the source energy method, respectively. At a resolution of ±20 ms, the detection ratios of the two methods are 55.14% and 52.81%, which are 3.94 and 1.61 percentage points higher than that of the source energy method, respectively.
The primary innovations are as follows:
1) A VOP detection algorithm based on modeling the temporal envelope by frequency domain linear prediction, together with methods for determining its parameters.
2) A VOP detection algorithm that models the source energy by sparse linear prediction, so that it is characterized by the sparse residual.
3) A VOP detection algorithm based on maximum phase linear prediction modeling of the source energy, which filters out the maximum phase information to yield the residual.
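As background for the residual-based methods, the following is a minimal sketch of conventional linear prediction by the autocorrelation method, which minimizes the 2-norm of the residual. It is not the thesis's sparse or maximum phase formulation; sparse linear prediction is commonly formulated with a 1-norm criterion instead, and whether the thesis uses that formulation is not stated in the abstract. The function names, the model order and the synthetic test signal are all assumptions made for illustration.

```python
import numpy as np

def lp_coefficients(frame, order):
    """Conventional (2-norm) LP: solve the normal equations R a = r."""
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    full = np.correlate(frame, frame, mode="full")
    r = full[n - 1 : n + order]  # autocorrelation at lags 0..order
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    R += 1e-9 * r[0] * np.eye(order)  # small ridge for numerical stability
    return np.linalg.solve(R, r[1 : order + 1])

def lp_residual(frame, a):
    """e[n] = s[n] - sum_k a[k] * s[n-k]; samples before the frame are zero."""
    frame = np.asarray(frame, dtype=float)
    order = len(a)
    padded = np.concatenate([np.zeros(order), frame])
    prediction = np.zeros(len(frame))
    for k in range(1, order + 1):
        # padded[order - k + n] equals frame[n - k] (or zero when n < k).
        prediction += a[k - 1] * padded[order - k : order - k + len(frame)]
    return frame - prediction

# Synthetic second-order autoregressive signal (illustrative, not speech).
rng = np.random.default_rng(0)
excitation = rng.standard_normal(2000)
signal = np.zeros(2000)
for n in range(2000):
    signal[n] = excitation[n]
    if n >= 1:
        signal[n] += 1.3 * signal[n - 1]
    if n >= 2:
        signal[n] -= 0.6 * signal[n - 2]

a = lp_coefficients(signal, order=2)
residual = lp_residual(signal, a)
print(a)
print(np.sum(residual ** 2) / np.sum(signal ** 2))
```

In a residual-based detector, the energy of such a residual would be tracked over time as evidence of the excitation source. The smoothing and peak-picking steps that turn this evidence into VOP hypotheses are not described in the abstract and are left out here.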
Keywords/Search Tags: Vowel onset point, Linear prediction, Frequency domain linear prediction, Sparse linear prediction, Maximum phase linear prediction