Vowel Onset Point Detection Using Source Energy And Vocal Spectrum

Posted on:2017-02-22

Degree:Master

Type:Thesis

Country:China

Candidate:S S Jiang

GTID:2308330482995944

Subject:Electronic and communication engineering

Abstract/Summary:

Speech signals carry many kinds of information, such as phonetic content, speaker characteristics and emotion, and it is difficult to extract any one of them precisely. Representing and extracting these types of information is therefore a key part of speech analysis and processing. Vowels are an important class of phonemes: they are the main carriers of energy in speech and also hold feature patterns for a wide variety of information. The vowel onset point (VOP) is the instant at which a vowel begins. In a consonant-vowel (CV) syllable, the VOP is also the segmentation point of the consonant-vowel transition, that is, the instant at which the consonant ends and the vowel begins. A speech signal contains many kinds of events, and the VOP is a significant one that is regarded as an effective feature for speech recognition, speaker recognition and emotion recognition. Detecting the VOP accurately is therefore valuable.

Existing VOP detection methods can be roughly divided into those based on source information and those based on vocal tract information. The established methods, proposed by Prasanna S R M's group, use source energy, spectral peak energy, modulation spectrum energy, and a combination of these energies. At a resolution of ±40 ms their detection ratios are rather good, but when the resolution is tightened to ±30 ms the ratios decline dramatically. To address this issue, this thesis proposes VOP detection methods based on the temporal envelope, sparse linear prediction and maximum phase linear prediction. Detection performance is evaluated on the whole TEST part of the TIMIT corpus, a total of 336 sentences (168 speakers, two utterances per speaker).

The experimental results show that at a resolution of ±30 ms, the detection ratios of the sparse linear prediction and maximum phase linear prediction methods are 66.64% and 64.19%, which are 3.44 and 0.99 percentage points higher than that of the source energy method, respectively. At a resolution of ±20 ms, the detection ratios of the two methods are 55.14% and 52.81%, which are 3.94 and 1.61 percentage points higher than that of the source energy method, respectively.

The primary innovations are as follows:
1) A VOP detection algorithm that models the temporal envelope by frequency domain linear prediction, together with methods for determining its parameters.
2) A VOP detection algorithm that models the source energy by sparse linear prediction, so that the source energy is characterized by the sparse residual.
3) A VOP detection algorithm based on maximum phase linear prediction modeling of the source energy, which filters out the maximum phase information to yield the residual.
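
The abstract reports detection ratios at several resolutions but does not spell out how the ratio is computed. The following Python sketch shows one common reading, given here as an assumption rather than as the thesis's own scoring procedure: a reference VOP counts as detected if at least one hypothesized VOP lies within the tolerance window (for example 30 ms on either side). The function name `detection_rate` and the sample timestamps are hypothetical, and spurious detections are not penalized in this simplified version.

```python
from bisect import bisect_left


def detection_rate(reference_ms, detected_ms, tolerance_ms):
    """Percentage of reference VOPs matched by a detection within +/- tolerance_ms.

    Assumption: one detection may satisfy more than one reference VOP,
    and extra (spurious) detections are ignored.
    """
    if not reference_ms:
        return 0.0
    detected = sorted(detected_ms)
    hits = 0
    for ref in reference_ms:
        # Index of the first detection at or after the lower edge of the window.
        i = bisect_left(detected, ref - tolerance_ms)
        # It is a hit if that detection is also inside the upper edge.
        if i < len(detected) and detected[i] <= ref + tolerance_ms:
            hits += 1
    return 100.0 * hits / len(reference_ms)


# Hypothetical VOP times in milliseconds, for illustration only.
reference = [120, 480, 910, 1350]
detected = [95, 515, 900, 1600]

for tol in (40, 30, 20):
    print(f"+/-{tol} ms: {detection_rate(reference, detected, tol):.2f}%")
```

With these made-up timestamps the rate falls as the window narrows, which is the same kind of sensitivity to resolution that the abstract describes for the existing methods.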

Keywords/Search Tags:

Vowel onset point, Linear prediction, Frequency domain linear prediction, Sparse linear prediction, Maximum phase linear prediction

Related items

1	Research And Application On Frequency Estimation Based On Linear Prediction
2	Research And Implementation Of Speech Dereverberation Algorithms
3	Linear Prediction In Speech Signal Processing
4	Study Of Speech Coding System Based On Conjugate Structure Algebraic Code Excited Linear Prediction
5	Research On Optimization Of Cross-component Linear Model Prediction In VVC
6	Research And Implementation Of Low-bit-rate Speech Coding Algorithm Based On Deep Learning And Linear Prediction
7	One Low-rate Speech Coding Algorithm Based On Linear Prediction And Its Simulation
8	Research On Speech Source Localization Technology Based On Linear Prediction
9	Research Of Speaker Identification Based On Linear Prediction Residual
10	Studies On Speech Coding At Low Rates