
Research on Modeling Speech Pitch Contour Based on Functional Data Analysis

Posted on: 2014-04-10
Degree: Master
Type: Thesis
Country: China
Candidate: H L Wang
GTID: 2268330422459885
Subject: Circuits and Systems
Abstract/Summary:
Speech is the most convenient means of communication. As human-computer speech communication technology develops, the quality requirements for synthetic speech keep rising. The pitch contour plays a decisive role in the quality of synthesized speech, and pitch contour modeling has long been an important research topic in human-computer speech communication. This thesis presents a novel method for modeling pitch contours with Functional Data Analysis (FDA). The FDA method is used to model the pitch contours of the four tones of monosyllables in both Mandarin and the Lanzhou dialect, and the Lanzhou tone models make it possible to convert Mandarin speech into Lanzhou dialect speech. The results are of value to both speech theory and speech applications, for finding the relation between Mandarin and the Lanzhou dialect and for the prosodic modeling of each. The main work is as follows.

Firstly, a novel FDA-based method for modeling pitch contours is presented. The time-aligned pitch contours of monosyllables are smoothed with B-spline basis functions, and principal component analysis and principal differential analysis are then applied to obtain the fitted pitch contours. A comparison of the pitch contours before and after alignment shows that the proposed modeling method is feasible, so it can be applied to speech synthesis.

Secondly, the pitch contours of the Lanzhou dialect and Mandarin are modeled with the FDA method. The pitch contours of 160 monosyllables of Mandarin and the Lanzhou dialect are labeled manually, and FDA is used to obtain models of the pitch contours of the four tones for each. Model error is measured as the root mean square error (RMSE) of the model-generated pitch contours: the Mandarin models achieve an RMSE of 6.47 Hz and the Lanzhou dialect models an RMSE of 3.88 Hz.
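The smoothing and principal-component steps described above can be sketched in Python with NumPy and SciPy. This is a minimal sketch under assumptions the abstract does not specify: time alignment is taken to be linear normalization of each syllable to [0, 1]; smoothing is an unpenalized least-squares cubic B-spline fit with five equally spaced interior knots (FDA work often adds a roughness penalty instead); functional PCA is approximated by an SVD of the smoothed curves sampled on a 101-point grid; and the "model-generated" contour used for the RMSE is the tone's mean curve. Principal differential analysis is not shown. The input contours are synthetic toy data, not the thesis corpus, and the helper names (`smooth_contour`, `fit_tone_model`, `rmse`) are hypothetical.

```python
import numpy as np
from scipy.interpolate import make_lsq_spline

K = 3                                               # cubic B-splines
INTERIOR_KNOTS = np.linspace(0.0, 1.0, 7)[1:-1]     # 5 interior knots (assumed)
KNOTS = np.r_[[0.0] * (K + 1), INTERIOR_KNOTS, [1.0] * (K + 1)]
GRID = np.linspace(0.0, 1.0, 101)                   # common normalized-time grid


def smooth_contour(f0):
    """Linearly time-normalize one syllable's F0 track (Hz) to [0, 1]
    and fit a least-squares cubic B-spline to it (assumed alignment/smoothing)."""
    f0 = np.asarray(f0, dtype=float)
    t = np.linspace(0.0, 1.0, f0.size)
    return make_lsq_spline(t, f0, KNOTS, k=K)


def fit_tone_model(contours, n_components=2):
    """Discretized functional PCA: smooth each contour, sample it on GRID,
    and return the mean curve, leading principal components, and scores."""
    curves = np.vstack([smooth_contour(c)(GRID) for c in contours])
    mean = curves.mean(axis=0)
    _, _, vt = np.linalg.svd(curves - mean, full_matrices=False)
    pcs = vt[:n_components]                  # orthonormal rows, one per component
    scores = (curves - mean) @ pcs.T         # per-syllable component scores
    return mean, pcs, scores


def rmse(model_curve, f0):
    """RMSE (Hz) between a model curve on GRID and an observed F0 track,
    after linearly resampling the model to the track's length."""
    f0 = np.asarray(f0, dtype=float)
    model = np.interp(np.linspace(0.0, 1.0, f0.size), GRID, model_curve)
    return float(np.sqrt(np.mean((model - f0) ** 2)))


# Toy data: synthetic rising contours of different lengths (NOT thesis data).
rng = np.random.default_rng(0)
tone2 = []
for n in (40, 55, 48, 62, 35, 51):
    t = np.linspace(0.0, 1.0, n)
    tone2.append(180 + 60 * t ** 2 + rng.normal(0, 3, n))

mean, pcs, scores = fit_tone_model(tone2)
fitted = mean + scores @ pcs                 # fitted contours, one row per syllable
errors = [rmse(mean, c) for c in tone2]
print(f"mean RMSE of the mean-curve model: {np.mean(errors):.2f} Hz")
```

With real data, one such model would be fitted per tone and per dialect; keeping more principal components brings `fitted` closer to each smoothed contour, while the mean curve alone is what a tone-driven generator can use when no per-syllable scores are known.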
To evaluate the models, Mandarin and Lanzhou dialect speech is re-synthesized with the STRAIGHT algorithm using the model-generated pitch contours. In subjective evaluations, the re-synthesized Mandarin speech achieves a mean opinion score (MOS) of 4.17 and the re-synthesized Lanzhou dialect speech a MOS of 4.19. These experiments indicate that the FDA pitch model is applicable to both Mandarin and the Lanzhou dialect.

Thirdly, the thesis implements conversion of Mandarin speech into the Lanzhou dialect. The tone of each syllable is obtained from the input text. The Mandarin WAVE files are analyzed with the STRAIGHT algorithm to obtain the spectral parameters and the pitch contour of each Mandarin syllable. The Lanzhou dialect FDA models then generate the corresponding Lanzhou F0 contour according to each syllable's tone, and the Lanzhou dialect waveform is re-synthesized with STRAIGHT from the generated Lanzhou F0 contours and the Mandarin spectral parameters. In degradation mean opinion score (DMOS) experiments, the converted speech achieves a DMOS of 3.88, showing that the FDA-based pitch model can be applied to dialect conversion.
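The F0-generation step of the conversion can be sketched as follows. It is a minimal sketch that assumes each tone model is stored as a mean F0 curve on a normalized time grid, that a model contour is stretched linearly over the voiced frames of each Mandarin syllable, and that unvoiced frames keep F0 = 0. The curves in `LANZHOU_TONE_MODELS` are placeholder shapes chosen for illustration, not the models estimated in the thesis, and all function names are hypothetical. STRAIGHT analysis and resynthesis are outside the sketch: the returned track would be passed, together with the Mandarin spectral parameters, to a STRAIGHT implementation.

```python
import numpy as np

# Normalized time grid on which each tone model is stored (assumed, 101 points).
GRID = np.linspace(0.0, 1.0, 101)

# HYPOTHETICAL Lanzhou tone models: one mean F0 curve (Hz) per tone on GRID.
# Placeholder shapes for illustration only, NOT the thesis's estimated models.
LANZHOU_TONE_MODELS = {
    1: 200.0 - 40.0 * GRID,
    2: 160.0 + 80.0 * GRID,
    3: np.full_like(GRID, 230.0),
    4: 180.0 + 30.0 * np.sin(np.pi * GRID),
}


def generate_f0(tone, n_frames):
    """Linearly resample the model curve for `tone` to n_frames frames."""
    return np.interp(np.linspace(0.0, 1.0, n_frames), GRID, LANZHOU_TONE_MODELS[tone])


def convert_syllable(tone, mandarin_f0):
    """Replace the voiced frames of one Mandarin syllable's F0 track with the
    Lanzhou model contour for `tone`; unvoiced frames (F0 == 0) stay at 0."""
    mandarin_f0 = np.asarray(mandarin_f0, dtype=float)
    out = np.zeros_like(mandarin_f0)
    voiced = mandarin_f0 > 0
    out[voiced] = generate_f0(tone, int(voiced.sum()))
    return out


def convert_utterance(tones, mandarin_f0_tracks):
    """Concatenate converted per-syllable F0 tracks; tones come from the text."""
    return np.concatenate(
        [convert_syllable(t, f0) for t, f0 in zip(tones, mandarin_f0_tracks)]
    )


# Toy example: two syllables with made-up Mandarin F0 tracks (Hz, 0 = unvoiced).
tones = [2, 4]
mandarin_tracks = [
    [0, 0, 150, 152, 155, 0],
    [0, 210, 200, 185, 170],
]
lanzhou_f0 = convert_utterance(tones, mandarin_tracks)
print(lanzhou_f0)
```

Stretching the model over the voiced frames keeps the Mandarin syllable timing, so the converted F0 track stays frame-aligned with the Mandarin spectral parameters it will be combined with.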
Keywords/Search Tags: FDA, Pitch contour, F0 modeling, B-spline, Lanzhou dialect