Research On Synthesis Algorithm For Chinese Singing Voice Based On Parametric Modification

Posted on:2012-08-17

Degree:Master

Type:Thesis

Country:China

Candidate:J L Li

Full Text:PDF

GTID:2178330341950381

Subject:Circuits and Systems

Abstract/Summary:

Speech synthesis is an important research topic in human-computer interaction. Singing voice synthesis is a branch of speech synthesis that has attracted considerable attention in recent years. To generate Mandarin songs, this thesis combines Text To Speech (TTS) technology with voice modification technology. First, the differences between speech and singing voice signals are analyzed. Then, a melody control model and a spectrum model based on the Gaussian Mixture Model (GMM) are established. Finally, a singing voice synthesis system with variable timbre is implemented.

The results help reveal the inherent differences between speech and singing voice and are of theoretical significance for speech synthesis. In addition, the system can be applied to entertainment and to some areas of music creation, making singing voice synthesis more entertaining and engaging. The main research results and innovations are as follows:

Firstly, the differences between speech and singing voice are compared by analyzing their acoustic parameters. Both are produced by the same organs, yet they are perceived differently. Speech mainly conveys semantic content, whereas singing voice mainly conveys emotion. The singing voice is rich in harmonics, its HHG still carries high energy, and the trajectories of its fundamental frequency and other harmonics are very smooth. In speech, by contrast, these trajectories follow the tones of the spoken syllables.

Secondly, a new singing melody control model is constructed. Based on the differences between speech and singing voice in fundamental frequency (F₀), the F₀ control model is built mainly on vibrato. The melody control model achieves lyrics-to-song conversion by modifying the acoustic parameters of the speech signal.
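
As a rough illustration of what a vibrato-based F₀ contour can look like, the Python sketch below generates a per-frame F₀ track for one sustained note. It is not the thesis's model, which the abstract does not specify. The sinusoidal vibrato shape, the 5.5 Hz rate, the 50-cent depth, the linear fade-in, the frame rate of 200 frames per second, and the function names `midi_to_hz` and `vibrato_f0` are all assumptions made for illustration.

```python
import math


def midi_to_hz(note):
    """Convert a MIDI note number to Hz (A4 = note 69 = 440 Hz, equal temperament)."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)


def vibrato_f0(note, duration_s, frame_rate=200.0,
               rate_hz=5.5, depth_cents=50.0, onset_s=0.2):
    """Return a list of per-frame F0 values (Hz) for one note with vibrato.

    All default parameter values are illustrative assumptions,
    not values taken from the thesis.
    """
    base = midi_to_hz(note)
    n_frames = int(round(duration_s * frame_rate))
    contour = []
    for i in range(n_frames):
        t = i / frame_rate
        # Fade the vibrato in linearly over the first onset_s seconds.
        fade = min(1.0, t / onset_s) if onset_s > 0 else 1.0
        # Pitch deviation in cents, modelled here as a plain sinusoid.
        cents = fade * depth_cents * math.sin(2.0 * math.pi * rate_hz * t)
        contour.append(base * 2.0 ** (cents / 1200.0))
    return contour


if __name__ == "__main__":
    f0 = vibrato_f0(69, 1.0)  # one second of A4
    print(len(f0), min(f0), max(f0))
```

In a lyrics-to-song setting, a contour of this kind would stand in for the tone-driven F₀ of the synthesized speech before resynthesis. How the thesis derives note targets and vibrato parameters is not stated in the abstract.
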
The MOS test shows that the average MOS score of the synthesized songs before timbre conversion is above 3.29.

Thirdly, a GMM-based spectrum modification model for singing voice is proposed. The STRAIGHT spectrum is modeled with a GMM so that the timbre of the singing voice can be converted, and a professional singer's timbre can be added proportionally. The ABX test shows that the accuracy reaches 100% when k = 0 or 1 and exceeds 62.5% when 0 < k < 1. The experiments also show that the GMM means have a greater impact on a singer's timbre than the mixture weights and covariances.

Finally, a new singing voice synthesis system based on STRAIGHT is constructed. The system consists of a text analysis module, a concatenative synthesis model, a melody control model, and a timbre control model. Text analysis yields the pronunciation information of the lyrics, and the concatenative model produces the synthesized speech. The melody control model then converts this speech into singing voice, and the timbre control model converts the timbre of the singing voice.
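
One simple way to picture adding a singer's timbre "proportionally" is to interpolate between the component means of two GMMs. The Python sketch below does only that and is not the thesis's conversion method. It assumes that k is a mixing proportion in [0, 1], that the two models have the same number of components paired one-to-one, and that only the means are blended (in line with the finding that means matter most). The function name `blend_means` is hypothetical.

```python
def blend_means(source_means, target_means, k):
    """Linearly interpolate paired GMM component means.

    k = 0 returns the source means, k = 1 the target means.
    The one-to-one pairing of components is an assumption of this sketch.
    """
    if not 0.0 <= k <= 1.0:
        raise ValueError("k must lie in [0, 1]")
    if len(source_means) != len(target_means):
        raise ValueError("both models need the same number of components")
    blended = []
    for src, tgt in zip(source_means, target_means):
        if len(src) != len(tgt):
            raise ValueError("mean vectors must have equal dimension")
        # Per-dimension convex combination of the two mean vectors.
        blended.append([(1.0 - k) * s + k * t for s, t in zip(src, tgt)])
    return blended


if __name__ == "__main__":
    source = [[0.0, 2.0], [1.0, 1.0]]  # toy 2-component, 2-dimensional means
    target = [[4.0, 6.0], [3.0, 5.0]]
    print(blend_means(source, target, 0.5))
```

With k strictly between 0 and 1 the blended means lie between the two voices, which matches the abstract's description of a partially converted timbre. The exact role of k in the thesis is not defined in the abstract.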

Keywords/Search Tags:

Text to Speech, Melody control model, GMM, MIDI, STRAIGHT algorithm

Related items

1	Research On HMM-based Lyrics To Song Conversion
2	Research On The MIDI Music Retrieval Algorithm Based On Humming
3	A New Musical Features Extraction Algorithm Based On MIDI Files And FPGA Design
4	Expressive Text-to-speech System On Mandarin
5	A Research On Key Technologies Of Music Melody Automatic Extraction And QBH System
6	Music Similarity Comparison And Approximate Search
7	Speech Recognition Algorithm Based On Straight Spectrum Study
8	The Research Of Prosodic Control Algorithm And Realization For Chinese Speech Synthesis
9	Research On Lanzhou-Dialect Speech Generation
10	The Study And Application Of Text-to-Speech System