
Statistical Model Based Mandarin Chinese Singing Voice Synthesis

Posted on: 2016-07-25
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X Li
Full Text: PDF
GTID: 1108330473961622
Subject: Control Science and Engineering
Abstract/Summary:
Singing is the act of producing musical sounds with the human voice by controlling the vibration of the vocal cords and the shape of the oral, nasal, and other vocal organs. Singing voice synthesis refers to techniques that enable a computer to sing like a human by using methods related to speech synthesis. With the development of speech synthesis technology, speech synthesis based on statistical parametric models, especially the Hidden Markov Model (HMM)-based approach, has progressed very quickly. In 2006, HMM-based techniques were applied to singing voice synthesis and achieved good results. The advantages of the statistical model approach, such as a small footprint and high flexibility, carry over to singing voice synthesis. More importantly, the human effort required to build the system is very small (a manually annotated database is not required), and the entire process of generating singing voice from the input musical score can be automated, which greatly increases the availability of singing voice synthesis.

The main research objective of this dissertation is Mandarin Chinese singing voice synthesis. We focus on statistical model based algorithms and on building a complete system that generates personalized singing voice from an input musical score (lyrics included).

The main contributions of this dissertation can be summarized as follows:

1. To provide an adequate, high-quality singing voice database for personalized singing voice synthesis, we independently designed and recorded a singing voice database: we produced 210 musical scores, recorded 132 minutes of singing voice data, and designed and extracted contextual features including musical and linguistic information.

2. We built a complete singing voice synthesis system from this database using statistical methods. The system can generate singing voice with precise pitch and rhythm, as well as high quality and expressiveness.

3. Based on the characteristics of the fundamental frequency of the singing voice, we propose a series of improvements for generating the fundamental frequency contour.
· To address the sparsity of fundamental frequency data, the fundamental frequency contour of the musical score is introduced into the generation procedure, which yields a fundamental frequency contour with precise pitch.
· We further propose a method that models the difference between the fundamental frequency of the real voice and that of the musical score, so that the fundamental frequency of the musical score is used not only in generation but also in training. This method generates a more realistic fundamental frequency contour. In addition, it can also synthesize slurs.
· We propose a method that combines two statistical models operating on different time scales, the state level and the syllable level. This combination can overcome over-smoothing and produce a more expressive fundamental frequency contour.
· We propose a method based on unit selection. For every note, the method decomposes the fundamental frequency contour into two parts, a shape contour and a vibrato contour, and each part is modeled by a statistical model. During generation, the two parts are selected from pre-processed units on the basis of the statistical models. Because vibrato is modeled at the note level, the generated vibrato is more accurate, and selecting real units makes the generated contour more realistic and expressive.

4. We propose an emotional prosody conversion method based on statistical models. This method can use a small emotional speech database to generate emotional speech.

5. We propose a method that introduces autoregressive correlation between frames into the Gaussian mixture model based voice conversion framework, and apply it to speaker and singer conversion. This method is suitable for low-latency applications.
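
As an illustration of the difference-modelling idea in contribution 3, the following is a minimal sketch and not the dissertation's implementation. It assumes a piecewise-constant score contour in the log-F0 domain, a 5 ms frame shift, and MIDI note numbers as the score representation; none of these details are stated in the abstract, and all function names are hypothetical. The statistical model that would predict the residual is omitted.

```python
import math

FRAME_SHIFT_S = 0.005  # assumed 5 ms frame shift; not stated in the abstract


def midi_to_hz(midi_note):
    """Convert a MIDI note number to frequency in Hz (A4 = MIDI 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((midi_note - 69) / 12.0)


def score_log_f0(notes, frame_shift=FRAME_SHIFT_S):
    """Build a piecewise-constant log-F0 contour from (midi_note, duration_s) pairs."""
    contour = []
    for midi_note, duration_s in notes:
        n_frames = int(round(duration_s / frame_shift))
        contour.extend([math.log(midi_to_hz(midi_note))] * n_frames)
    return contour


def f0_residual(sung_log_f0, score_contour):
    """Training target: per-frame difference between sung and score log F0."""
    if len(sung_log_f0) != len(score_contour):
        raise ValueError("contours must have the same number of frames")
    return [s - c for s, c in zip(sung_log_f0, score_contour)]


def reconstruct_log_f0(score_contour, predicted_residual):
    """Generation: add the predicted residual back onto the score contour."""
    return [c + r for c, r in zip(score_contour, predicted_residual)]
```

In this sketch the score contour carries the note pitch, so a model trained on the residual only has to describe deviations from the score such as overshoot, note transitions, and vibrato.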
Keywords/Search Tags: Singing voice, Singing voice synthesis, Four elements of music, Singing voice database, Hidden Markov model, Gaussian mixture model