
Research On HMM-based Lyrics To Song Conversion

Posted on: 2016-05-20
Degree: Master
Type: Thesis
Country: China
Candidate: H Feng
Full Text: PDF
GTID: 2308330470480049
Subject: Electronics and Communications Engineering

Abstract/Summary:
Speech synthesis is an important research topic in the field of human-computer interaction and has found a wide range of applications; singing voice synthesis is one of its important research focuses. This thesis adopts TTS (Text-To-Speech) technology to realize singing voice conversion using the HMM-based Speech Synthesis System (HTS). A set of speaker-dependent acoustic models is trained on a training corpus, and music information is extracted from a MIDI file. A singing melody control model is established by analysing and comparing the acoustic characteristics of speaking and singing signals. Context-dependent information is obtained from the input lyrics by text analysis and is used to retrieve the speaker-related acoustic parameters from the trained speaker-dependent acoustic model. The duration and pitch are then modified by the singing melody control model. Finally, the STRAIGHT (Speech Transformation and Representation using Adaptive Interpolation of weiGHTed spectrogram) algorithm synthesizes the singing voice from the acoustic parameters.
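The pitch side of this pipeline rests on mapping discrete MIDI note numbers to fundamental frequencies. A minimal sketch of the standard equal-temperament mapping is shown below; the function name is illustrative and not taken from the thesis:

```python
def midi_note_to_f0(note: int) -> float:
    """Convert a MIDI note number to its fundamental frequency in Hz.

    Uses the standard equal-temperament reference: MIDI note 69 = A4 = 440 Hz,
    with 12 semitones per octave.
    """
    return 440.0 * 2.0 ** ((note - 69) / 12.0)

# A4 (note 69) -> 440 Hz; middle C (note 60) -> ~261.63 Hz
```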
The main work and contributions are as follows.

Firstly, a speaker-dependent acoustic model for singing voice conversion is established using HMM-based speech synthesis technology. A multi-speaker speech corpus is used to analyse acoustic parameters such as the fundamental frequency (F0), duration, spectrum (SP), and aperiodicity index (AP). An average acoustic model is trained with speaker adaptive training on the multi-speaker corpus; speaker adaptation transformation is then applied to the average model, using the target speaker's training corpus, to obtain a speaker-dependent acoustic model.

Secondly, a singing melody control model is established. Music information is extracted from the MIDI file by analysing its file structure to obtain the channel label, note pitch, key velocity, note start time, and note duration. A singing melody control model is then established by comparing the acoustic characteristics of speech and song. It consists of a fundamental frequency control model, which converts discrete note pitches into a continuous fundamental frequency contour, and a duration control model, which determines the duration of each note.

Finally, the thesis realizes lyrics-to-song conversion. The input lyrics are analysed to obtain context-dependent labels, from which the spectrum and aperiodicity index are generated using the speaker-dependent model. At the same time, the pitch and duration of each note in the lyrics are extracted from the MIDI file, and the duration and fundamental frequency of each note are obtained from the singing melody control model. Each syllable's acoustic parameters (spectrum, aperiodicity index, and fundamental frequency) are then adjusted to match the duration of each note.
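One way to picture the fundamental frequency control model's job is the sketch below, which turns a sequence of discrete notes into a frame-level F0 contour by holding each note's frequency over its duration and smoothing the steps at note boundaries with a short moving average. This is only an illustrative construction under assumed parameters (5 ms frames, a 9-frame smoothing window); the thesis's actual control model is not specified in the abstract:

```python
def notes_to_f0_contour(notes, frame_ms=5.0, smooth_frames=9):
    """Build a frame-level F0 contour (Hz) from (midi_note, duration_ms) pairs.

    Each note is held at its equal-temperament frequency for its duration,
    then a moving average smooths the step transitions between notes.
    """
    f0 = []
    for note, dur_ms in notes:
        hz = 440.0 * 2.0 ** ((note - 69) / 12.0)
        f0.extend([hz] * max(1, round(dur_ms / frame_ms)))
    # Moving-average smoothing across note boundaries.
    half = smooth_frames // 2
    smoothed = []
    for i in range(len(f0)):
        window = f0[max(0, i - half): i + half + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed
```

For example, two 100 ms notes (A4 then B4) yield a 40-frame contour that starts at 440 Hz, passes through intermediate values around the note boundary, and settles near 493.88 Hz.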
These acoustic parameters are used as input to the STRAIGHT algorithm to synthesize the song, and accompaniment music is added to the synthesized song. Subjective and objective evaluation results show that the converted songs achieve good quality.
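The abstract does not state which objective measures were used; a common choice for singing voice synthesis is the root-mean-square error between the F0 contours of the synthesized and reference signals, sketched here over voiced frames only (function name illustrative):

```python
import math

def f0_rmse(f0_ref, f0_syn):
    """RMSE (Hz) between two equal-length F0 contours, over voiced frames.

    Frames with F0 <= 0 in either contour are treated as unvoiced and skipped.
    """
    pairs = [(r, s) for r, s in zip(f0_ref, f0_syn) if r > 0 and s > 0]
    if not pairs:
        raise ValueError("no voiced frames in common")
    return math.sqrt(sum((r - s) ** 2 for r, s in pairs) / len(pairs))
```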
Keywords/Search Tags:singing voice synthesis, melody control model, HMM-based speech synthesis, MIDI, STRAIGHT algorithm