Research On Speech Synthesis Based On Pitch Frequency Control

Posted on: 2022-03-20
Degree: Master
Type: Thesis
Country: China
Candidate: Y B Wang
GTID: 2518306329477184
Subject: Software engineering
Abstract/Summary:
Speech synthesis is a technology that uses computers to process text and convert it into speech. With the advent of the intelligent era, speech synthesis has become an important research topic in signal processing and artificial intelligence, and it is an important means of human-computer interaction. At present, most speech synthesis systems are based on complex neural network models. These models have two drawbacks: their training data are difficult to collect, and they cannot adjust the pitch, which leaves the synthesis process inflexible and limits emotional expression. They also hide the mathematical nature of human speech production. Therefore, how to synthesize high-fidelity speech while switching flexibly between tones remains a problem that needs in-depth study. To address this problem, the main research work of this thesis is summarized as follows:

(1) Traditional speech synthesis is based on concatenating speech waveforms. It cannot adjust the tone of the synthesized speech, and the waveform is discontinuous at the concatenation points, so the result does not sound smooth. To solve this problem, a method based on pitch frequency control is proposed for switching the tone of speech. Spectrogram analysis shows that the key parameters of speech are the pitch frequency and the formants. The pitch frequency curve is extracted with the autocorrelation function, the formant parameters are extracted with the cepstrum, and these parameters are then analyzed. The pitch frequency curves of the four Chinese tones are each fitted with high-order polynomials. On this basis, interpolation is combined with piecewise functions to fit the pitch frequency curve of speech with scale changes. By adjusting the coefficients of the fitting function, the method switches between the different Chinese tones and reproduces the scale changes of singing. A continuous pitch frequency curve is obtained from the constructed function, which effectively solves the problems of inflexible tone change and unsmooth speech in speech synthesis.

(2) Deep-learning speech synthesis has effectively improved synthesis accuracy, but it places high demands on the content of the dataset, and the synthesized speech is strongly affected by the type of training data. To solve this problem, a speech synthesis method based on the superposition of trigonometric functions is studied from a mathematical point of view. Using an existing Chinese phoneme corpus, the method synthesizes different phonemes by changing the relevant parameters of the pitch frequency curve. A library of mathematical functions for the different Chinese single vowels and tones has been established, which greatly reduces the difficulty of collecting speech data. A speech synthesis platform that analyzes speech parameters and transforms tones has also been built, which makes the mathematical principles of speech formation visible.

Starting from the existing problems of speech synthesis, this thesis controls the pitch frequency curve by constructing mathematical functions and, on this basis, realizes speech synthesis; finally, a speech synthesis system was built. The results show that the average recognition rate of Chinese phonemes is 85.3% and the average recognition rate of the four Chinese tones is 95.5%. For speech with scale changes, 66.7% of the evaluators judged the similarity to be good. These test results verify the validity and feasibility of the proposed method.
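The pitch-curve extraction step in (1) can be sketched with a short-time autocorrelation estimator. This is a minimal, stdlib-only Python illustration, not the thesis's implementation. The frame length, hop size and 60-500 Hz search range are assumptions, and the names `autocorrelation_pitch` and `pitch_track` are hypothetical. The only voicing decision is to return 0.0 when no lag has positive correlation.

```python
import math

def autocorrelation_pitch(frame, sample_rate, f_min=60.0, f_max=500.0):
    """Estimate F0 (Hz) of one frame as sample_rate / (lag of max autocorrelation)."""
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]  # remove DC offset before correlating
    lag_min = max(1, int(sample_rate / f_max))
    lag_max = min(int(sample_rate / f_min), n - 1)
    best_lag, best_r = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        # Raw (unnormalized) autocorrelation at this lag
        r = sum(x[i] * x[i + lag] for i in range(n - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return sample_rate / best_lag if best_lag else 0.0  # 0.0 = treated as unvoiced

def pitch_track(signal, sample_rate, frame_len=400, hop=80):
    """Pitch frequency curve: one F0 estimate per analysis frame."""
    return [autocorrelation_pitch(signal[s:s + frame_len], sample_rate)
            for s in range(0, len(signal) - frame_len + 1, hop)]
```

Picking the largest raw autocorrelation favors shorter lags, which helps avoid locking onto a multiple of the true period. Real speech would usually also need a voicing threshold and some smoothing of the resulting curve.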
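For the formant step in (1), one common cepstral approach works in three stages. First, lifter the real cepstrum by keeping only the low-quefrency coefficients. Second, transform back to get a smoothed log-magnitude envelope. Third, take the envelope peaks as formant estimates. The sketch below assumes this cepstral-smoothing method.

The lifter length of 30, the Hamming window, the 200 Hz lower search limit and all function names are illustrative choices. The naive DFT replaces an FFT only to keep the example dependency-free. `resonator_vowel` is a hypothetical test signal (an impulse train through one two-pole resonator), not speech data.

```python
import cmath
import math

def dft(x, inverse=False):
    """Naive O(N^2) DFT using only the standard library (an FFT would be used in practice)."""
    n = len(x)
    sign = 1 if inverse else -1
    tw = [cmath.exp(sign * 2j * math.pi * k / n) for k in range(n)]
    out = [sum(x[t] * tw[(k * t) % n] for t in range(n)) for k in range(n)]
    return [v / n for v in out] if inverse else out

def cepstral_envelope(frame, lifter=30):
    """Smoothed log-magnitude spectrum obtained by low-quefrency liftering."""
    n = len(frame)
    window = [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]
    spectrum = dft([s * w for s, w in zip(frame, window)])
    log_mag = [math.log(abs(v) + 1e-12) for v in spectrum]
    cepstrum = [c.real for c in dft(log_mag, inverse=True)]  # real cepstrum
    # Keep quefrencies 0..lifter and their mirror image; zero the rest (pitch lives there)
    liftered = [c if q <= lifter or q >= n - lifter else 0.0
                for q, c in enumerate(cepstrum)]
    return [v.real for v in dft(liftered)]

def estimate_formants(frame, sample_rate, count=3, f_lo=200.0, lifter=30):
    """Return up to `count` envelope peak frequencies (Hz), sorted ascending."""
    env = cepstral_envelope(frame, lifter)
    n = len(frame)
    k_lo = max(1, int(f_lo * n / sample_rate))
    peaks = [k for k in range(k_lo, n // 2) if env[k - 1] < env[k] >= env[k + 1]]
    strongest = sorted(peaks, key=lambda k: env[k], reverse=True)[:count]
    return sorted(k * sample_rate / n for k in strongest)

def resonator_vowel(f0, formant, bandwidth, sample_rate, n):
    """Hypothetical test signal: impulse train filtered by a single two-pole resonator."""
    r = math.exp(-math.pi * bandwidth / sample_rate)
    theta = 2 * math.pi * formant / sample_rate
    a1, a2 = 2 * r * math.cos(theta), -r * r
    period = int(round(sample_rate / f0))
    y1 = y2 = 0.0
    out = []
    for i in range(n):
        x = 1.0 if i % period == 0 else 0.0
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y1, y2 = y, y1
    return out
```

The lifter length must stay well below the pitch period in samples (80 samples for a 100 Hz voice at 8 kHz). Otherwise the harmonic fine structure leaks into the envelope and creates spurious peaks.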
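The tone-modelling idea in (1) has two parts: fit each tone's pitch curve with a polynomial, then switch tones by swapping coefficients. It can be illustrated as follows. None of these assumptions come from the thesis:

- Contours use normalized syllable time t in [0, 1].
- The four tones follow Chao's tone letters (55, 35, 214, 51), mapped onto a hypothetical 100-200 Hz speaker range.
- Key points are linearly interpolated before a cubic least-squares fit.

The names `TONE_KEYPOINTS`, `tone_coefficients` and `tone_contour` are hypothetical.

```python
F_LOW, F_HIGH = 100.0, 200.0  # hypothetical speaker pitch range (Hz)

# Illustrative key points (normalized time, Chao level 1..5) for Mandarin tones 1-4
TONE_KEYPOINTS = {
    1: [(0.0, 5), (1.0, 5)],              # 55: high level
    2: [(0.0, 3), (1.0, 5)],              # 35: rising
    3: [(0.0, 2), (0.5, 1), (1.0, 4)],    # 214: dipping
    4: [(0.0, 5), (1.0, 1)],              # 51: falling
}

def chao_to_hz(level):
    """Map a Chao level (1..5) to Hz, evenly spaced on a log-frequency scale."""
    return F_LOW * (F_HIGH / F_LOW) ** ((level - 1) / 4)

def polyfit(xs, ys, degree):
    """Least-squares fit via normal equations; returns [c0, c1, ..., cd]."""
    m = degree + 1
    a = [[sum(x ** (i + j) for x in xs) for j in range(m)] for i in range(m)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(m)]
    for col in range(m):  # Gaussian elimination with partial pivoting
        piv = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, m):
            f = a[r][col] / a[col][col]
            for c in range(col, m):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    coeffs = [0.0] * m
    for i in range(m - 1, -1, -1):  # back substitution
        coeffs[i] = (b[i] - sum(a[i][j] * coeffs[j] for j in range(i + 1, m))) / a[i][i]
    return coeffs

def polyval(coeffs, x):
    """Evaluate c0 + c1*x + ... + cd*x**d with Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result

def interp_keypoints(points, t):
    """Piecewise-linear interpolation between (t, level) key points."""
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        if t0 <= t <= t1:
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    raise ValueError("t outside key-point range")

def tone_coefficients(tone, degree=3, samples=50):
    """Fit a polynomial (in Hz) to the interpolated contour of one tone."""
    ts = [i / (samples - 1) for i in range(samples)]
    hz = [chao_to_hz(interp_keypoints(TONE_KEYPOINTS[tone], t)) for t in ts]
    return polyfit(ts, hz, degree)

def tone_contour(coeffs, n_frames):
    """Evaluate a coefficient vector over a syllable of n_frames frames."""
    return [polyval(coeffs, i / (n_frames - 1)) for i in range(n_frames)]
```

Switching a syllable to another tone then means evaluating a different coefficient vector over the same number of frames. The resulting curve is continuous by construction, since it is a polynomial.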
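For the scale changes in singing described in (1), one way to combine a piecewise function with interpolation is as follows. Each note's frequency is held constant, and interpolation happens only in a short window around each note boundary. The sketch assumes equal temperament with MIDI note numbers (A4 = 69 = 440 Hz), a raised-cosine transition applied in log-frequency, and a 50 ms glide. `scale_contour` and its parameters are hypothetical and are not the thesis's constructed function.

```python
import math

def midi_to_hz(note):
    """Equal-tempered frequency of a MIDI note number (A4 = 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)

def scale_contour(notes, rate=100, glide=0.05):
    """Pitch curve sampled `rate` times per second for a list of (midi_note, seconds).

    Piecewise constant within each note; within +/- glide/2 of a boundary the
    frequency moves smoothly (raised cosine, in log-frequency) to the next note.
    """
    boundaries, t = [], 0.0
    for _, dur in notes:
        t += dur
        boundaries.append(t)
    freqs = [midi_to_hz(n) for n, _ in notes]
    out = []
    for i in range(int(round(boundaries[-1] * rate))):
        time = i / rate
        k = 0
        while k < len(notes) - 1 and time >= boundaries[k]:
            k += 1  # k = index of the note that contains `time`
        f = freqs[k]
        for b_idx in (k - 1, k):  # boundary before and after the current note
            if 0 <= b_idx < len(notes) - 1 and abs(time - boundaries[b_idx]) < glide / 2:
                w = 0.5 - 0.5 * math.cos(math.pi * (time - boundaries[b_idx] + glide / 2) / glide)
                f0, f1 = freqs[b_idx], freqs[b_idx + 1]
                f = f0 * (f1 / f0) ** w  # geometric blend = linear in semitones
        out.append(f)
    return out
```

Interpolating in log-frequency makes the midpoint of a whole-tone step land exactly on the semitone between the two notes. The weight reaches 0 and 1 at the window edges, so the curve has no jumps.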
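The trigonometric-superposition synthesis in (2) can be read as additive synthesis: each sample is a sum of sine waves at integer multiples of the instantaneous pitch frequency. The sketch below is one such reading, not the author's function library. It assumes the following:

- The pitch curve is given at a control rate and linearly interpolated.
- Phase is accumulated sample by sample, so a time-varying pitch stays continuous.
- Harmonic amplitudes come from Gaussian bumps at placeholder formant frequencies (700 Hz and 1200 Hz), not from the thesis's vowel-specific functions.

All names are hypothetical.

```python
import math

# Placeholder (centre Hz, width Hz) pairs for a generic open vowel; NOT values from the thesis
PLACEHOLDER_FORMANTS = ((700.0, 150.0), (1200.0, 200.0))

def f0_at(contour, control_rate, t):
    """Linearly interpolate a pitch curve sampled `control_rate` times per second."""
    pos = t * control_rate
    i = int(pos)
    if i >= len(contour) - 1:
        return contour[-1]
    frac = pos - i
    return contour[i] * (1.0 - frac) + contour[i + 1] * frac

def harmonic_weight(freq, formants):
    """Harmonic amplitude: Gaussian bumps at each formant plus a small floor."""
    return 0.05 + sum(math.exp(-0.5 * ((freq - fc) / bw) ** 2) for fc, bw in formants)

def synthesize(contour, control_rate, duration, sample_rate=8000,
               formants=PLACEHOLDER_FORMANTS, max_harmonics=40):
    """Sum-of-sines synthesis driven by a pitch curve; output is peak-normalized to 1."""
    n_samples = int(round(duration * sample_rate))
    phase = 0.0
    out = []
    for i in range(n_samples):
        f0 = f0_at(contour, control_rate, i / sample_rate)
        sample = 0.0
        for h in range(1, max_harmonics + 1):
            if h * f0 >= sample_rate / 2:  # stop at the Nyquist frequency
                break
            sample += harmonic_weight(h * f0, formants) * math.sin(h * phase)
        out.append(sample)
        # Accumulate phase so a changing f0 never causes a waveform discontinuity
        phase = (phase + 2.0 * math.pi * f0 / sample_rate) % (2.0 * math.pi)
    peak = max((abs(s) for s in out), default=0.0) or 1.0
    return [s / peak for s in out]
```

Accumulating phase, rather than evaluating sin(2π·f0(t)·t) directly, keeps the waveform free of jumps when f0 changes. This is what lets a tone or scale contour drive the synthesizer smoothly.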
Keywords/Search Tags: Speech synthesis, Formant, Pitch frequency, Tone, Polynomial fitting