
The Study Of Acoustic-to-articulatory Inversion Based On Generative Adversarial Networks

Posted on: 2021-06-11
Degree: Master
Type: Thesis
Country: China
Candidate: Y C Zhang
Full Text: PDF
GTID: 2518306548981369
Subject: Computer Science and Technology
Abstract/Summary:
The implementation of the "Belt and Road" initiative has set off a wave of enthusiasm for learning Chinese. However, many learners of Chinese as a second language encounter difficulties, such as a single traditional learning model, a lack of teaching resources, and differences between languages. Human perceptual communication is a multi-sensory process, and audio-visual multimodal information can effectively promote language understanding and learning. Intelligent computer-assisted language learning provides learners with guidance on the movements of the articulators, deepens their understanding of pronunciation, and helps them master pronunciation rules more accurately, thereby alleviating the difficulties of learning Chinese to a certain extent.

Given this demand and the current lack of intelligent computer-assisted tools for learning Chinese, this thesis uses a Chinese ultrasound database to propose an acoustic-to-articulatory inversion method based on generative adversarial networks (GAN-AAI), which recovers the motion state of the internal vocal organs from the speech signal and applies it to intelligent assisted language learning.

In this thesis, acoustic speech data and ultrasound image data from the Chinese ultrasound database are applied to the study of acoustic-to-articulatory inversion. First, the data are preprocessed with the conventional general acoustic space (GAS) normalization method: MFCC features are extracted from the acoustic speech data, and PCA dimensionality reduction is applied to the ultrasound images to extract articulator motion features, realizing speaker-independent acoustic-to-articulatory inversion based on the general acoustic space (GAS-AAI). Second, an LSTM-based generator network and a ResNet-based discriminator network are constructed and combined into the inversion network GAN-AAI. Finally, the least-squares loss function replaces the cross-entropy loss of the traditional GAN, and the least-squares generative adversarial network (LSGAN-AAI) is constructed to perform the inversion.

Experimental results show that all three models can reflect the motion of the vocal organs inside the mouth. GAS-AAI and GAN-AAI improve structural similarity over the traditional acoustic-to-articulatory inversion model, and LSGAN-AAI obtains the best results, with a structural similarity of 81.76%, which is 4.1% higher than GAS-AAI. In terms of root mean square error, LSGAN-AAI and GAN-AAI reduce the error by 27.35% and 13.48%, respectively, compared with GAS-AAI.

Finally, the inversion model is verified with Chinese vowel pronunciation data. The results show that the acoustic-to-articulatory inversion model reflects articulator motion that is consistent with the characteristics of human pronunciation, and for different speakers it can also reflect individual differences in articulator movement.
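The preprocessing pipeline can be illustrated with a short sketch. The following is a minimal example, assuming librosa and scikit-learn are available; the MFCC order, sampling rate, frame shift, and number of PCA components are illustrative placeholders rather than the values used in the thesis, and the GAS normalization step is omitted.

    # Minimal sketch of the feature-extraction stage (assumed parameters).
    import librosa
    import numpy as np
    from sklearn.decomposition import PCA

    def extract_mfcc(wav_path, n_mfcc=13, sr=16000, hop_length=160):
        """Extract MFCC features from an acoustic speech recording."""
        y, sr = librosa.load(wav_path, sr=sr)
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc, hop_length=hop_length)
        return mfcc.T  # shape: (frames, n_mfcc)

    def reduce_ultrasound_frames(frames, n_components=30):
        """Compress ultrasound tongue-image frames with PCA to obtain a
        low-dimensional articulatory feature vector per frame."""
        flat = frames.reshape(frames.shape[0], -1)   # (frames, H*W)
        pca = PCA(n_components=n_components)
        return pca.fit_transform(flat), pca          # (frames, n_components)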
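The adversarial inversion setup can likewise be sketched. Below is a minimal PyTorch illustration in which an LSTM generator maps MFCC sequences to articulatory (PCA) features and a discriminator scores real versus generated features; for brevity, a small fully connected network stands in for the thesis's ResNet-based discriminator, and all layer sizes are assumed. The loss functions show how the least-squares (LSGAN) objective replaces the cross-entropy GAN loss.

    # Minimal PyTorch sketch of GAN-AAI / LSGAN-AAI (assumed dimensions).
    import torch
    import torch.nn as nn

    class LSTMGenerator(nn.Module):
        """Maps an MFCC sequence to a sequence of articulatory features."""
        def __init__(self, mfcc_dim=13, hidden=256, art_dim=30):
            super().__init__()
            self.lstm = nn.LSTM(mfcc_dim, hidden, num_layers=2, batch_first=True)
            self.out = nn.Linear(hidden, art_dim)

        def forward(self, mfcc_seq):          # (batch, frames, mfcc_dim)
            h, _ = self.lstm(mfcc_seq)
            return self.out(h)                # (batch, frames, art_dim)

    class Discriminator(nn.Module):
        """Scores articulatory feature frames; a simple MLP stands in for
        the ResNet-based discriminator described in the thesis."""
        def __init__(self, art_dim=30, hidden=128):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(art_dim, hidden), nn.LeakyReLU(0.2),
                nn.Linear(hidden, hidden), nn.LeakyReLU(0.2),
                nn.Linear(hidden, 1),         # raw score, no sigmoid (LSGAN)
            )

        def forward(self, art_seq):
            return self.net(art_seq)

    # Least-squares (LSGAN) losses replacing the cross-entropy GAN objective:
    def d_loss(d_real, d_fake):
        return 0.5 * ((d_real - 1.0) ** 2).mean() + 0.5 * (d_fake ** 2).mean()

    def g_loss(d_fake):
        return 0.5 * ((d_fake - 1.0) ** 2).mean()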
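The two reported metrics, structural similarity and root mean square error, can be computed as in the sketch below. This assumes scikit-image is available and that predicted and reference data are paired; it illustrates the metrics only, not the thesis's exact evaluation protocol.

    # Sketch of the evaluation metrics (SSIM over image frames, RMSE over features).
    import numpy as np
    from skimage.metrics import structural_similarity as ssim

    def mean_ssim(pred_frames, true_frames):
        """Average SSIM over paired ultrasound image frames."""
        scores = [ssim(p, t, data_range=t.max() - t.min())
                  for p, t in zip(pred_frames, true_frames)]
        return float(np.mean(scores))

    def rmse(pred, true):
        """Root mean square error between predicted and reference features."""
        return float(np.sqrt(np.mean((pred - true) ** 2)))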
Keywords/Search Tags: Acoustic-to-Articulatory, Chinese Ultrasound Data, GAS, GAN, LSGAN