
The Study Of Acoustic-to-articulatory Inversion Based On Generative Adversarial Networks

Posted on: 2021-06-11
Degree: Master
Type: Thesis
Country: China
Candidate: Y C Zhang
Full Text: PDF
GTID: 2518306548981369
Subject: Computer Science and Technology
Abstract/Summary:
The implementation of the "Belt and Road" initiative has set off a wave of enthusiasm for learning Chinese. However, many learners of Chinese as a second language encounter difficulties, such as a single traditional learning model, a lack of teaching resources, and differences between languages. Human perceptual communication is a multi-sensory process, and audio-visual multimodal information can effectively promote language understanding and learning. Intelligent computer-assisted language learning provides learners with guidance on the movements of the articulators, deepens their understanding of pronunciation, and helps them master pronunciation rules more accurately, thereby alleviating the difficulties of learning Chinese to a certain extent.

Given this demand and the current lack of intelligent computer-assisted tools for learning Chinese, this thesis uses a Chinese ultrasound database to propose an acoustic-to-articulatory inversion method based on generative adversarial networks (GAN-AAI), which recovers the motion state of the internal vocal organs from the speech signal and applies it to intelligent assisted language learning.

In this thesis, acoustic speech data and ultrasound image data from the Chinese ultrasound database are applied to the study of acoustic-to-articulatory inversion. First, the data are preprocessed with the conventional general acoustic space (GAS) normalization method: MFCC features are extracted from the acoustic speech data, and PCA dimensionality reduction is applied to the ultrasound images to extract articulator motion features, realizing speaker-independent acoustic-to-articulatory inversion based on the general acoustic space (GAS-AAI). Second, an LSTM-based generator network and a ResNet-based discriminator network are constructed and combined into the inversion network GAN-AAI. Finally, the least-squares loss function replaces the cross-entropy loss of the traditional GAN, and the least-squares generative adversarial network (LSGAN-AAI) is constructed to perform the inversion.

Experimental results show that all three models can reflect the motion of the vocal organs inside the mouth. GAS-AAI and GAN-AAI improve structural similarity over the traditional acoustic-to-articulatory inversion model, and LSGAN-AAI obtains the best results, with a structural similarity of 81.76%, which is 4.1% higher than GAS-AAI. In terms of root mean square error, LSGAN-AAI and GAN-AAI reduce the error by 27.35% and 13.48%, respectively, compared with GAS-AAI.

Finally, the inversion model is verified with Chinese vowel pronunciation data. The results show that the acoustic-to-articulatory inversion model reflects articulator motion that is consistent with the characteristics of human pronunciation, and for different speakers it can also reflect individual differences in articulator movement.
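The preprocessing pipeline can be illustrated with a short sketch. The following is a minimal example, assuming librosa and scikit-learn are available; the MFCC order, sampling rate, frame shift, and number of PCA components are illustrative placeholders rather than the values used in the thesis, and the GAS normalization step is omitted.

    # Minimal sketch of the feature-extraction stage (assumed parameters).
    import librosa
    import numpy as np
    from sklearn.decomposition import PCA

    def extract_mfcc(wav_path, n_mfcc=13, sr=16000, hop_length=160):
        """Extract MFCC features from an acoustic speech recording."""
        y, sr = librosa.load(wav_path, sr=sr)
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc, hop_length=hop_length)
        return mfcc.T  # shape: (frames, n_mfcc)

    def reduce_ultrasound_frames(frames, n_components=30):
        """Compress ultrasound tongue-image frames with PCA to obtain a
        low-dimensional articulatory feature vector per frame."""
        flat = frames.reshape(frames.shape[0], -1)   # (frames, H*W)
        pca = PCA(n_components=n_components)
        return pca.fit_transform(flat), pca          # (frames, n_components)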
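The adversarial inversion setup can likewise be sketched. Below is a minimal PyTorch illustration in which an LSTM generator maps MFCC sequences to articulatory (PCA) features and a discriminator scores real versus generated features; for brevity, a small fully connected network stands in for the thesis's ResNet-based discriminator, and all layer sizes are assumed. The loss functions show how the least-squares (LSGAN) objective replaces the cross-entropy GAN loss.

    # Minimal PyTorch sketch of GAN-AAI / LSGAN-AAI (assumed dimensions).
    import torch
    import torch.nn as nn

    class LSTMGenerator(nn.Module):
        """Maps an MFCC sequence to a sequence of articulatory features."""
        def __init__(self, mfcc_dim=13, hidden=256, art_dim=30):
            super().__init__()
            self.lstm = nn.LSTM(mfcc_dim, hidden, num_layers=2, batch_first=True)
            self.out = nn.Linear(hidden, art_dim)

        def forward(self, mfcc_seq):          # (batch, frames, mfcc_dim)
            h, _ = self.lstm(mfcc_seq)
            return self.out(h)                # (batch, frames, art_dim)

    class Discriminator(nn.Module):
        """Scores articulatory feature frames; a simple MLP stands in for
        the ResNet-based discriminator described in the thesis."""
        def __init__(self, art_dim=30, hidden=128):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(art_dim, hidden), nn.LeakyReLU(0.2),
                nn.Linear(hidden, hidden), nn.LeakyReLU(0.2),
                nn.Linear(hidden, 1),         # raw score, no sigmoid (LSGAN)
            )

        def forward(self, art_seq):
            return self.net(art_seq)

    # Least-squares (LSGAN) losses replacing the cross-entropy GAN objective:
    def d_loss(d_real, d_fake):
        return 0.5 * ((d_real - 1.0) ** 2).mean() + 0.5 * (d_fake ** 2).mean()

    def g_loss(d_fake):
        return 0.5 * ((d_fake - 1.0) ** 2).mean()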
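The two reported metrics, structural similarity and root mean square error, can be computed as in the sketch below. This assumes scikit-image is available and that predicted and reference data are paired; it illustrates the metrics only, not the thesis's exact evaluation protocol.

    # Sketch of the evaluation metrics (SSIM over image frames, RMSE over features).
    import numpy as np
    from skimage.metrics import structural_similarity as ssim

    def mean_ssim(pred_frames, true_frames):
        """Average SSIM over paired ultrasound image frames."""
        scores = [ssim(p, t, data_range=t.max() - t.min())
                  for p, t in zip(pred_frames, true_frames)]
        return float(np.mean(scores))

    def rmse(pred, true):
        """Root mean square error between predicted and reference features."""
        return float(np.sqrt(np.mean((pred - true) ** 2)))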
Keywords/Search Tags: Acoustic-to-Articulatory, Chinese Ultrasound Data, GAS, GAN, LSGAN