Acoustic-to-articulatory inversion (AAI) is the task of recovering the movements of the articulators from speech signals. It has great application value in language learning and rehabilitation guidance. Most current work uses only speech features as input, which causes an inevitable performance bottleneck, and since the framework based on bidirectional recurrent neural networks was proposed, there has been little progress in the development of inversion frameworks. To address these problems, we propose a novel method called the auxiliary feature fusion network (AFFN). This paper focuses on feature processing and the inversion network in acoustic-to-articulatory inversion. In terms of input features, we use the trajectories of the non-tongue articulators in the EMA dataset as auxiliary features to increase the diversity of the input together with the speech features. Then, keeping speech as the only input, an extraction unit is used to predict the auxiliary features and thereby enhance the performance of the inversion network. Meanwhile, inspired by the idea of feature fusion based on canonical correlation analysis, we propose a feature transformation module that generates a joint feature with higher correlation as the input of the articulatory inversion module. Finally, an encoder-decoder network with an attention mechanism is used in place of the common multi-layer LSTM network to capture more contextual relations. Experiments are conducted on two public datasets, MNGU0 and MOCHA, and verify the effectiveness of the proposed methods. Experimental results show that the proposed acoustic-to-articulatory inversion model with feature transformation fusion and the attention mechanism greatly improves performance with the same input speech features, reducing the average RMSE by more than 15% compared with the state-of-the-art method that uses audio speech features only.