Research On Bone-conducted Speech Enhancement Based On Generative Adversarial Network

Posted on:2023-04-14

Degree:Master

Type:Thesis

Country:China

Candidate:Q Pan

GTID:2568307043488844

Subject:Computer technology

Abstract/Summary:

Air-conducted speech is easily corrupted by environmental noise, which lowers the intelligibility of the received speech. Recording and transmitting bone-conducted speech through non-acoustic sensors placed close to the skull or throat is an effective way to avoid noise interference. However, bone-conducted speech loses its high-frequency components, so consonants that depend on high frequencies, such as fricatives and plosives, are missing; the result is a dull sound and incomplete semantic information. The aim of bone-conducted speech enhancement is to improve the quality of bone-conducted speech by restoring the missing high-frequency components. This thesis studies bone-conducted speech enhancement. The main contributions are as follows.

First, a cycle-consistent adversarial network is proposed for bone-conducted speech enhancement. The generator downsamples the bone-conducted speech features to compress the feature map, transforms the compressed features through residual connections, and upsamples the transformed feature map to generate air-conduction-like speech features. The generator is trained against a discriminator in an adversarial game, so that the generated speech features become as similar as possible to those of real air-conducted speech. Experimental results show that the proposed method performs well at reconstructing the high-frequency components of bone-conducted speech.

Second, to address the over-smoothing problem of conventional cycle-consistent adversarial networks for bone-conducted speech enhancement, a bone-conducted speech enhancement model based on a cycle-consistent adversarial network with a dual adversarial loss is proposed. The class adversarial loss imposes an adversarial constraint on the speech class (bone-conducted or air-conducted speech), and the defect adversarial loss characterizes the spectral distance between the generated speech and real air-conducted speech. The proposed model is trained without time alignment of the training data and avoids the over-smoothing problem. Experimental results show that the proposed model obtains Mel-cepstral features that are more similar to those of real air-conducted speech and effectively improves the quality of bone-conducted speech.
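
As a rough illustration of why cycle-consistent training needs no time-aligned bone/air pairs, the sketch below computes a generic CycleGAN-style generator objective on toy feature vectors. It is not the thesis's model: the function names, the least-squares form of the adversarial term, the weight `lambda_cyc = 10.0`, and the toy stand-in mappings are all assumptions made for illustration, and the class and defect adversarial losses described above are not reproduced because the abstract does not define them.

```python
# Toy sketch of a generic CycleGAN-style generator objective.
# Every name below (g_b2a, g_a2b, d_air, lambda_cyc) is hypothetical;
# the abstract does not specify the loss forms or weights.

def l1_distance(x, y):
    """Mean absolute difference between two equal-length feature vectors."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)

def cycle_consistency_loss(bone, air, g_b2a, g_a2b):
    """L1 error after mapping each domain to the other and back.

    `bone` and `air` need not come from the same utterance, which is
    why no time alignment between the two recordings is required.
    """
    bone_cycle = g_a2b(g_b2a(bone))  # bone -> air-like -> bone
    air_cycle = g_b2a(g_a2b(air))    # air -> bone-like -> air
    return l1_distance(bone_cycle, bone) + l1_distance(air_cycle, air)

def generator_adversarial_loss(fake_air, d_air):
    """Least-squares GAN term (an assumption): push d_air(fake) toward 1."""
    return (d_air(fake_air) - 1.0) ** 2

def generator_objective(bone, air, g_b2a, g_a2b, d_air, lambda_cyc=10.0):
    """Adversarial term plus weighted cycle-consistency term."""
    adv = generator_adversarial_loss(g_b2a(bone), d_air)
    cyc = cycle_consistency_loss(bone, air, g_b2a, g_a2b)
    return adv + lambda_cyc * cyc

# Toy stand-ins: two mappings that are exact inverses of each other,
# and a discriminator that always scores its input as "real".
def g_b2a(x):
    return [2.0 * v for v in x]

def g_a2b(x):
    return [0.5 * v for v in x]

def d_air(x):
    return 1.0

bone = [0.1, 0.2, 0.3]  # unpaired toy "bone-conducted" features
air = [0.4, 0.1, 0.5]   # unpaired toy "air-conducted" features
loss = generator_objective(bone, air, g_b2a, g_a2b, d_air)
```

Because the two toy mappings invert each other and the toy discriminator is fully fooled, both terms vanish here; replacing either mapping with one that is not the inverse of the other makes the cycle term positive, which is the signal that constrains the generators in the absence of paired data.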

Keywords/Search Tags:

Bone-conducted speech enhancement, Cycle-consistent adversarial networks, Dual adversarial loss

Related items

1	An End-to-end Bone-conducted Speech Enhancement Method Based On Generative Adversarial Networks
2	Speech Enhancement Based On Linear Prediction And Generative Adversarial Network For Bone-conducted Speech
3	Research On Facial Expression Recognition Based On Generative Adversarial Networks
4	Research On Speech Enhancement Model Based On Improved Generative Adversarial Networks
5	Research On Speech Enhancement Method Based On Generative Adversarial Networks
6	Research On Speech Enhancement Methods Based On Generative Adversarial Networks
7	Multimodal Cycle-consistent Zero-Shot Learning Based On Unbiased Embedding
8	Single Channel Speech Enhancement Based On Generative Adversarial Networks
9	Research On Single-Channel Speech Enhancement Based On Generative Adversarial Network
10	Underwater Image Enhancement Based On Cycle-Consistent Adversarial Network