Research On Bone-conducted Speech Enhancement Based On Generative Adversarial Network

Posted on: 2023-04-14
Degree: Master
Type: Thesis
Country: China
Candidate: Q Pan
Full Text: PDF
GTID: 2568307043488844
Subject: Computer technology
Abstract/Summary:
Air-conducted speech is easily corrupted by environmental noise, which lowers the intelligibility of the received speech. Recording and transmitting bone-conducted speech signals through non-acoustic sensors placed close to the skull or throat is an effective way to avoid noise interference. However, bone-conducted speech loses its high-frequency components, so consonants that depend on high frequencies, such as fricatives and plosives, are missing; the result is a dull sound and incomplete semantic information. The aim of bone-conducted speech enhancement is to improve the quality of bone-conducted speech by restoring the missing high-frequency components. This thesis studies bone-conducted speech enhancement. The main contributions are as follows.

First, a cycle-consistent adversarial network is proposed for bone-conducted speech enhancement. The generator downsamples the bone-conducted speech features to compress the feature map, transforms the compressed features through residual connections, and upsamples the transformed feature map to generate air-conduction-like speech features. The generator is trained against a discriminator in an adversarial game, so that the generated speech features become as similar as possible to real air-conducted speech. Experimental results show that the proposed method performs well at reconstructing the high-frequency components of bone-conducted speech.

Second, to address the over-smoothing issue of conventional cycle-consistent adversarial networks in bone-conducted speech enhancement, a bone-conducted speech enhancement model based on a cycle-consistent adversarial network with dual adversarial losses is proposed. The class adversarial loss imposes an adversarial constraint on the speech class (bone-conducted or air-conducted speech), and the defect adversarial loss characterizes the spectral distance between the generated speech and real air-conducted speech. The proposed model is trained without time alignment of the training data and avoids the over-smoothing issue. Experimental results show that the proposed model obtains Mel-cepstral features that are more similar to real air-conducted speech and effectively improves the quality of bone-conducted speech.
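
As a rough illustration of the kind of training objective the abstract describes, the Python sketch below computes a cycle-consistency term and a generator adversarial term on unpaired toy features. The least-squares adversarial form, the L1 cycle distance, the toy offset "generators", and all function and variable names are assumptions made for illustration. The abstract does not specify them, and this is not the thesis's implementation.

```python
from typing import Callable, List, Sequence

Frame = List[float]              # one feature vector, e.g. Mel-cepstral coefficients
Features = List[Frame]           # a sequence of frames for one utterance
Generator = Callable[[Features], Features]


def l1_distance(a: Sequence[Frame], b: Sequence[Frame]) -> float:
    """Mean absolute difference between two equally shaped feature sequences."""
    total, count = 0.0, 0
    for frame_a, frame_b in zip(a, b):
        for x_a, x_b in zip(frame_a, frame_b):
            total += abs(x_a - x_b)
            count += 1
    return total / count


def cycle_consistency_loss(bc: Features, ac: Features,
                           g_bc2ac: Generator, g_ac2bc: Generator) -> float:
    """Assumed L1 cycle loss: BC -> AC -> BC and AC -> BC -> AC should return the input.

    Each utterance is only compared with its own reconstruction, so the
    bone-conducted (bc) and air-conducted (ac) inputs need not be time-aligned
    or even the same length.
    """
    forward = l1_distance(g_ac2bc(g_bc2ac(bc)), bc)
    backward = l1_distance(g_bc2ac(g_ac2bc(ac)), ac)
    return forward + backward


def generator_adversarial_loss(d_scores_on_fake: Sequence[float]) -> float:
    """Assumed least-squares form: push discriminator scores on fakes towards 1."""
    return sum((s - 1.0) ** 2 for s in d_scores_on_fake) / len(d_scores_on_fake)


# Hypothetical stand-ins for trained generators: constant offsets that invert each other.
def g_bc2ac(seq: Features) -> Features:
    return [[x + 0.5 for x in frame] for frame in seq]


def g_ac2bc(seq: Features) -> Features:
    return [[x - 0.5 for x in frame] for frame in seq]


# Unpaired toy data: two bone-conducted frames, three air-conducted frames.
bc = [[0.25, 1.0], [2.0, -1.0]]
ac = [[1.5, 0.5], [0.75, 2.5], [3.0, 1.0]]

print(cycle_consistency_loss(bc, ac, g_bc2ac, g_ac2bc))
print(generator_adversarial_loss([1.0, 0.5]))
```

Because the toy generators are exact inverses of each other, the cycle term is zero here. With real networks it is positive and is minimized jointly with the adversarial terms.
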
Keywords/Search Tags: Bone-conducted speech enhancement, Cycle-consistent adversarial networks, Dual adversarial loss