Research On The Conversion Of Bone Conduction Speech To Normal Speech Based On Deep Learning

Posted on:2022-06-06

Degree:Master

Type:Thesis

Country:China

Candidate:Y L Chu

GTID:2518306335485164

Subject:Master of Engineering (in the field of computer technology)

Abstract/Summary:

Voice, as the most convenient, direct, and effective way for people to exchange information, has always been disturbed by ambient noise. Common speech enhancement technologies can effectively reduce the noise in noisy speech and thereby improve speech quality. However, in strongly noisy environments such as squares, docks, or even battlefields, the performance of existing speech enhancement technologies degrades significantly. Bone-conducted speech is obtained by collecting the vibration signals generated on the surface of the larynx, the mastoid behind the ear, the temples, and the top of the skull when a person speaks. Compared with normal speech (also known as air-conducted speech), which is transmitted through the air, bone-conducted speech is transmitted through the bones and tissues of the human body, which shields it from ambient noise at the source. However, bone-conducted speech sounds dull, has poor clarity and intelligibility, and cannot be used directly the way normal speech can. Converting bone-conducted speech into normal speech therefore has broad application prospects, and research on bone-conducted speech conversion has attracted much attention in recent years. This thesis studies bone-conducted speech conversion technology based on deep learning. The main contents are as follows:

(1) Establish a Mandarin bone-conducted speech database. A Mandarin speech database containing both air-conducted and bone-conducted speech was established using the standard mode to provide real data for the experiments. The recording corpus consists of 500 representative sentences on sports, art, life, and other topics, carefully selected from the BCC corpus established by Beijing Language and Culture University. The speakers were four Mandarin speakers, 2 male and 2 female, who simultaneously recorded 320 bone-conducted and normal speech recordings. The effectiveness of the Mandarin bone-conducted speech database is also verified through experiments with a mutual information analysis model.

(2) Propose a bone-conducted speech conversion method based on deep learning. Bidirectional Long Short-Term Memory (BLSTM) can extract the temporal correlation of a time series. In a Convolutional Neural Network (CNN), each neuron in a convolutional layer is computed by convolution from multiple neurons in a neighboring region of the previous layer. For speech feature parameters that have both a time dimension and a frequency dimension, the convolutional layer can therefore extract the implicit time-domain and frequency-domain correlation information of the features at the same time. To make full use of the time-domain and frequency-domain correlation of speech for modeling, a new conversion model from bone-conducted speech to air-conducted speech is proposed, based on a deep bidirectional long short-term memory and deep convolutional neural network (DBLSTM-DCNN). The experimental results show that the converted speech obtained with the DBLSTM-DCNN model is closer to normal speech than that obtained with the deep neural network (DNN) model or the BLSTM model.

(3) Propose a bone-conducted speech conversion algorithm based on feature fusion. Speech contains many different feature parameters that correspond to different physical and acoustic meanings, and different features may be complementary. Mel Frequency Cepstral Coefficients (MFCC), which better match human auditory perception, are often used in speech signal analysis. To exploit the complementarity between MFCC and the spectral envelope extracted by the WORLD vocoder, a bone-conducted speech conversion model based on feature fusion is proposed. The experimental results show that the conversion model using feature fusion achieves a better conversion effect than a conversion model using a single feature.
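
The convolution step described in item (2) can be illustrated with a small sketch. It is not the thesis's DBLSTM-DCNN model: the function name `conv2d_valid`, the toy feature matrix, and the averaging kernel are all hypothetical, chosen only to show how one output value is computed from a neighboring region that spans both time frames and frequency bins.

```python
def conv2d_valid(features, kernel):
    """Slide a small kernel over a time x frequency matrix (no padding).

    As in most deep learning frameworks, this is a cross-correlation:
    the kernel is not flipped before it is applied.
    """
    n_t, n_f = len(features), len(features[0])
    k_t, k_f = len(kernel), len(kernel[0])
    out = []
    for t in range(n_t - k_t + 1):          # step along time frames
        row = []
        for f in range(n_f - k_f + 1):      # step along frequency bins
            acc = 0.0
            for i in range(k_t):
                for j in range(k_f):
                    # Each output mixes neighbors in time AND frequency.
                    acc += kernel[i][j] * features[t + i][f + j]
            row.append(acc)
        out.append(row)
    return out


# Toy 4-frame x 3-bin feature matrix (hypothetical values, not thesis data).
features = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
    [7.0, 8.0, 9.0],
    [10.0, 11.0, 12.0],
]
# Hypothetical 2x2 averaging kernel; a trained CNN would learn these weights.
kernel = [[0.25, 0.25], [0.25, 0.25]]

output = conv2d_valid(features, kernel)
print(output)  # 3 time positions x 2 frequency positions
```

Because the kernel covers two adjacent frames and two adjacent bins, every output value depends on correlations along both axes, which is the property the abstract attributes to the convolutional layer.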

Keywords/Search Tags:

bone-conducted speech conversion, deep convolutional neural network, deep bidirectional long short-term memory network, feature fusion

Related items

1	Research And Application Of Speech Emotion Recognition Algorithm Based On Deep Learning
2	Research On Network Intrusion Detection Method Based On Bi-LSTM
3	Speaker Emotional State Recognition Based On Speech And Text Fusion
4	Amdo Tibetan Speech Recognition Based On Deep Neural Network
5	Research On Algorithms Of Speech Sentence Recognition Based On Deep Learning
6	Research On Event Extraction Based On Deep Learning
7	Research On Text Sentiment Analysis Based On Deep Learning
8	Speech Enhancement Based On Optimized Full Convolution And Long-short Term Memory Network
9	Research On Short Text Emotional Tendency Analysis Based On Deep Learning
10	Research And Implementation Of Chinese Textual Entailment Recognition Based On Deep Neural Network