Research On Whisper To Normal Speech Conversion Based On Convolutional Neural Network

Posted on:2021-04-08

Degree:Master

Type:Thesis

Country:China

Candidate:H L Lian

GTID:2428330629480389

Subject:Computer Science and Technology

Abstract/Summary:

Whisper refers to low-energy pronunciation without vocal cord vibration. It is a special and essential style of communication between people. For example, in places such as libraries and conference rooms where loud speaking is prohibited, people often whisper for human-to-human communication or human-computer interaction. In recent years, whisper has also become one of the most convenient human-computer interfaces compared to surface electromyography and magnetic resonance imaging interfaces in the field of human-computer interaction. Whisper therefore has broad application prospects, and the study of whisper to normal speech conversion (usually called whisper-to-speech conversion) has attracted much attention from researchers. This thesis focuses on whisper-to-speech conversion technology based on convolutional neural networks. The major work is divided into the following two parts.

First, investigation shows that existing whisper-to-speech conversion methods cannot make full use of the time-domain and frequency-domain correlation of speech for modeling. When the spectra of adjacent consecutive speech frames are spliced into a matrix, the local correlation along the time and frequency dimensions is very similar to the correlation between adjacent pixels in an image. Each neuron in a convolutional layer of a Convolutional Neural Network (CNN) is computed by convolving over multiple neurons in a neighboring region of the previous layer. Because the points in such a region carry both time-domain and frequency-domain information from the input speech spectrum, the convolutional layer can extract the time-frequency correlation implied in the spectral features. To make full use of this correlation for modeling, this thesis proposes a deep convolutional neural network (DCNN) model for whisper-to-speech conversion. Experimental results show that the converted speech obtained by the proposed DCNN model is closer to normal speech than that obtained by the deep neural network (DNN) model.

Second, although the DCNN model can make full use of the time-domain and frequency-domain correlation of speech for whisper-to-speech conversion, it uses only a fully connected layer to fit the mapping between the features extracted by the convolutional layers and the normal speech features. Because the fully connected layer treats each frame of input speech features as independent, the DCNN cannot further exploit temporal correlation when modeling the features extracted by the convolutional layers. Bidirectional Long Short-Term Memory (BLSTM) networks, by contrast, make good use of temporal correlation. To combine the advantages of CNN and BLSTM, this thesis proposes a Deep Convolutional Recurrent Neural Network (DCRNN) for whisper-to-speech conversion. The method has been verified on a real whisper database, and the experimental results show that it further improves conversion quality compared with the DCNN model.
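The frame-splicing idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the thesis's implementation. The function names `splice_frames` and `conv2d_valid`, the context width of 3 frames, the 40 spectral bins, the 3x3 kernel, and the random data are all assumptions chosen only for illustration. The sketch stacks each frame with its neighbors into a small time-frequency matrix and applies one 2-D convolution kernel to it, so that each output value depends on a local region spanning both adjacent frames and adjacent frequency bins.

```python
import numpy as np


def splice_frames(spec, context):
    """Stack each frame with `context` neighbours on either side.

    spec: array of shape (num_frames, num_bins) holding spectral features.
    Returns an array of shape (num_frames, 2 * context + 1, num_bins).
    Edges are padded by repeating the first and last frame (an assumption).
    """
    padded = np.pad(spec, ((context, context), (0, 0)), mode="edge")
    width = 2 * context + 1
    return np.stack([padded[t:t + width] for t in range(spec.shape[0])])


def conv2d_valid(patch, kernel):
    """2-D cross-correlation with no padding, as computed by a CNN layer."""
    kh, kw = kernel.shape
    out_h = patch.shape[0] - kh + 1
    out_w = patch.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Each output mixes kh adjacent frames and kw adjacent bins.
            out[i, j] = np.sum(patch[i:i + kh, j:j + kw] * kernel)
    return out


rng = np.random.default_rng(0)
whisper_spec = rng.standard_normal((100, 40))     # hypothetical: 100 frames, 40 bins
patches = splice_frames(whisper_spec, context=3)  # one 7 x 40 matrix per frame
kernel = rng.standard_normal((3, 3))              # stand-in for a learned filter
feature_map = conv2d_valid(patches[0], kernel)    # local time-frequency features
```

In a trained model the kernel weights would be learned and many kernels would be stacked across several layers. A DCRNN-style model would then pass the resulting per-frame features through a bidirectional recurrent layer rather than treating each frame independently.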

Keywords/Search Tags:

Whisper-to-speech conversion, Correlation between time and frequency domain, DCNN, DCRNN

Related items

1	Whisper To Speech Conversion And Whisper Recognition Modeling Method
2	Research On Whisper To Normal Speech Conversion Based On Deep Neural Networks
3	Study On The Conversion Of Whispered Speech Into Normal Speech By Feature Mapping
4	Research On Whisper Speech Signal Processing And Its Application In Laser Interception
5	New time-frequency domain pitch estimation methods for speech signals under low levels of SNR
6	Research On Whisper Speech Detection Technology
7	Whisper speech processing: Analysis, modeling, and detection with applications to keyword spotting
8	Research On Key Techniques Of OFDM Systems Under Low SNR Condition
9	Research On Continuous Speech Keyword Recognition Based On Time Domain And Frequency Domain
10	Reconstruction Of Normal Speech From Chinese Whispers Based On MELP