
Research On Speech Separation And Recognition Based On Deep Learning

Posted on: 2019-10-14
Degree: Master
Type: Thesis
Country: China
Candidate: S Yao
Full Text: PDF
GTID: 2428330548492900
Subject: Control Science and Engineering

Abstract/Summary:
With the rapid development of artificial intelligence, smart devices have become ubiquitous. As one of the most important modes of human-computer interaction, speech urgently needs signal-processing technologies that work well on these devices. Although automatic speech recognition systems now exceed human accuracy, this holds only in quiet conditions; performance degrades sharply in noisy environments. Speech separation technology, which removes background noise and interfering speakers, has therefore become a research hotspot. End-to-end speech recognition, in which a single algorithm maps directly from input to output, is expected to have even broader application prospects.

Current speech separation and recognition algorithms are built on traditional acoustic features and give little consideration to how the feature extraction step loses signal information and introduces spurious information, both of which harm system performance. Exploiting the invariance of convolution to the variability of the speech signal, this thesis takes raw speech waveform samples as the input to deep one-dimensional convolutional networks and studies their impact on three aspects: acoustic feature extraction, speech separation, and speech recognition.

1. Traditional feature extraction pipelines based on the Fourier transform, discrete cosine transform, and similar methods lose high-frequency and correlation information. This thesis designs an acoustic feature extraction model based on a deep one-dimensional convolutional network. It avoids the information loss and cumbersome modules of traditional acoustic feature extraction, extracts deeper acoustic characteristics of the speech signal, and is verified experimentally.

2. In current speech separation systems that take traditional acoustic features as input, model training cannot influence the feature extraction process. This thesis designs a speech separation system that combines a one-dimensional convolutional network with a long short-term memory (LSTM) network, integrating feature extraction into the model itself. A multi-target regression method recovers each target speaker's speech directly from the mixed speech waveform, with experiments on a two-speaker dataset.

3. Current end-to-end speech recognition systems likewise take traditional acoustic features as input, and their LSTM networks have many parameters and run slowly. This thesis designs an end-to-end speech recognition model based on causal dilated convolution: time-ordered causal convolutions, whose dilation provides a larger receptive field with the same number of convolutional layers, replace the LSTM network. The system is built end to end and evaluated on a Chinese speech dataset.

The study finds that deep one-dimensional convolutional networks can extract more essential features of the speech signal and improve the performance of both speech separation and speech recognition systems. The successful application of causal dilated convolution to speech recognition suggests it may replace LSTM networks as the leading model in the speech signal field, offering a new direction for speech signal processing.
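The idea in point 1 of replacing the Fourier-based front end with a one-dimensional convolution over raw waveform samples can be sketched as follows. This is a minimal NumPy illustration, not the thesis's actual model: random filters stand in for learned ones, and the 16 kHz rate, 25 ms window, and 10 ms hop are common front-end choices assumed here, not taken from the thesis.

```python
import numpy as np

def conv1d_features(wave, filters, stride):
    """Framewise features from raw samples: each row of `filters` is an
    analysis filter (random here, learned in a real model); the stride
    plays the role of the hop size in an STFT-based front end."""
    K = filters.shape[1]
    n_frames = 1 + (len(wave) - K) // stride
    frames = np.stack([wave[i * stride : i * stride + K]
                       for i in range(n_frames)])
    # Filter each frame, then apply a ReLU nonlinearity
    return np.maximum(frames @ filters.T, 0.0)  # shape (n_frames, n_filters)

rng = np.random.default_rng(0)
wave = rng.standard_normal(16000)           # 1 s of audio at 16 kHz (assumed)
filts = rng.standard_normal((40, 400))      # 40 filters over 25 ms windows
feats = conv1d_features(wave, filts, stride=160)  # 10 ms hop -> (98, 40)
```

Because the filters are network weights rather than a fixed transform, training the downstream separation or recognition objective can also shape the feature extractor, which is precisely the coupling that fixed Fourier-based features prevent.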
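The causal dilated convolution of point 3 can be illustrated with a minimal NumPy sketch, assuming a single channel and a hand-picked kernel; the thesis's actual model stacks many such layers with learned weights.

```python
import numpy as np

def causal_dilated_conv1d(x, w, dilation=1):
    """Causal dilated 1-D convolution: y[t] = sum_k w[k] * x[t - k*dilation],
    so each output depends only on present and past samples."""
    K = len(w)
    pad = (K - 1) * dilation                 # left-pad to keep output causal
    xp = np.concatenate([np.zeros(pad), x])
    y = np.empty(len(x))
    for t in range(len(x)):
        # taps at x[t], x[t-d], x[t-2d], ... read from the padded signal
        taps = xp[t + pad - np.arange(K) * dilation]
        y[t] = taps @ w
    return y

# An impulse at t=3 with kernel [1, 1] and dilation 2 reappears at t=3
# (the x[t] tap) and t=5 (the x[t-2] tap); nothing leaks to earlier times.
x = np.zeros(8)
x[3] = 1.0
y = causal_dilated_conv1d(x, np.array([1.0, 1.0]), dilation=2)
```

Stacking L such layers of kernel size K with dilations 1, 2, 4, ..., 2^(L-1) gives a receptive field of 1 + (K-1)(2^L - 1) samples, growing exponentially with depth. This is why a few dilated layers can cover the long temporal context that would otherwise call for an LSTM, while remaining fully parallelizable across time.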
Keywords/Search Tags: Deep one-dimensional convolutional network, Raw speech waveform samples, Feature extraction, Speech separation, Speech recognition