Research On Speech Recognition Based On Compound Two-way Cyclic Network Under Specific Working Conditions

Posted on:2019-10-28

Degree:Master

Type:Thesis

Country:China

Candidate:D Zhou

Full Text:PDF

GTID:2438330548971247

Subject:(degree of mechanical engineering)

Abstract/Summary:

With the rapid development of computer technology, speech recognition systems based on neural networks are widely used in many fields. Because speech is a time sequence, recurrent networks have a unique advantage. In an LSTM, different gate modules control the inflow and outflow of information, which alleviates the exploding and vanishing gradient problems during training. With today’s GPU computing power, deep, bidirectional, and compound model structures have been shown to achieve state-of-the-art performance in nonlinear modeling tasks that rely heavily on temporal sequence information. In this thesis, a composite structure based on a bidirectional LSTM network is proposed and compared with mature speech recognition systems to show its recognition performance. An objective function construction method suited to such a composite network is also derived, and the objective function is then compared against others to verify the improvement in recognition performance.

The thesis first introduces popular neural-network-based speech recognition systems, starting from the traditional speech recognition system. After analyzing and comparing the performance of three kinds of speech recognition systems, a composite structure based on an LSTM network with two sub-networks is proposed to address the coupling problems that may arise between the simple nonlinear speech input and the target output. With the composite structure in place, and considering the problems that directly constructing the objective function might cause, a ‘none’ symbol is inserted into the target output. During training, it is then only necessary to find all possible segmentations and to maximize the probability of these segmentations. Each frame therefore does not have to be matched to its own target, which greatly simplifies the calculation. In the theoretical derivation, forward variables and backward variables are defined to represent this segmentation idea. Finally, the objective function and the corresponding gradient formula are derived.

The main task of this thesis is speech recognition under specific working conditions. After the noise environment of a factory is analyzed, specific noise phonemes are extracted. Mixing software is then used to mix these noise phonemes into the training and testing datasets, while the target sequences are left unchanged. The system is thus trained and tested in the factory noise environment. The corresponding program is implemented with the TensorFlow deep learning library and the librosa audio library. Because the objective functions used for comparison were all trained and tested on the TIMIT corpus, TIMIT is also used here for training and testing. The final test results show that the proposed objective function significantly improves recognition performance.
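The abstract does not state the objective function itself. As a rough illustration of the idea it describes (inserting an extra symbol into the target and summing over every segmentation with forward variables), the sketch below uses the standard CTC-style forward recursion. That this matches the thesis's own derivation is an assumption, and the function name, the `blank` index, and the array layout are hypothetical choices made for the example.

```python
import numpy as np


def ctc_forward_probability(probs, target, blank=0):
    """Total probability of all frame-level alignments that collapse to `target`.

    probs  : array of shape (T, K), per-frame class probabilities (rows sum to 1).
    target : list of label ids, without the extra symbol.
    blank  : id of the extra 'none' symbol (assumed to be class 0 here).
    """
    T = probs.shape[0]

    # Extended target: the extra symbol before, between, and after the labels.
    ext = [blank]
    for label in target:
        ext += [label, blank]
    S = len(ext)

    # alpha[t, s]: probability of all partial alignments that end at
    # position s of the extended target after t + 1 frames.
    alpha = np.zeros((T, S))
    alpha[0, 0] = probs[0, ext[0]]
    if S > 1:
        alpha[0, 1] = probs[0, ext[1]]

    for t in range(1, T):
        for s in range(S):
            total = alpha[t - 1, s]              # stay on the same symbol
            if s >= 1:
                total += alpha[t - 1, s - 1]     # advance by one position
            # Skipping the extra symbol is allowed only between two
            # different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[t - 1, s - 2]
            alpha[t, s] = total * probs[t, ext[s]]

    # A complete alignment ends on the last label or on the trailing extra symbol.
    p = alpha[T - 1, S - 1]
    if S > 1:
        p += alpha[T - 1, S - 2]
    return p


# Two frames, two classes (0 = 'none', 1 = a single label), uniform outputs.
uniform = np.full((2, 2), 0.5)
print(ctc_forward_probability(uniform, [1]))
```

In the usage example, three alignments collapse to the single label ("1 1", "1 none", "none 1"), each with probability 0.25, so the function returns 0.75. A training loss would typically be the negative logarithm of this quantity, computed in log space for numerical stability.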

Keywords/Search Tags:

Recurrent Neural Network, Speech recognition, Model structure, Loss function

Related items

1	Research On Sensor Activity Recognition Based On Improved Deep Recurrent Neural Network
2	Research And Implementation Of Speech Recognition Algorithm Based On Recurrent Neural Network
3	Uyghur Speech Recognition Based On Deep Recurrent Neural Network
4	Research On Speech Emotion Recognition Based On Convolutional Recurrent Neural Network
5	Research On Speech Emotion Recognition Algorithm Based On Deep Learning
6	Research On Acoustic Modeling Of Speech Recognition Based On Recurrent Neural Network
7	Recurrent Neural Network Language Model For Continuous Speech Recognition
8	Research And System Design Of Speech Recognition Based On Improved CNN
9	Deep Face Recognition Method Based On Spatio-Temporal Feature Fusion And Adaptive Loss Function
10	Speech Emotion Recognition Based On Improved Convolutional Recurrent Neural Network