Research On Speech Recognition Based On Compound Two-way Cyclic Network Under Specific Working Conditions

Posted on: 2019-10-28
Degree: Master
Type: Thesis
Country: China
Candidate: D Zhou
GTID: 2438330548971247
Subject: Mechanical engineering
Abstract/Summary:
With the rapid development of computer technology, speech recognition systems based on neural networks are widely used in various fields. Because speech is a time sequence, recurrent networks have a unique advantage in modeling it. In an LSTM, different gate modules control the inflow and outflow of information, which alleviates the problem of exploding and vanishing gradients during training. With today's GPU computing power, deep, bidirectional, and compound model structures have been shown to deliver state-of-the-art performance on nonlinear modeling tasks that depend heavily on temporal sequence information.

In this thesis, a composite structure based on a bidirectional LSTM network is proposed and compared with mature speech recognition systems to show its recognition performance. At the same time, a method for constructing an objective function suited to such a composite network is derived, and the objective function is compared against existing ones to verify the improvement in recognition performance.

The thesis first introduces popular neural-network-based speech recognition systems, starting from the traditional speech recognition system. After analyzing and comparing three kinds of speech recognition systems, a composite structure based on an LSTM network with two sub-networks is proposed to address the coupling problems that may arise between the nonlinear speech input and the target output. Having proposed the composite structure and considered the problems of constructing the objective function directly, the thesis inserts a 'none' symbol into the target output. During training, it is then only necessary to find all possible segmentations and to maximize the probability of their occurrence. It is therefore unnecessary to align each frame with its corresponding target, which greatly simplifies the calculation. In the theoretical derivation, forward variables and backward variables are defined to represent this segmentation idea. Finally, the objective function and the corresponding gradient formula are derived.

The main task of this thesis is speech recognition under specific working conditions. After analyzing the noise environment of a factory, specific noise phonemes are extracted. Mixing software is then used to mix these noise phonemes into the training and testing datasets while leaving the target sequences unchanged, so that the system is trained and tested in the factory noise environment. The corresponding program is implemented with the TensorFlow deep learning module and the librosa audio module. Because the objective functions used for comparison were all trained and tested on the TIMIT corpus, TIMIT is used for training and testing here as well. The final test results show that the proposed objective function significantly improves recognition performance.
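
The segmentation idea described in the abstract (insert a 'none' symbol into the target, then sum over every frame-level path that reduces to the target) resembles the standard connectionist temporal classification (CTC) forward recursion. The following is a minimal sketch of that standard recursion in plain Python, not the thesis's own objective function; the names `BLANK`, `extend_with_blanks`, and `sequence_probability` are hypothetical, and using index 0 for the 'none' symbol is an assumption made for illustration.

```python
BLANK = 0  # assumed index of the 'none' symbol (hypothetical choice)


def extend_with_blanks(target):
    """Insert the 'none' symbol before, between, and after the target labels."""
    extended = [BLANK]
    for label in target:
        extended += [label, BLANK]
    return extended


def sequence_probability(probs, target):
    """Total probability of all frame-level paths that reduce to `target`.

    probs[t][k] is the per-frame probability of symbol k at frame t,
    e.g. the softmax output of a recurrent network.
    """
    ext = extend_with_blanks(target)
    num_frames, num_states = len(probs), len(ext)

    # alpha[s]: forward variable, the summed probability of all partial
    # paths that sit at position s of the extended target at this frame.
    alpha = [0.0] * num_states
    alpha[0] = probs[0][ext[0]]
    if num_states > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, num_frames):
        new_alpha = [0.0] * num_states
        for s in range(num_states):
            total = alpha[s]              # stay on the same symbol
            if s >= 1:
                total += alpha[s - 1]     # advance by one position
            # Skipping a 'none' symbol is allowed only between two
            # different labels; repeated labels need it as a separator.
            if s >= 2 and ext[s] != BLANK and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new_alpha[s] = total * probs[t][ext[s]]
        alpha = new_alpha

    # A valid path ends on the last label or on the trailing 'none' symbol.
    return alpha[-1] + (alpha[-2] if num_states > 1 else 0.0)
```

For example, `sequence_probability([[0.5, 0.5], [0.5, 0.5]], [1])` sums the three two-frame paths (none, 1), (1, none), and (1, 1). A training loss would typically be the negative logarithm of this quantity, computed in log space for numerical stability; that step is omitted here.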
Keywords/Search Tags:Recurrent Neural Network, Speech recognition, Model structure, Loss function