Research On Acoustic Model Of Speech Recognition In Educational Scene Based On Deep Learning

Posted on:2020-09-13

Degree:Master

Type:Thesis

Country:China

Candidate:G J Wang

Full Text:PDF

GTID:2428330596987365

Subject:Engineering·Computer Technology

Abstract/Summary:


In recent years, advances in deep learning have transformed speech recognition technology. The acoustic model has evolved from the traditional Gaussian mixture model (GMM) to neural network models, which has significantly improved recognition performance and allowed speech recognition to better serve industrial production and daily life. Applying speech recognition in educational scenarios nevertheless remains challenging, because teaching content is broad, expression is rich and varied, and speech datasets for education are relatively scarce. To realize speech recognition in educational scenarios, this thesis first studies acoustic models based on the deep neural network (DNN), time-delay neural network (TDNN), bidirectional long short-term memory (BLSTM) network, and deep feedforward sequential memory network (DFSMN), and experimentally compares how well these network structures model speech datasets from the educational domain. The experimental results show that the DFSMN-based acoustic model outperforms the other network structures in both recognition performance and parameter count, making it suitable for a speech recognition system in educational scenarios. On this basis, and taking the characteristics of educational speech recognition into account, this thesis optimizes the DFSMN-based acoustic model with three methods: discriminative training, speaker adaptation, and speech data augmentation. The experimental results show that each of the three methods improves the recognition performance of the acoustic model to some extent, and that combining them reduces the word error rate further. In addition, this thesis proposes a speech data augmentation method based on automatic annotation to address the high cost of acquiring speech data in educational scenarios. Experimental results show that the proposed method effectively expands the amount of speech data available for training and improves recognition performance, and that it is stable, reliable, and easy to apply. Finally, by combining the above four optimization methods, the best recognition results are obtained on the test set: the word error rate is reduced by 29.8% relative to the baseline model, which largely meets the practical performance requirements for a speech recognition acoustic model in educational scenarios.
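The abstract reports its final result as a word error rate (WER) figure relative to a baseline. As a minimal sketch of how such figures are commonly computed, the Python below calculates WER from a word-level edit distance and then the relative reduction between two WER values. The function names, the example sentences, and the numbers in the usage note are illustrative assumptions and are not taken from the thesis.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer improves on baseline_wer."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer
```

For example, with the hypothetical values `relative_wer_reduction(0.20, 0.15)`, the result is approximately 0.25, that is, a 25% relative reduction. A relative figure alone does not reveal the absolute WER of either system.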

Keywords/Search Tags:

Deep learning, Speech recognition, Acoustic model, Discriminative training, Speaker adaptation, Data augmentation


Related items

1	Speaker Adaptation Of DNN-HMM Acoustic Model For Speech Recognition
2	Research On Acoustic Modeling For Spontaneous Spoken Speech Recognition
3	Research On Speaker Adaptation Methods Based On RNN-BLSTM Acoustic Model
4	Application Of DT And DT-Adaptation In Acoustic Modeling Of ASR
5	Study Of Speaker Adaptation Based On Neural Network Acoustic Model
6	The Study On Acoustic Model Based Neural Network In Mongolian Speech Recognition System
7	Research On Discriminative Techniques Of Feature Extraction And Acoustic Model Training In Continuous Speech Recognition
8	Discriminative Training Of Acoustic Models For Automatic Speech Recognition
9	Speech Emotion Recognition With Deep Learning Techniques And Data Augmentation
10	Research On Speaker Adaptation Of Neural Network Acoustic Models For Speech Recognition