Research On Acoustic Model Of Speech Recognition In Educational Scene Based On Deep Learning

Posted on: 2020-09-13
Degree: Master
Type: Thesis
Country: China
Candidate: G J Wang
GTID: 2428330596987365
Subject: Engineering, Computer Technology
Abstract/Summary:
In recent years, speech recognition technology has changed substantially with the development of deep learning. The acoustic model has gradually moved from the traditional Gaussian mixture model (GMM) to neural network models, which has significantly improved the recognition performance of speech recognition systems and allowed them to better serve industrial production and daily life. Applying speech recognition to the educational scene nevertheless remains challenging, because teaching content is broad, spoken expression is rich and varied, and speech datasets for the educational scene are relatively scarce.

To realize speech recognition in the educational scene, this thesis first studies acoustic models based on the deep neural network (DNN), the time-delay neural network (TDNN), the bidirectional long short-term memory network (BLSTM) and the deep feedforward sequential memory network (DFSMN), and experimentally compares the modeling performance of these network structures on speech datasets from the educational field. The experimental results show that the DFSMN-based acoustic model outperforms the other network structures in both recognition performance and parameter scale, and is suitable for building a speech recognition system for the educational scene.

On this basis, and taking into account the characteristics of speech recognition in the educational scene, this thesis optimizes the DFSMN-based acoustic model with three methods: discriminative training, speaker adaptation and speech data augmentation. The experimental results show that each of the three methods improves the recognition performance of the acoustic model to some extent, and that combining them further reduces the word error rate.

In addition, this thesis proposes a speech data augmentation method based on automatic annotation to address the high cost of acquiring speech data in the educational scene. Experimental results show that the proposed method effectively expands the amount of speech data available for training and improves speech recognition performance, and that it is stable, reliable and easy to operate. Finally, by fusing the above four optimization methods, the best recognition results are obtained on the test set, with the word error rate reduced by 29.8% relative to the baseline model, which basically satisfies the practical performance requirements of the educational scene for a speech recognition acoustic model.
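
The headline figure above is a word error rate stated relative to a baseline model. As a minimal sketch of how such a figure is conventionally computed (not the thesis's own evaluation code), the Python below derives the word error rate from a word-level edit distance and then the relative reduction between two systems. The function names `word_error_rate` and `relative_wer_reduction`, the whitespace tokenization, and the example sentences are all assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference length.

    Assumption: tokens are separated by whitespace. The thesis does not
    state its tokenization, so this is illustrative only.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, improved_wer: float) -> float:
    """Fraction by which the improved system lowers the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - improved_wer) / baseline_wer


if __name__ == "__main__":
    # Hypothetical transcripts: one substitution and one deletion over six words.
    wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
    print(f"WER: {wer:.3f}")
    # Hypothetical WER values, not results from the thesis.
    print(f"Relative reduction: {relative_wer_reduction(0.20, 0.15):.1%}")
```

A relative reduction depends on the baseline: the same relative figure corresponds to a different absolute drop in word error rate for each baseline value, so the two numbers should not be read interchangeably.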
Keywords/Search Tags: Deep learning, Speech recognition, Acoustic model, Discriminative training, Speaker adaptation, Data augmentation