Research On Speech Recognition Technology For Online Education Application

Posted on:2023-07-24

Degree:Master

Type:Thesis

Country:China

Candidate:X K Huang

Full Text:PDF

GTID:2557306827496214

Subject:Software engineering

Abstract/Summary:

In recent years, online education has been favored by a growing number of users because of its convenience, intelligence, and other advantages. By relying on intelligent detection tools to assist teaching, online education can provide "personalized" guidance that addresses the learning problems of different user groups. Intelligent speech technology plays an important role in this field: functions such as oral scoring, in-class speech transcription, and automatic subtitle generation for online video help teachers and students improve the efficiency of teaching and learning, so that they can focus on the knowledge itself. However, current end-to-end speech recognition technology still has the following shortcomings: (1) models recognize long-duration speech poorly; (2) in noisy and reverberant environments, the recognition rate of the speech recognition model drops severely; (3) background voices are wrongly recognized by the speech recognition model.

The main work and contributions of this paper are as follows.

First, an end-to-end speech recognition model based on the Conformer framework with joint CTC training is built and used to study the influence of progressive down-sampling and a multi-scale attention mechanism on long-duration speech recognition. The multi-scale attention mechanism combines convolution and self-attention to learn speech representations at different scales, and it gives better recognition results on long speech. Experiments show that the multi-scale attention Conformer proposed in this paper can effectively improve the generalization ability of the model in long-duration speech recognition scenarios.

Second, for noisy environments, a dual-path TFCN speech enhancement model is proposed. Using a progressive learning strategy, the magnitude spectrum and the real and imaginary components of the signal are modeled separately, and the denoised speech is finally obtained. This method not only uses the information in the magnitude spectrum but also learns the phase spectrum through the real and imaginary components, so as to achieve a better denoising effect. In addition, the number of parameters in this method is much smaller than in network models such as WaveNet and U-Net.

Third, for the situation in which the speech recognition model incorrectly recognizes non-target speech because of background voices, this paper proposes a dual-path TFCN target speaker extraction algorithm. It projects the speech to be recognized and the registered speech into the same feature space through a shared audio encoding network, and then obtains the speaker features from the registered speech through multi-task learning. These features are processed by a speaker attention mechanism and the TFCN network to eliminate interference from other background voices. Experiments show that the time-frequency-domain TFCN speaker extraction algorithm outperforms the mainstream SpEx model on distortion metrics such as SI-SDR and SDR.

To sum up, for the long speech, environmental noise, background voices, and other complex situations that a speech recognition system may encounter in online education scenarios, this paper proposes a feasible, lightweight, and robust combination of speech recognition models that can effectively address recognition in complex scenes.
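The abstract evaluates speaker extraction with distortion metrics such as SI-SDR. As an illustration only, the following is a minimal NumPy sketch of the commonly used scale-invariant SDR definition; it is not taken from the thesis, and the function name `si_sdr`, the `eps` stabilizer, and the synthetic signals are assumptions made for this example.

```python
import numpy as np


def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant signal-to-distortion ratio in dB (illustrative sketch)."""
    estimate = np.asarray(estimate, dtype=np.float64)
    reference = np.asarray(reference, dtype=np.float64)

    # Remove the mean so a constant offset does not affect the score.
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()

    # Project the estimate onto the reference to get the target component.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference

    # Everything left over is counted as distortion.
    distortion = estimate - target

    ratio = (np.dot(target, target) + eps) / (np.dot(distortion, distortion) + eps)
    return 10.0 * np.log10(ratio)


# Synthetic example (assumed data): a sine "speech" signal plus Gaussian noise.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
clean = np.sin(2.0 * np.pi * 220.0 * t)
noise = rng.standard_normal(clean.shape)

light = clean + 0.05 * noise   # lightly corrupted estimate
heavy = clean + 0.50 * noise   # heavily corrupted estimate

print("SI-SDR (light noise):", si_sdr(light, clean))
print("SI-SDR (heavy noise):", si_sdr(heavy, clean))
```

Because the estimate is projected onto the reference before the ratio is taken, rescaling the estimate leaves the score essentially unchanged, which is what distinguishes SI-SDR from plain SDR.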

Keywords/Search Tags:

Deep learning, speech recognition, speech enhancement, multi-task learning, online education

Related items

1	Multi-channel Convolutional Classroom Speech Emotion Recognition Based On Attention Mechanism
2	Research On Key Technologies Of Ai-assisted Online Education
3	Campus Uncivilized Speech Recognition Based On Deep Learning
4	Improved Research On Speech Emotion Recognition Based On Phonological Representation
5	Research On Teacher’s Speech Emotion Recognition Based On Deep Learning
6	Research On The Construction Of Bilingual Speech Assisted Learning System For Minority Students Based On Speech Recognition Technology
7	Video Course Portrait Research Based On Seq2Seq Structure
8	Research On Old People’s Chatting Robot Based On Deep Learning
9	Design And Implementation Of Speech Interaction System For Educational Robot
10	The Method And Technology Of Teaching System Based On Face Recognition And Speech Recognition