Speech recognition, which converts the content of audio into text that computers can process, is a key technology driving the development of intelligent oral evaluation. As the technology has developed and spread, it has appeared in many areas of daily life and brought great convenience. In education, however, and particularly in intelligent oral evaluation, research on applying speech recognition is not yet mature, and grading is still done manually, which is inefficient and costly. Speech recognition for oral evaluation demands high recognition accuracy, yet the audio from the English CET-4 and CET-6 oral exams differs significantly from existing public datasets. Moreover, candidates differ in speaking rate, intonation, and pronunciation, so the audio is highly diverse, and factors such as noise and candidates' stumbling lower the quality of the data. Together, these factors pose great challenges for speech recognition in oral evaluation scenarios.

This thesis focuses on speech recognition for the English CET-4 and CET-6 oral exams. A dedicated dataset is constructed from audio collected in real CET-4 and CET-6 oral exam rooms. On this basis, speech recognition methods for oral evaluation scenarios are studied, laying the groundwork for future intelligent oral evaluation. The main work of this thesis is as follows:

First, a speech recognition method based on connectionist temporal classification (CTC) and feature enhancement is proposed. Because constructing the dataset is costly in both time and human resources, a feature enhancement module is proposed so that a model with good recognition performance can be obtained from a dataset of limited size. The module improves the robustness of the model using three methods: time warping, frequency masking, and time masking. To make full use of the information in the audio, the feature encoder combines a 2D CNN and a GRU, which model local and contextual information respectively and improve the model's ability to represent audio information. Training and testing experiments are conducted on the dataset, together with comparison experiments against mainstream methods and ablation experiments on the components of the model, and the recognition errors are analyzed. The experimental results show that the proposed method achieves good recognition performance.

Second, a speech recognition method based on attention and text error correction is proposed. Because speech recognition output cannot be fully accurate and the recognized text contains many spelling errors, a text correction module is proposed that uses edit distance and frequency information to correct misspelled words. To accelerate training, a self-attention mechanism is used to model audio context information, which overcomes the inability of GRUs to be computed in parallel. To combine the advantages of different decoding methods, a decoding method based on both the attention mechanism and CTC is proposed. Finally, the method is compared on the same dataset with the method based on CTC and feature enhancement and with mainstream methods. The comparison and ablation results show that the method based on attention and text error correction achieves relatively better recognition performance.

Third, a speech recognition system is designed and implemented. After uploading audio and choosing between the two speech recognition methods proposed in this thesis, the user obtains the recognized text, which can assist teachers in grading oral exams and help ensure the fairness and impartiality of the evaluation process. The system can also support the promotion of intelligent oral evaluation.
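
As an illustration of the frequency masking and time masking named above, the following is a minimal sketch and not the thesis implementation. It assumes the audio features are a list of time frames, each a list of frequency-bin values; the function name `mask_spectrogram`, the mask widths, and the choice of zero as the fill value are all hypothetical. Time warping is omitted for brevity.

```python
import random


def mask_spectrogram(spec, max_freq_width=2, max_time_width=2, rng=None):
    """Return a masked copy of `spec` (list of frames, each a list of bins).

    One band of consecutive frequency bins and one span of consecutive
    time frames are set to zero. Widths are drawn at random, so a width
    of zero (no masking) is possible. The input is left unchanged.
    """
    rng = rng or random.Random()
    num_frames = len(spec)
    num_bins = len(spec[0])
    out = [list(frame) for frame in spec]  # copy so the input is not modified

    # Frequency masking: zero the same band of bins in every frame.
    f_width = rng.randint(0, min(max_freq_width, num_bins))
    f_start = rng.randint(0, num_bins - f_width)
    for frame in out:
        for f in range(f_start, f_start + f_width):
            frame[f] = 0.0

    # Time masking: zero a span of consecutive frames entirely.
    t_width = rng.randint(0, min(max_time_width, num_frames))
    t_start = rng.randint(0, num_frames - t_width)
    for t in range(t_start, t_start + t_width):
        out[t] = [0.0] * num_bins

    return out
```

Because a fresh mask is drawn on every call, applying the function each time a training example is loaded exposes the model to a different corrupted view of the same utterance, which is one plausible way such a module could compensate for a small dataset.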
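
The idea of correcting misspelled words with edit distance and frequency information can be sketched as follows. This is a simplified illustration under stated assumptions, not the module proposed in the thesis: it assumes that "frequency information" means word counts from a reference vocabulary, that candidates are ranked first by Levenshtein distance and then by count, and that words farther than `max_distance` edits from every vocabulary entry are left unchanged. The names `edit_distance`, `correct_word`, and `correct_transcript` are hypothetical.

```python
from collections import Counter


def edit_distance(a, b):
    """Levenshtein distance between strings `a` and `b`."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,           # delete ca
                            curr[j - 1] + 1,       # insert cb
                            prev[j - 1] + cost))   # substitute or match
        prev = curr
    return prev[-1]


def correct_word(word, freq, max_distance=2):
    """Return the closest vocabulary word, preferring more frequent ones."""
    if word in freq:
        return word  # already a known word
    best = None
    for candidate, count in freq.items():
        d = edit_distance(word, candidate)
        if d > max_distance:
            continue
        key = (d, -count, candidate)  # smaller distance, then higher count
        if best is None or key < best:
            best = key
    return best[2] if best is not None else word


def correct_transcript(text, freq):
    """Correct each whitespace-separated word of a recognized transcript."""
    return " ".join(correct_word(w, freq) for w in text.split())


# Hypothetical vocabulary with word counts, for illustration only.
vocabulary = Counter({"the": 50, "speech": 20, "speed": 5, "exam": 10})
corrected = correct_transcript("the speach exam", vocabulary)
```

Scanning the whole vocabulary for each unknown word is adequate for a sketch; a practical system would likely restrict the candidate set before computing distances.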