| Speech recognition is a natural language processing technique that converts speech into text to enable human-machine interaction. Tibetan is one of the minority languages of China; research on Tibetan speech recognition started relatively late, and the available corpora are small. Traditional speech recognition models have not performed well on Tibetan speech recognition tasks. With the development of neural network technology, end-to-end speech recognition models based on it have achieved significant results in the field of speech recognition. However, end-to-end models require a large amount of training data, and few Tibetan corpora are publicly available today. In recent years, transfer learning methods have received widespread attention from researchers and have achieved significant results in fields such as image processing and natural language processing. This paper introduces transfer learning into low-resource speech recognition, using large Chinese and English corpora to compensate for the lack of training data for Lhasa-Tibetan speech recognition. Specifically, this paper studies the application of cross-language transfer learning methods based on end-to-end technology to Lhasa-Tibetan speech recognition tasks, and further improves recognition accuracy by adding a text correction model on top of the transfer model. The main research content and contributions of this paper are as follows: 1. Research on Cross-language Transfer Learning for Lhasa-Tibetan End-to-End Speech Recognition Based on a Multi-language Character Set. Based on the WaveNet-CTC end-to-end model, this paper explores transfer learning for Lhasa-Tibetan speech recognition models by leveraging Chinese or English speech data. This paper proposes a multi-language character set as the modeling unit and compares it with other modeling units such as a Latin letter set. The transfer learning model
using this method can effectively exploit the potential similarity between multi-language speech recognition tasks, compensate for the insufficient Lhasa-Tibetan training data, and improve the performance of Lhasa-Tibetan speech recognition systems. Regarding the choice of source language, Chinese is more suitable than English as the source language for cross-language transfer learning of Tibetan speech recognition models. 2. Research on Cross-language Transfer Learning for Lhasa-Tibetan End-to-End Speech Recognition Based on a Universal Phoneme Set. Since the decoding part of the WaveNet-CTC model cannot take context information into account, this paper adopts the Conformer model as the recognition model structure. Conformer is a Transformer model enhanced with convolutional layers. The convolutions strengthen the representation of local features, so the model can better capture both the local and the global features of speech sequences while keeping the parameter count as small as possible. Building on the feature representation ability of the Conformer model, this paper proposes a cross-language universal phoneme set based on Chinese and Tibetan as the modeling unit. Compared with the WaveNet-CTC experiments above, the speech recognition system based on the universal phoneme set and the Conformer model is generally superior to the system based on the WaveNet-CTC model, further improving the performance of the Lhasa-Tibetan speech recognition system. 3. Research on Lhasa-Tibetan Text Correction Based on Soft-Masked BERT. Text correction can fix spelling errors in text at the word or character level. The results of the above experiments are measured by word-level or character-level error rates, so building a text correction model is of great significance. The Soft-Masked BERT model is an improvement and extension of the BERT model, which has so far been applied successfully only in the field of Chinese text correction. Therefore, this paper uses this
model to further correct the Tibetan speech recognition results, improve the plausibility of the recognized text, and improve the cross-language Lhasa-Tibetan speech recognition system based on transfer learning. The experimental results show that the model can further improve the performance of the Lhasa-Tibetan speech recognition system that uses Tibetan characters as modeling units. |
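
As a rough illustration of two ideas mentioned above, the sketch below merges the characters of several transcript collections into one shared output vocabulary (a multi-language character set) and computes a character-level error rate from the edit distance. It is a minimal sketch, not the paper's implementation: the function names, the `<blank>` symbol at index 0, and the choice to treat each Unicode code point as one modeling unit are all assumptions made for this example, and the actual units used in the experiments may be defined differently.

```python
from typing import Dict, Iterable, List

BLANK = "<blank>"  # assumed CTC blank symbol, placed at index 0 for this sketch


def build_char_vocab(*corpora: Iterable[str]) -> Dict[str, int]:
    """Merge the characters of several transcript collections into one vocabulary.

    Each Unicode code point is treated as one unit (an assumption of this sketch).
    """
    chars = set()
    for corpus in corpora:
        for transcript in corpus:
            chars.update(ch for ch in transcript if not ch.isspace())
    vocab = {BLANK: 0}
    for ch in sorted(chars):  # sorted for a reproducible index assignment
        vocab[ch] = len(vocab)
    return vocab


def encode(transcript: str, vocab: Dict[str, int]) -> List[int]:
    """Map a transcript to label indices, skipping whitespace."""
    return [vocab[ch] for ch in transcript if not ch.isspace()]


def char_error_rate(reference: str, hypothesis: str) -> float:
    """Levenshtein distance between the strings divided by the reference length."""
    prev = list(range(len(hypothesis) + 1))
    for i, r in enumerate(reference, start=1):
        curr = [i]
        for j, h in enumerate(hypothesis, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (r != h)))   # substitution or match
        prev = curr
    return prev[-1] / max(len(reference), 1)


if __name__ == "__main__":
    # Toy transcripts, purely illustrative.
    chinese = ["你好"]
    tibetan = ["བོད་སྐད"]
    vocab = build_char_vocab(chinese, tibetan)
    print(len(vocab), encode(tibetan[0], vocab))
    print(char_error_rate(tibetan[0], tibetan[0][:-1]))
```

With a shared vocabulary of this kind, a model pre-trained on the source language and then fine-tuned on Lhasa-Tibetan can keep a single output layer across both stages, which is one plausible reading of why a joint character set helps transfer.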