| Datong City lies in the northernmost part of Shanxi Province, and its local dialect is an important part of the Jin dialect group. Its linguistic features are less complex than those of the dialects of central and southern Shanxi, so research on speech recognition for this dialect can lay a sound technical foundation for speech recognition of Shanxi dialects more broadly. This paper first introduces the linguistic characteristics of the Datong dialect and the construction of a Datong dialect speech data set, which is used to train the Datong dialect speech recognition model. The Datong dialect differs greatly from Mandarin in grammar, pronunciation, and other respects. Compared with Mandarin, it has an additional tone, the entering tone (rusheng). Because entering-tone syllables are pronounced briefly, their audio duration is short, so the entering tone occupies a smaller region of the spectrogram, which makes the spectral representation of the speech more complicated. To address this problem, and drawing on the structural characteristics of convolutional neural networks, this paper proposes a "multi-core convolutional fusion network (MCFN)" to extract phoneme features of different durations from the spectrogram. This structure can be placed before the acoustic model to improve its robustness. In addition, this paper uses the attention mechanism to build an end-to-end Datong dialect speech translation model, which treats the Datong dialect and Mandarin as two different languages. The speech signal features of the Datong dialect are fed into the model and mapped to high-dimensional features, which are then aligned with the corresponding Mandarin text to produce the output. The model thus maps dialect speech directly to Mandarin text without dialect text as an intermediate step, reducing the negative impact of dialect text quality on the model. Together, the MCFN and the end-to-end speech translation model convert Datong dialect speech into Mandarin text, and experiments show good results. Research on speech recognition for the Datong dialect can broaden the user base of speech recognition and ease human-computer interaction for users with strong accents, and it can also be applied in fields such as identity authentication and auxiliary medical diagnosis. It is also of great significance for protecting the intangible cultural heritage of Shanxi's local dialects and for promoting barrier-free language communication across the country. |
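
The idea of extracting features of different durations can be illustrated with a minimal sketch. It is not the authors' MCFN: it assumes that "multi-core" refers to parallel convolution kernels of different widths along the time axis, and it replaces learned kernels with fixed averaging kernels. The names `conv_time`, `multi_kernel_fusion`, and `kernel_widths` are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence

Spectrogram = List[List[float]]  # indexed as [time frame][frequency bin]


def conv_time(spec: Spectrogram, kernel: Sequence[float]) -> Spectrogram:
    """Slide a 1-D kernel along the time axis with zero 'same' padding.

    Each frequency bin is filtered independently, so the output has the
    same shape as the input. Assumes an odd kernel length.
    """
    pad = len(kernel) // 2
    n_t, n_f = len(spec), len(spec[0])
    out = [[0.0] * n_f for _ in range(n_t)]
    for t in range(n_t):
        for j, w in enumerate(kernel):
            src = t + j - pad
            if 0 <= src < n_t:  # frames outside the signal count as zeros
                for f in range(n_f):
                    out[t][f] += w * spec[src][f]
    return out


def multi_kernel_fusion(
    spec: Spectrogram, kernel_widths: Sequence[int] = (3, 5, 7)
) -> Spectrogram:
    """Run one branch per kernel width and concatenate the branches per frame.

    Assumption: fixed averaging kernels stand in for the learned kernels
    a trained network would use. Narrow kernels respond to short events
    (such as brief entering-tone syllables), wide kernels to longer ones.
    """
    branches = [conv_time(spec, [1.0 / w] * w) for w in kernel_widths]
    return [sum((b[t] for b in branches), []) for t in range(len(spec))]


# Toy input: 4 time frames, 2 frequency bins.
spec = [[1.0, 2.0] for _ in range(4)]
fused = multi_kernel_fusion(spec, kernel_widths=(1, 3))
```

Each fused frame holds the outputs of all branches side by side, so a downstream acoustic model sees short-span and long-span context for the same time step.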