Research On Dialect Accent Classification Based On Deep Learning

Posted on: 2022-03-11
Degree: Master
Type: Thesis
Country: China
Candidate: C Lei
Full Text: PDF
GTID: 2518306569481864
Subject: Software engineering
Abstract/Summary:
Dialect classification technology can tag audio with a dialect category based on its speech content. It helps build dialect data sets, supports China's dialect preservation efforts, and is important for studying the differences between dialects and the influence of regional dialect accents. Traditional dialect classification methods rely on a large number of hand-designed audio features and on classifiers such as support vector machines. Designing audio features requires specialized domain knowledge and introduces a specific bias into the experimental results, and the capacity of a support vector machine is insufficient to cope with complex scenes and strong noise. Little existing work on dialect classification combines deep learning with attention, so this thesis studies that combination. The main work is as follows:

(1) Constructing a Guangdong dialect speech data set, which currently covers Mandarin and three dialects commonly used in Guangdong: Cantonese, the Chaoshan dialect, and the Hakka dialect. Using a self-built speech data set avoids problems such as missing research categories and additional noise introduced by the collection equipment or environment, and it will greatly help later extensions of this work.

(2) Converting the speech signal into spectrograms as a unified representation, which avoids the problems that hand-designed features may introduce, and applying audio augmentation and spectrogram augmentation suited to the characteristics of speech signals to deal with the unbalanced distribution of labels in the data set.

(3) Transferring a previously proposed music classification model with a channel attention mechanism from the music domain to dialect recognition, and combining it with self-attention to propose a dialect classification model based on self-attention. This lets the model better capture high-level correlations between features and further enhances its abstraction ability; its macro-average F1 score reaches 89.77%. The experiments verify the effectiveness of the model, and the thesis discusses the feasibility of combining it with multi-head attention and the performance improvement this might bring.

(4) The channel attention structure with residual gating ignores the distribution of features in the spatial domain and occasionally suffers from gradient explosion, so it cannot capture well the frequency transitions that reflect dialect differences in the spectrogram. This thesis proposes a dialect classification model based on a cyclic (recurrent) convolutional network that combines this structure with the hyperbolic tangent activation function and spatial-domain attention. Its macro-average F1 score reaches 91.54%, and it is compared with previous work in this field to verify its effectiveness.

Finally, this thesis designs and implements an audio-based dialect classification system built on the model obtained from the above experiments, which automatically labels audio with dialect categories.
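The results above are reported as macro-average F1, which averages the per-class F1 scores without weighting by class size, so a rarely occurring dialect counts as much as a common one. The following is a minimal sketch of that metric in plain Python, not the thesis's evaluation code. The four class names come from the data set description; the function name `macro_f1`, the toy labels, and the convention of scoring an entirely absent class as 0 are assumptions made for illustration.

```python
from collections import Counter

# Class names taken from the data set description in the abstract.
LABELS = ["Mandarin", "Cantonese", "Chaoshan", "Hakka"]


def macro_f1(y_true, y_pred, labels):
    """Return the unweighted mean of per-class F1 scores.

    Per-class F1 is computed as 2*TP / (2*TP + FP + FN). A class with no
    true and no predicted samples is scored 0.0 (an assumed convention).
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Made-up, deliberately unbalanced toy labels (not data from the thesis).
y_true = ["Mandarin"] * 4 + ["Cantonese"] * 2 + ["Chaoshan", "Hakka"]
y_pred = ["Mandarin", "Mandarin", "Mandarin", "Cantonese",
          "Cantonese", "Cantonese", "Chaoshan", "Mandarin"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy = {accuracy:.4f}")                            # 0.7500
print(f"macro F1 = {macro_f1(y_true, y_pred, LABELS):.4f}")    # 0.6375
```

In this toy case the single Hakka sample is misclassified, which pulls the macro F1 well below the accuracy. That sensitivity to minority classes is one reason macro averaging suits a data set with an unbalanced label distribution.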
Keywords/Search Tags: Dialect Classification, Data Augmentation, Attention model, Dialect speech data set, Convolutional neural network