Research On Dialect Accent Classification Based On Deep Learning

Posted on: 2022-03-11

Degree: Master

Type: Thesis

Country: China

Candidate: C Lei

Full Text: PDF

GTID: 2518306569481864

Subject: Software engineering

Abstract/Summary:

Dialect classification technology assigns category labels to audio based on its speech content. It helps build dialect datasets, supports China's dialect preservation efforts, and is important for investigating the differences between dialects and the influence of regional dialect accents. Traditional dialect classification methods combine a large number of handcrafted audio features with classifiers such as support vector machines (SVMs). Designing these features requires professional domain knowledge and introduces specific biases into the experimental results, and SVMs lack the capacity to cope with complex scenes and strong noise. Little existing work combines deep learning with attention mechanisms for dialect classification, so this thesis studies that combination. The main contributions are as follows:

(1) Constructing a Guangdong dialect speech dataset, which currently covers Mandarin and three dialects commonly spoken in Guangdong: Cantonese, Chaoshan, and Hakka. A self-built speech dataset avoids problems such as missing target categories and new noise introduced by the recording equipment or environment, and it will greatly ease future extensions of this research.

(2) Converting speech signals into spectrograms as a unified representation, which avoids the problems that handcrafted features may introduce, and applying audio augmentation and spectrogram augmentation suited to the characteristics of speech signals to address the imbalanced label distribution in the dataset.

(3) Transferring a music classification model with a channel attention mechanism, proposed in earlier work, from the music domain to dialect recognition, and combining it with self-attention to propose a self-attention-based dialect classification model. Self-attention lets the model better capture high-level correlations between features and further strengthens its abstraction ability; the model achieves a macro-averaged F₁ score of 89.77%. Experiments verify the model's effectiveness, and the thesis discusses the feasibility of combining it with multi-head attention and the performance improvement this might bring.

(4) The channel attention structure with residual gating ignores the spatial distribution of features and occasionally suffers from exploding gradients, so it cannot adequately capture the frequency transitions in the spectrogram that reflect dialect differences. This thesis therefore proposes a dialect classification model based on a recurrent convolutional network that combines this structure with the hyperbolic tangent (tanh) activation function and spatial attention. The model achieves a macro-averaged F₁ score of 91.54%, and a comparison with previous work in this field verifies its effectiveness.

Finally, this thesis designs and implements an audio-based dialect classification system built on the model obtained from the experiments above, which labels dialect categories automatically.
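Item (2) mentions spectrogram augmentation without naming the method. One widely used option is SpecAugment-style masking, which blanks out random frequency bands and time spans so the classifier cannot rely on any single region of the spectrogram. The NumPy sketch below assumes that technique; the function name `spec_augment`, the mask counts, the maximum widths, and the fill value are illustrative choices, not values taken from the thesis.

```python
import numpy as np


def spec_augment(spec, num_freq_masks=2, max_freq_width=8,
                 num_time_masks=2, max_time_width=20, fill=0.0, rng=None):
    """Return a masked copy of `spec`, a (freq_bins, frames) array.

    Assumption: SpecAugment-style masking; the thesis's exact
    spectrogram augmentation is not described in the abstract.
    Frequency masks overwrite whole bands of rows, time masks overwrite
    whole runs of columns. The input array is left unchanged.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = np.array(spec, dtype=float, copy=True)
    n_freq, n_time = out.shape
    for _ in range(num_freq_masks):
        width = int(rng.integers(0, max_freq_width + 1))  # width drawn from [0, max]
        start = int(rng.integers(0, max(1, n_freq - width + 1)))
        out[start:start + width, :] = fill
    for _ in range(num_time_masks):
        width = int(rng.integers(0, max_time_width + 1))
        start = int(rng.integers(0, max(1, n_time - width + 1)))
        out[:, start:start + width] = fill
    return out


# Example: augment a random stand-in for a 64-bin log-mel spectrogram.
rng = np.random.default_rng(42)
spec = rng.standard_normal((64, 200))
augmented = spec_augment(spec, rng=rng)
```

To counter label imbalance, one plausible use is to draw more augmented copies per clip for minority dialects than for majority ones; the abstract does not say how the thesis balances its classes.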
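Item (3) builds on a music classification model with channel attention. The abstract does not give that model's exact design; a common form of channel attention is the squeeze-and-excitation (SE) block, which pools each channel to a scalar, passes the pooled vector through a small bottleneck, and rescales the channels by sigmoid gates. The forward-pass sketch below assumes an SE-style block on a (channels, frequency, time) feature map; the weight shapes and reduction ratio are hypothetical, and the residual gating mentioned in item (4) is not modeled.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def se_channel_attention(x, w1, b1, w2, b2):
    """SE-style channel attention (assumed form, forward pass only).

    x:  (C, F, T) feature map
    w1: (C // r, C), b1: (C // r,)  bottleneck layer, r = reduction ratio
    w2: (C, C // r), b2: (C,)       expansion layer
    Returns the reweighted map and the per-channel gates.
    """
    squeeze = x.mean(axis=(1, 2))                # global average pool -> (C,)
    hidden = np.maximum(0.0, w1 @ squeeze + b1)  # bottleneck + ReLU -> (C // r,)
    gates = sigmoid(w2 @ hidden + b2)            # per-channel weights in (0, 1)
    return x * gates[:, None, None], gates       # rescale each channel
```

The gates depend only on channel-wide averages, which is exactly why such a block cannot single out particular time-frequency positions, the limitation item (4) points to.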
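The self-attention added in item (3) lets every time frame attend to every other frame, which is how a model can relate distant parts of an utterance. A minimal single-head scaled dot-product self-attention, assuming frame-level feature vectors `h` and hypothetical projection matrices `wq`, `wk`, and `wv` (the thesis's layer sizes are not given in the abstract), looks like this:

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(h, wq, wk, wv):
    """Single-head scaled dot-product self-attention.

    h: (T, d_model) frame features; wq, wk: (d_model, d_k); wv: (d_model, d_v).
    Returns the attended features (T, d_v) and the attention matrix (T, T).
    """
    q, k, v = h @ wq, h @ wk, h @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])  # pairwise frame affinities
    attn = softmax(scores, axis=-1)          # each row sums to 1
    return attn @ v, attn
```

Multi-head attention, whose feasibility item (3) discusses, runs several such heads with separate projections in parallel and concatenates their outputs.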
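Item (4) adds spatial-domain attention so the model can weight specific time-frequency regions, such as frequency transitions, rather than whole channels. The abstract does not describe the exact module; the sketch below assumes a CBAM-style spatial attention, which pools across channels, convolves the pooled maps, and applies a sigmoid mask over the (frequency, time) plane. The kernel sizes and the helper `conv2d_same` are illustrative, and the tanh activation and recurrent layers from item (4) are not reproduced.

```python
import numpy as np


def conv2d_same(img, kernel):
    """2-D cross-correlation with zero padding; output shape equals img shape (odd kernels)."""
    kh, kw = kernel.shape
    padded = np.pad(img, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros(img.shape, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * padded[i:i + img.shape[0], j:j + img.shape[1]]
    return out


def spatial_attention(x, k_avg, k_max, bias=0.0):
    """CBAM-style spatial attention (assumed form, forward pass only).

    x: (C, F, T) feature map; k_avg, k_max: odd-sized 2-D kernels.
    Returns x reweighted by a (F, T) mask in (0, 1), and the mask itself.
    """
    avg_map = x.mean(axis=0)  # average over channels -> (F, T)
    max_map = x.max(axis=0)   # max over channels -> (F, T)
    logits = conv2d_same(avg_map, k_avg) + conv2d_same(max_map, k_max) + bias
    mask = 1.0 / (1.0 + np.exp(-logits))
    return x * mask[None, :, :], mask
```

Unlike the channel gates of an SE-style block, this mask varies across frequency and time, so it can emphasize a particular band at a particular moment.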
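Both models are evaluated by macro-averaged F₁ score: F₁ = 2PR / (P + R) is computed for each class from its precision P and recall R, and the per-class values are averaged without weighting, so a poorly recognized minority dialect lowers the score as much as a poorly recognized majority one. That makes it a reasonable metric for an imbalanced dataset like the one in item (2). A plain-Python sketch follows; the dialect labels in the example are hypothetical predictions, not thesis data.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Macro-averaged F1: per-class F1, then the unweighted mean over classes."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # class p was predicted but wrong
            fn[t] += 1  # true class t was missed
    scores = []
    for c in labels:
        precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        recall = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


y_true = ["mandarin", "mandarin", "cantonese", "chaoshan", "hakka", "hakka"]
y_pred = ["mandarin", "cantonese", "cantonese", "chaoshan", "hakka", "mandarin"]
print(round(macro_f1(y_true, y_pred), 4))  # → 0.7083
```

Here the per-class F₁ values are 1/2 (Mandarin), 2/3 (Cantonese), 1 (Chaoshan), and 2/3 (Hakka), whose mean is 17/24.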

Keywords/Search Tags:

Dialect Classification, Data Augmentation, Attention model, Dialect speech data set, Convolutional neural network

Related items

1	Research On Chinese Dialect Recognition Based On Attention And Transfer Learning
2	Application Research Of Deep Learning In Speech Recognition Of Sichuan Dialect
3	Research On Dialect Classification Based On Convolutional Neural Networks
4	Research On Acoustic Analysis And Speech Synthesis For Lanzhou-Dialect
5	Automatic dialect classification: Advances for read and spontaneous speech, and printed text
6	Research On Speech Synthesis Of Shanghai Dialect Based On Deep Learning
7	Research On Speech Recognition Technology And Application Of Local Dialect In Datong,Shanxi
8	The Design And Implementation Of The Speech Synthesis System Of Minnan Dialect
9	Research On Lanzhou-Dialect Speech Generation
10	Acoustic modeling and speaker normalization strategies with application to robust in-vehicle speech recognition and dialect classification