Research On Accent Recognition Method Based On Deep Learning

Posted on: 2022-05-23 | Degree: Master | Type: Thesis
Country: China | Candidate: W Wang
GTID: 2518306542963459 | Subject: Computer technology
Abstract/Summary:
Accent identification within a deep learning framework is closely related to deep speaker identification: both aim to give the input speech an identifiable utterance-level representation. Whereas a speaker identification network learns individual-level features, deep accent recognition poses a harder problem, because it must build group-level accent features shared by all speakers of the same accent. Concretely, this challenge shows up in two ways. (1) With limited training data, the number of accents is far smaller than the number of speakers while the input speech carries rich and varied information, so fast convergence on such a small classification target is not the same as learning an accurate accent representation, and the learned accent features may generalize poorly. (2) In many formal social settings, speakers tend to adopt a standardized pronunciation (e.g., Mandarin in Chinese), which narrows the differences between accents and may leave the accent features insufficiently discriminative. In this thesis, taking the 2D speech spectrogram as input, we borrow and improve the deep paradigm of speaker identification to design a deep accent identification network. The proposed framework is significantly ahead of the baseline system on the accent classification track of the Accented English Speech Recognition Challenge 2020.

The specific research work of this thesis is as follows:

(1) Building an utterance-level accent representation. A Convolutional Recurrent Neural Network (CRNN) is used as the front-end encoder to extract local features (descriptors) of the input speech, with a ResNet as the CNN subpart and a bidirectional GRU as the RNN subpart. Because the extracted local features are temporally ordered, a many-to-one, RNN-based feature integration method is proposed to generate the global accent feature.

(2) To address the insufficient generalization of accent features, a multitask learning (MTL) method is proposed to improve the accent representation. MTL strengthens the representation power of the shared hidden layers by adding related tasks as constraints during training. Considering that accent is a timbre attribute related to speech, this thesis introduces automatic speech recognition (ASR) auxiliary tasks based on the CTC (Connectionist Temporal Classification) objective during training to improve accent feature learning.

(3) To address the insufficient discriminability of accent features, discriminative feature learning methods are studied to strengthen the discriminative power of accent features. A conventional softmax-based classifier only seeks a probability space in which all categories are classified correctly, and such a space is not necessarily a highly discriminative metric space. Hence, this thesis adopts related feature learning work from face recognition to reduce ambiguity in the accent feature space, that is, to maximize inter-class variance and minimize intra-class variance. Since the loss function plays a vital role in deep feature learning, this thesis studies and compares several discriminative variants of the softmax loss from face recognition: (a) CosFace, (b) ArcFace, and (c) Circle Loss, and also examines the choice of the margin hyperparameter for these discriminative losses. In addition, a visual analysis in 2D/3D feature space shows how the discriminative losses improve the discriminative power of the features.
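The many-to-one feature integration in (1) can be sketched in pure Python. This is a minimal illustration under stated assumptions, not the thesis's implementation: the frame-level descriptors from the ResNet/BiGRU encoder are taken as given lists of floats, the GRU follows the common PyTorch-style gate equations with bias terms omitted, and the names `ToyGRUCell` and `aggregate_many_to_one` and the random weights are hypothetical. Only the final hidden state is kept as the utterance-level accent embedding.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(w, v):
    # Dense matrix-vector product on nested lists.
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]


class ToyGRUCell:
    """Hypothetical GRU cell (PyTorch-style gates, no biases), random weights for illustration."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.hidden_dim = hidden_dim
        # W_* act on the input x_t, U_* act on the previous hidden state h_{t-1}.
        self.W_z, self.U_z = mat(hidden_dim, input_dim), mat(hidden_dim, hidden_dim)
        self.W_r, self.U_r = mat(hidden_dim, input_dim), mat(hidden_dim, hidden_dim)
        self.W_n, self.U_n = mat(hidden_dim, input_dim), mat(hidden_dim, hidden_dim)

    def step(self, x, h):
        # Update gate z and reset gate r.
        z = [sigmoid(a + b) for a, b in zip(matvec(self.W_z, x), matvec(self.U_z, h))]
        r = [sigmoid(a + b) for a, b in zip(matvec(self.W_r, x), matvec(self.U_r, h))]
        # Candidate state: the reset gate scales the recurrent contribution.
        n = [math.tanh(a + ri * b)
             for a, ri, b in zip(matvec(self.W_n, x), r, matvec(self.U_n, h))]
        # New state is a convex mix of the candidate and the previous state.
        return [(1.0 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]


def aggregate_many_to_one(descriptors, cell):
    """Run the GRU over T frame-level descriptors and keep only the last hidden state."""
    h = [0.0] * cell.hidden_dim
    for x_t in descriptors:
        h = cell.step(x_t, h)
    return h  # utterance-level accent embedding
```

Because only the last state is returned, utterances of any length map to a fixed-size vector, and the recurrence makes that vector depend on the order of the descriptors rather than just their average.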
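The MTL objective in (2) combines the accent classification loss with a CTC loss on an auxiliary ASR head. The sketch below computes the CTC negative log-likelihood with the standard forward algorithm in log space and adds it to the accent cross-entropy. Assumptions: the blank symbol has index 0, the per-frame log-probabilities are already normalized, and the two losses are combined as a weighted sum with a hypothetical weight `ctc_weight`; the abstract does not state how the thesis weights them.

```python
import math


def logsumexp(values):
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def ctc_loss(log_probs, target, blank=0):
    """-log p(target | input) via the CTC forward algorithm.

    log_probs: T x V nested list of per-frame log-probabilities (T >= 1).
    target: list of label ids without blanks.
    """
    # Extended label sequence: blank, y1, blank, y2, ..., blank.
    ext = [blank]
    for label in target:
        ext += [label, blank]
    S, T = len(ext), len(log_probs)
    alpha = [-math.inf] * S
    alpha[0] = log_probs[0][blank]
    if S > 1:
        alpha[1] = log_probs[0][ext[1]]
    for t in range(1, T):
        new = [-math.inf] * S
        for s in range(S):
            cands = [alpha[s]]  # stay on the same symbol
            if s >= 1:
                cands.append(alpha[s - 1])  # advance by one
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append(alpha[s - 2])  # skip a blank between distinct labels
            new[s] = logsumexp(cands) + log_probs[t][ext[s]]
        alpha = new
    # Valid paths end on the last label or the trailing blank.
    ends = [alpha[S - 1]] + ([alpha[S - 2]] if S > 1 else [])
    return -logsumexp(ends)


def cross_entropy(logits, label):
    return logsumexp(logits) - logits[label]


def multitask_loss(accent_logits, accent_label, asr_log_probs, transcript, ctc_weight=0.3):
    """Hypothetical MTL objective: accent CE + ctc_weight * CTC (weight is an assumption)."""
    return (cross_entropy(accent_logits, accent_label)
            + ctc_weight * ctc_loss(asr_log_probs, transcript))
```

For two frames with uniform probabilities over {blank, 1} and target `[1]`, the valid alignments are (1, 1), (blank, 1) and (1, blank), so p = 0.75 and the CTC loss is -log 0.75; a target that needs more frames than are available gets an infinite loss.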
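For (3), CosFace and ArcFace both replace the softmax logits with scaled cosine similarities between the L2-normalized embedding and the class weight vectors, and differ only in where the margin is applied: CosFace subtracts m from the target-class cosine, while ArcFace adds m to the target-class angle. The pure-Python sketch below follows those definitions; the defaults for the scale `s` and margin `m` are illustrative assumptions rather than values reported in the thesis, and the fallback that practical ArcFace implementations use when θ + m exceeds π is omitted.

```python
import math


def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def cosines(embedding, class_weights):
    # Cosine similarity between the embedding and each class weight vector.
    e = normalize(embedding)
    return [sum(a * b for a, b in zip(e, normalize(w))) for w in class_weights]


def softmax_ce(logits, label):
    m = max(logits)
    return m + math.log(sum(math.exp(z - m) for z in logits)) - logits[label]


def cosface_loss(embedding, class_weights, label, s=30.0, m=0.35):
    """Additive cosine margin: target logit is s * (cos(theta_y) - m)."""
    cos = cosines(embedding, class_weights)
    logits = [s * (c - m) if j == label else s * c for j, c in enumerate(cos)]
    return softmax_ce(logits, label)


def arcface_loss(embedding, class_weights, label, s=30.0, m=0.5):
    """Additive angular margin: target logit is s * cos(theta_y + m)."""
    cos = cosines(embedding, class_weights)
    theta = math.acos(max(-1.0, min(1.0, cos[label])))
    logits = [s * c for c in cos]
    logits[label] = s * math.cos(theta + m)
    return softmax_ce(logits, label)
```

With m = 0 both losses reduce to a normalized softmax cross-entropy; a positive margin makes the target logit harder to win, so the network must pull same-accent embeddings closer to their class center to reach the same loss.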
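Circle Loss, the third variant in (3), re-weights each similarity by its distance from an optimum instead of applying one fixed margin. Below is a sketch of its class-level form with one positive (target-class) similarity s_p and a list of negative similarities s_n, using the relaxation O_p = 1 + m, O_n = -m, Δ_p = 1 - m, Δ_n = m from the Circle Loss formulation. The defaults for `gamma` and `m` are assumptions for illustration, and in gradient-based training the weights α_p and α_n are usually treated as constants.

```python
import math


def logsumexp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def circle_loss(sp, sn, gamma=64.0, m=0.25):
    """sp: similarity to the target class; sn: non-empty list of similarities to other classes."""
    op, on = 1.0 + m, -m            # optima for positive / negative similarities
    delta_p, delta_n = 1.0 - m, m   # decision margins
    # Self-paced weights (treated as constants, i.e. no gradient, during training).
    alpha_p = max(op - sp, 0.0)
    pos_term = -gamma * alpha_p * (sp - delta_p)
    neg_terms = [gamma * max(s - on, 0.0) * (s - delta_n) for s in sn]
    # loss = log(1 + sum_j exp(neg_j) * exp(pos)) = softplus(logsumexp(neg) + pos)
    z = logsumexp(neg_terms) + pos_term
    return z + math.log1p(math.exp(-z)) if z > 0 else math.log1p(math.exp(z))
```

A similarity far from its optimum gets a large α and therefore a large gradient, which is the design choice that distinguishes Circle Loss from the fixed-margin CosFace and ArcFace.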
Keywords/Search Tags: Accent identification, Speaker identification, Deep feature learning, Multitask learning, Automatic speech recognition