Chinese Accent Detection Based On GNN And BiLSTM

Posted on: 2022-12-16 | Degree: Master | Type: Thesis
Country: China | Candidate: J W Song | Full Text: PDF
GTID: 2518306779968909 | Subject: Automation Technology
Abstract/Summary:
In human-machine speech interaction, the processing of prosodic features, including accent, tone and intonation, is critical, and accent is one of the most important of these features. On the one hand, speech accent makes machine-synthesized speech more natural and emotionally expressive. On the other hand, it helps avoid sentence ambiguity in semantic understanding. The detection of speech accent is therefore of significant research interest. Current research on Chinese speech accent detection has three problems. First, researchers usually use only shallow text features and do not take into account the relationship between deep text features and speech accent. Second, there is no corpus with Chinese speech accent labels. Third, researchers ignore the potential relationship between speech accent and other prosodic events, and usually analyze prosodic events separately in accent detection studies.

To address these problems, a Chinese speech accent detection system based on a Graph Neural Network (GNN) and Bidirectional Long Short-Term Memory (BiLSTM) is proposed, and the performance of the model is verified by building Chinese accent corpora. First, in feature selection, in addition to the original acoustic and syntactic features, grammar rules are used to obtain the dependency relations between the words in a sentence. Because dependency relations are non-Euclidean structured data, a GNN is introduced: the dependency relations are converted into an adjacency matrix, and a Dependency Tree-Graph Convolution Neural Network (DT-GCN) model based on the dependency relations is established. A BiLSTM is added alongside it so that the model better integrates contextual relationships and retains information over long spans.

Second, since another reason for the difficulty of Chinese speech accent detection is the lack of a corpus with Chinese accent labels, this thesis uses focus theory and grammatical accent rules to automatically label accent in a general Chinese speech corpus, yielding the P-biaobei corpus. In addition, to enrich the Chinese speech accent corpus with different reading styles, this thesis builds a Chinese emotional accent corpus, the P-Camel Xiangzi corpus, based on a literary, novel-reading style, in which accent and phrase boundaries are labeled manually. The results show that the BiLSTM+DT-GCN model, which incorporates dependency relations, achieves 81.92% for Chinese accent detection on the P-biaobei corpus.

Finally, considering the relevance between speech accent and phrase boundaries among prosodic events, a multi-task learning (MTL) model for accent recognition and phrase boundary recognition is established. It is trained and tested on the P-Camel Xiangzi corpus, where both speech accent and phrase boundaries are labeled manually. The experimental results show that the multi-task BiLSTM+DT-GCN model improves the performance of Chinese speech accent detection. The F1 of the model reaches 85.68%, which is 1.28% higher than that of the single-task BiLSTM+DT-GCN model.
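
The step of converting dependency relations into an adjacency matrix and passing it through a graph convolution can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the thesis's DT-GCN: edges are treated as undirected, self-loops are added, the common symmetric normalization D^(-1/2) A D^(-1/2) is used, and the function names, the three-word sentence, and its head indices are hypothetical.

```python
import math


def dependency_adjacency(heads):
    """Build an adjacency matrix from dependency heads.

    heads[i] is the index of the head of word i, or -1 for the root.
    Assumption: edges are undirected and every node gets a self-loop.
    """
    n = len(heads)
    adj = [[0.0] * n for _ in range(n)]
    for child, head in enumerate(heads):
        adj[child][child] = 1.0  # self-loop keeps the word's own features
        if head >= 0:
            adj[child][head] = 1.0
            adj[head][child] = 1.0
    return adj


def normalize_adjacency(adj):
    """Symmetric normalization: entry (i, j) is divided by sqrt(deg_i * deg_j)."""
    n = len(adj)
    deg = [sum(row) for row in adj]  # never zero because of the self-loops
    return [
        [adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
        for i in range(n)
    ]


def gcn_layer(adj_norm, features, weight):
    """One graph convolution: ReLU(adj_norm @ features @ weight)."""
    n, d_in, d_out = len(features), len(weight), len(weight[0])
    # Aggregate each word's features from its dependency neighbours.
    agg = [
        [sum(adj_norm[i][k] * features[k][c] for k in range(n)) for c in range(d_in)]
        for i in range(n)
    ]
    # Apply the linear map, then ReLU.
    return [
        [max(0.0, sum(agg[i][c] * weight[c][o] for c in range(d_in))) for o in range(d_out)]
        for i in range(n)
    ]


# Illustrative sentence "我 喜欢 音乐": word 1 is the root, words 0 and 2 depend on it.
heads = [1, -1, 1]
adj_norm = normalize_adjacency(dependency_adjacency(heads))
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
hidden = gcn_layer(adj_norm, identity, identity)
```

In a full model the `features` rows would be per-word acoustic and text feature vectors and `weight` would be learned; the output of such a layer could then be combined with BiLSTM states, although how the thesis combines them is not specified in the abstract.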
Keywords/Search Tags: speech accent detection, grammatical structure, dependency, graph neural network, multi-task learning