Research On Active Learning And Semi-supervised Learning Methods In Speech Emotion Recognition

Posted on: 2014-10-26
Degree: Master
Type: Thesis
Country: China
Candidate: X P Zhang
GTID: 2268330422951687
Subject: Computer technology
Abstract/Summary:
Speech emotion recognition, especially at large scale, requires a large amount of annotated emotional speech. Unlabeled natural speech can be collected easily, but annotation is costly, especially for dimensional labeling. This thesis studies active learning and semi-supervised learning under both the discrete emotion model and the dimensional emotion model. Active learning focuses on choosing the most informative data for human annotation, while semi-supervised learning focuses on letting the model label emotional speech automatically. The goal is to achieve high performance at low annotation cost.

To choose the most informative emotional speech for human annotation, and thus reduce annotation cost, we studied active learning methods in speech emotion recognition. Under the discrete emotion model, since the output probability is a measure of classification certainty, we studied the least confidence, margin sampling, and information entropy methods, all based on output probability. We also studied the query-by-committee method, which constructs a committee of models to represent different regions of the version space. Vote entropy and K-L divergence were used to measure disagreement within the committee; larger disagreement indicates a more informative sample. Under the dimensional emotion model, output probabilities cannot be estimated directly because the task is a regression problem. We therefore studied a least confidence method that discretizes the continuous annotations and then approximates informativeness with a discrete classification model. Similarly, we studied a query-by-committee method that constructs a committee of regression models and estimates the informativeness of each utterance from the variance of their outputs.

To use unlabeled data to improve learning performance, we studied semi-supervised learning methods in speech emotion recognition. Under the discrete emotion model, since speech with a high output probability is classified with the most certainty, we explored the self-learning method based on confidence thresholds and the co-training method using different feature subsets. We also used the graph Laplacian to represent the similarity between emotional speech samples and propagated labels to neighboring vertices according to that similarity, which is the label propagation algorithm. Under the dimensional emotion model, we explored graph-regularized methods based on the manifold assumption. Considering the squared loss on the labeled data and the model complexity norm in Hilbert space, we studied LapRLS, which uses the Laplacian matrix defined on the manifold structure as a regularization term, and CoRLS, which uses the disagreement between two prediction functions based on two different views as a regularization term.

Finally, we studied the combination of active learning and semi-supervised learning under the discrete emotion model. The combined method augments the training set with the least confident speech, which is sent for human annotation, and the highest confidence speech, which is annotated by the model.
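As a rough illustration of the probability-based criteria and the combined selection step described above, the following Python sketch scores utterances from their class probabilities and splits an unlabeled pool into a human-annotation batch and a model-labeled batch. It is not the thesis implementation: the function names, the `n_query` and `self_label_threshold` parameters, and the example probabilities are all assumptions made for illustration.

```python
import math


def least_confidence(probs):
    """1 minus the top class probability; larger means less certain."""
    return 1.0 - max(probs)


def margin(probs):
    """Gap between the two largest probabilities; smaller means less certain."""
    top = sorted(probs, reverse=True)
    return top[0] - top[1]


def entropy(probs):
    """Shannon entropy (natural log); larger means less certain."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def split_pool(pool_probs, n_query, self_label_threshold):
    """Hypothetical combined selection step.

    pool_probs maps an utterance id to its list of class probabilities.
    Returns (to_annotate, auto_labeled): the n_query least confident ids
    for human annotation, and a dict of id -> predicted class index for
    the remaining utterances whose top probability reaches the threshold.
    """
    ranked = sorted(
        pool_probs,
        key=lambda utt: least_confidence(pool_probs[utt]),
        reverse=True,
    )
    to_annotate = ranked[:n_query]
    auto_labeled = {}
    for utt in ranked[n_query:]:
        probs = pool_probs[utt]
        best = max(range(len(probs)), key=probs.__getitem__)
        if probs[best] >= self_label_threshold:
            auto_labeled[utt] = best
    return to_annotate, auto_labeled


# Made-up classifier outputs for four unlabeled utterances, three classes.
example_pool = {
    "utt1": [0.34, 0.33, 0.33],
    "utt2": [0.90, 0.05, 0.05],
    "utt3": [0.55, 0.40, 0.05],
    "utt4": [0.02, 0.97, 0.01],
}

to_annotate, auto_labeled = split_pool(
    example_pool, n_query=1, self_label_threshold=0.9
)
print("send to annotators:", to_annotate)
print("labeled by the model:", auto_labeled)
```

In this sketch the ranking uses least confidence only; `margin` or `entropy` could be substituted as the sort key (with the sort direction reversed for `margin`). Utterances that are neither among the least confident nor above the threshold simply stay in the unlabeled pool for a later round.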
Keywords/Search Tags: Speech Emotion Recognition, Active Learning, Semi-supervised Learning, Dimensional Emotion, Graph Regularized Methods