
Semi-supervised Speech Emotion Feature Learning Method Based On The Distribution Consistency Of Data Subspace Representation

Posted on: 2022-09-17
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H Luo
Full Text: PDF
GTID: 1488306569983399
Subject: Computer Science and Technology
Abstract/Summary:
Speech emotion recognition has a wide range of applications in the field of human-computer interaction, such as call centers and mobile services. In practical applications, the complexity of speech emotion not only makes it necessary to use high-dimensional features to characterize each emotional state, but also makes data annotation costly and time-consuming. Therefore, learning discriminative low-dimensional feature representations for speech emotion has become one of the key focuses of speech emotion recognition research. As a mainstream approach, semi-supervised speech emotion feature learning can be divided into two categories. One targets data from a single source, using a large amount of unlabeled data together with a small amount of labeled data whose distribution is consistent with it. The other targets data from multiple sources, using unlabeled data together with labeled data whose distribution is inconsistent with it. However, existing methods lack in-depth research into distribution consistency learning of the feature representation, which limits the discriminability of the learned features. Based on the above analysis, this dissertation studies semi-supervised speech emotion feature learning methods according to whether the data come from a single source and distribution or from multiple ones. Through subspace learning, we reduce the influence of irrelevant factors on the distribution consistency of speech emotion features and improve the discriminability of the feature representations. The main research contents and contributions are summarized as follows:

(1) For the case of a single data source with a consistent distribution, we attempt to find the speech emotion features most relevant to the label information by a supervised projection of the labeled data from the original feature space into a sparse subspace. Meanwhile, a measurement method that can be flexibly adjusted to different data distributions and is robust to noise and outliers is adopted to discover the real intrinsic structure of the data and to learn a speech emotion feature representation that preserves the structure and distribution consistency of the data. On this basis, we propose a feature learning method based on the distribution consistency of the sparse subspace representation of single-source data. Furthermore, an improved optimization algorithm is proposed to suppress the oscillation of the traditional algorithm during iteration. Experimental results show that the proposed method effectively improves the performance of speech emotion recognition systems on single-source data.

(2) For the case of multiple data sources with different distributions but a consistent label space, we attempt to learn a latent common low-rank subspace for the multi-source data by means of semi-supervised non-negative matrix factorization, and integrate the label information into the corresponding subspace representation. At the same time, the maximum mean discrepancy (MMD) criterion and a local structure preserving regularization are used to constrain the marginal distribution consistency of the common subspace representation of the multi-source data. In addition, to further eliminate the distribution differences between the sources, a self-learning-based conditional distribution estimation method is proposed, and the MMD criterion is used to constrain the conditional distribution consistency of the common subspace representation. On this basis, we propose a feature learning method based on the distribution consistency of the common subspace representation of multi-source data. Experimental results show that the proposed method improves the performance of speech emotion recognition by exploiting multi-source data.

(3) Building on the method in (2), which does not exploit the interaction between label prediction and distribution consistency learning, we attempt to integrate the two into a joint learning model by means of label propagation, which better eliminates the joint distribution differences between the multiple data sources. To learn a feature representation with stronger emotional discriminability, semi-supervised non-negative matrix factorization with an orthogonality constraint is used to remove the source-specific components from the common subspace of the multi-source data, which yields their shared subspace and integrates the discriminative information of the labeled data into that subspace. We then propose a feature learning method based on the distribution consistency of the shared subspace representation of multi-source data. Experimental results show that the proposed method further improves the performance of speech emotion recognition on multi-source data.

(4) For the case of multiple data sources with different distributions and inconsistent label spaces, we consider learning a subspace with joint distribution consistency from two aspects. First, to identify the known and unknown classes in the unlabeled speech emotion data, we attempt to analyze the generalization error of the classification function on the unlabeled data in the common subspace obtained by semi-supervised non-negative matrix factorization, and use it to learn an open-set discriminative subspace that separates the data of the known classes from those of the unknown ones. Meanwhile, the MMD criterion and the local structure preserving regularization are used to constrain the joint distribution consistency of the subspace representation of the known multi-source data. We then propose a feature learning method based on the distribution consistency of the open-set discriminative subspace representation of multi-source data; the learned speech emotion features not only distinguish the known classes from the unknown ones, but also transfer discriminative information from the labeled data to the unlabeled data. Experimental results show that the proposed method improves the performance of speech emotion recognition on open-set multi-source data.
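Several of the contributions above constrain distribution consistency with the maximum mean difference criterion (more commonly called maximum mean discrepancy, MMD). The following is only an illustrative sketch of that criterion, not the dissertation's model: a biased estimate of the squared MMD between two samples under a Gaussian kernel, with kernel width and toy data chosen arbitrarily for the example.

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    """Pairwise Gaussian (RBF) kernel matrix between rows of X and rows of Y."""
    sq = (np.sum(X**2, axis=1)[:, None]
          + np.sum(Y**2, axis=1)[None, :]
          - 2.0 * X @ Y.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def mmd2(X, Y, sigma=1.0):
    """Biased estimate of the squared maximum mean discrepancy between X and Y."""
    return (gaussian_kernel(X, X, sigma).mean()
            + gaussian_kernel(Y, Y, sigma).mean()
            - 2.0 * gaussian_kernel(X, Y, sigma).mean())

rng = np.random.default_rng(0)
A = rng.normal(0.0, 1.0, (200, 4))   # toy "source-domain" features
B = rng.normal(0.0, 1.0, (200, 4))   # same distribution as A
C = rng.normal(3.0, 1.0, (200, 4))   # shifted distribution
same, diff = mmd2(A, B), mmd2(A, C)
print(same < diff)  # True: a distribution shift yields a larger MMD
```

Minimizing such a term over a learned subspace representation is what pulls the per-source feature distributions together.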
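Contributions (2)-(4) all build on semi-supervised non-negative matrix factorization. The dissertation couples the factorization with label constraints and distribution consistency regularizers; the sketch below shows only the plain unsupervised core that those methods extend, using the classical multiplicative updates for the Frobenius-norm objective. All sizes and the rank k are arbitrary example values.

```python
import numpy as np

def nmf(X, k, n_iter=200, eps=1e-9, seed=0):
    """Factor a non-negative matrix X (m x n) as W @ H, with W (m x k) and
    H (k x n) non-negative, by multiplicative updates for ||X - W H||_F^2."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, k)) + 0.1           # random non-negative init
    H = rng.random((k, n)) + 0.1
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)   # update subspace coefficients
        W *= (X @ H.T) / (W @ H @ H.T + eps)   # update basis vectors
    return W, H

rng = np.random.default_rng(1)
X = rng.random((30, 20))       # stand-in for a non-negative feature matrix
W, H = nmf(X, k=5)
rel_err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
```

In the semi-supervised variants described above, the columns of H serve as the common subspace representation, and extra penalty terms (label fitting, MMD, structure preservation, orthogonality) are added to this objective.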
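Contribution (3) couples label prediction with distribution consistency learning by means of label propagation. As a minimal stand-alone sketch of graph-based label propagation in the style of Zhou et al.'s "learning with local and global consistency" (the graph construction, alpha, sigma, and toy clusters here are illustrative choices, not taken from the dissertation):

```python
import numpy as np

def label_propagation(X, y, alpha=0.9, sigma=1.0, n_iter=200):
    """Propagate labels over a similarity graph.
    y holds integer class ids for labeled points and -1 for unlabeled points."""
    n = X.shape[0]
    sq = np.sum(X**2, 1)[:, None] + np.sum(X**2, 1)[None, :] - 2.0 * X @ X.T
    W = np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))  # Gaussian affinity
    np.fill_diagonal(W, 0.0)                             # no self-loops
    d = W.sum(axis=1)
    S = W / np.sqrt(np.outer(d, d))                      # symmetric normalization
    classes = np.unique(y[y >= 0])
    Y = np.zeros((n, classes.size))                      # one-hot seed labels
    Y[y >= 0, np.searchsorted(classes, y[y >= 0])] = 1.0
    F = Y.copy()
    for _ in range(n_iter):                              # F <- a*S*F + (1-a)*Y
        F = alpha * (S @ F) + (1.0 - alpha) * Y
    return classes[np.argmax(F, axis=1)]

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0.0, 0.3, (20, 2)),   # cluster for class 0
               rng.normal(5.0, 0.3, (20, 2))])  # cluster for class 1
y = np.full(40, -1)
y[0], y[20] = 0, 1                              # one labeled point per cluster
pred = label_propagation(X, y)
```

In the joint model described above, the propagated labels in turn feed back into the conditional distribution consistency constraint, which is the interaction the earlier method did not exploit.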
Keywords/Search Tags: Speech emotion recognition, feature learning, distribution consistency, semi-supervised learning