Research On Multi-dimensional Speaker Recognition Based On Neural Network

Posted on: 2020-01-26
Degree: Master
Type: Thesis
Country: China
Candidate: H X Chen
GTID: 2428330590995605
Subject: Signal and Information Processing
Abstract/Summary:
As a key technology for human-computer interaction, speech recognition has been widely applied in many fields. Real-world speech is a complex signal that carries many kinds of information about the speaker. However, most current research focuses on recognizing a single type of information and cannot simultaneously identify the speaker's identity, the spoken content, age, gender, emotion, and so on. Existing multi-dimensional recognition systems, such as the common systems that recognize both speech content and speaker identity, largely ignore the ability to understand and express human emotion. Compared with humans, such systems can hardly meet the requirements of true intelligence. Therefore, to make speech recognition more intelligent and human-like, our team first proposed the task of simultaneously recognizing multi-dimensional speaker information, with the aim of fully exploiting the deep correlations among the different dimensions. This thesis selects three common types of speaker information in speech, namely gender, emotion, and identity, and builds a multi-dimensional speaker information recognition model. By exploiting the correlations among these types of information, the model recognizes them simultaneously. The main work and innovations are as follows:

(1) This thesis proposes a method for recognizing multi-dimensional speech information with a Multi-Task Learning (MTL) mechanism. The identity vector (i-vector) of each utterance is used as the feature parameter for multi-dimensional speaker information recognition. Deep Belief Networks (DBN) are a deep learning architecture with strong self-learning capability. First, a DBN is used to build a separate Single-Task Learning (STL) system for each recognition task. Then, to bring the correlations between different types of speech information into multi-dimensional recognition, the thesis combines MTL with DBN to build an MTL-based multi-dimensional speaker information recognition system that recognizes the speaker's emotion, gender, and identity. With MTL, each type of speaker information in turn serves as the main task while the other two serve as auxiliary tasks, and the DBN shares what is learned across all tasks to improve recognition performance. Experiments show that the recognition rate of the MTL multi-dimensional system is 4.82% higher than that of the single-dimensional STL systems, and the model complexity is also reduced.

(2) The MTL multi-dimensional system does not explicitly model the correlations between the different types of speech information. To address this, the thesis combines three recognition models (gender, emotion, and identity) into a DBN-based multi-dimensional baseline system. The DBN baseline system makes full use of gender information but ignores the association between emotion and identity. To remedy this shortcoming, a multi-dimensional recognition system based on Progressive Neural Networks (ProgNets) is constructed. ProgNets transfer knowledge from an auxiliary recognition model to the main recognition model to improve recognition performance. Building on gender classification, the ProgNets system transfers information between the emotion and identity recognition models, so that, unlike the DBN baseline, it fully exploits the correlation between speaker identity and emotion. Experiments show that the recognition result of the DBN baseline system is 4.73% higher than that of the single-dimensional STL models, while the ProgNets multi-dimensional system is 1.8% higher than the baseline and also outperforms the MTL multi-dimensional system.
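The MTL scheme in contribution (1), in which one main task and two auxiliary tasks share a network, can be illustrated with a minimal NumPy sketch of hard parameter sharing. It is an illustration under stated assumptions, not the thesis's implementation: a single sigmoid hidden layer trained by plain backpropagation stands in for the pre-trained DBN layers, random vectors stand in for i-vectors, and the names (`N_CLASSES`, `task_weights`, `train_step`), the class counts, and the auxiliary weight of 0.3 are all hypothetical.

```python
import numpy as np

# Hypothetical sizes: the abstract does not state feature dimensions or class counts.
IVEC_DIM, HIDDEN = 100, 64
N_CLASSES = {"emotion": 6, "gender": 2, "identity": 10}


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def init_params(rng):
    """Shared hidden layer plus one softmax head per task."""
    params = {"W1": rng.normal(0, 0.1, (IVEC_DIM, HIDDEN)), "b1": np.zeros(HIDDEN)}
    for task, k in N_CLASSES.items():
        params["W_" + task] = rng.normal(0, 0.1, (HIDDEN, k))
        params["b_" + task] = np.zeros(k)
    return params


def task_weights(main_task, aux_weight=0.3):
    """Main task gets weight 1.0; the other tasks are auxiliary (hypothetical weight)."""
    return {t: (1.0 if t == main_task else aux_weight) for t in N_CLASSES}


def forward(params, X):
    # Shared sigmoid layer: a stand-in for the shared DBN layers.
    h = 1.0 / (1.0 + np.exp(-(X @ params["W1"] + params["b1"])))
    probs = {t: softmax(h @ params["W_" + t] + params["b_" + t]) for t in N_CLASSES}
    return h, probs


def mtl_loss(probs, labels, weights):
    """Weighted sum of per-task mean cross-entropies."""
    total = 0.0
    for t, p in probs.items():
        n = p.shape[0]
        total += weights[t] * -np.mean(np.log(p[np.arange(n), labels[t]] + 1e-12))
    return total


def train_step(params, X, labels, weights, lr=0.1):
    """One full-batch gradient step; returns the loss before the update."""
    n = X.shape[0]
    h, probs = forward(params, X)
    dh = np.zeros_like(h)
    for t in N_CLASSES:
        dz = probs[t].copy()
        dz[np.arange(n), labels[t]] -= 1.0
        dz *= weights[t] / n  # gradient of the weighted mean cross-entropy
        dh += dz @ params["W_" + t].T  # every head sends gradient into the shared layer
        params["W_" + t] -= lr * h.T @ dz
        params["b_" + t] -= lr * dz.sum(axis=0)
    dpre = dh * h * (1.0 - h)  # back through the sigmoid
    params["W1"] -= lr * X.T @ dpre
    params["b1"] -= lr * dpre.sum(axis=0)
    return mtl_loss(probs, labels, weights)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, IVEC_DIM))  # random stand-ins for utterance i-vectors
    labels = {t: rng.integers(0, k, size=200) for t, k in N_CLASSES.items()}
    for main in N_CLASSES:  # each task takes a turn as the main task
        params = init_params(rng)
        w = task_weights(main)
        losses = [train_step(params, X, labels, w) for _ in range(100)]
        print(f"main={main}: loss {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Rotating the main task over the three keys of `N_CLASSES` yields one model per main task, mirroring the rotation described above; the auxiliary weight controls how strongly the other two tasks shape the shared layer.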
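The lateral knowledge transfer behind the ProgNets system in contribution (2) can be sketched in the same spirit. In a progressive network, a column trained for an auxiliary task is frozen, and a new column for the main task reads the frozen column's hidden activations through extra lateral weights. The sketch below assumes a frozen identity column feeding an emotion column through one lateral matrix `U`; the gender-based first stage is not modeled, and the class name `ProgressiveColumn`, the layer sizes, and the random stand-in data are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


class ProgressiveColumn:
    """Two-layer main-task column whose second layer also reads a frozen column."""

    def __init__(self, rng, in_dim, hidden, lateral_dim, n_classes):
        self.W1 = rng.normal(0, 0.1, (in_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0, 0.1, (hidden, hidden))
        self.U = rng.normal(0, 0.1, (lateral_dim, hidden))  # lateral connection
        self.b2 = np.zeros(hidden)
        self.Wo = rng.normal(0, 0.1, (hidden, n_classes))
        self.bo = np.zeros(n_classes)

    def forward(self, X, h_aux):
        h1 = sigmoid(X @ self.W1 + self.b1)
        # Layer 2 combines this column's layer 1 with the frozen column's hidden layer.
        h2 = sigmoid(h1 @ self.W2 + h_aux @ self.U + self.b2)
        return h1, h2, softmax(h2 @ self.Wo + self.bo)

    def train_step(self, X, h_aux, y, lr=0.1):
        """One full-batch gradient step; returns the loss before the update."""
        n = X.shape[0]
        h1, h2, p = self.forward(X, h_aux)
        loss = -np.mean(np.log(p[np.arange(n), y] + 1e-12))
        dz = p.copy()
        dz[np.arange(n), y] -= 1.0
        dz /= n
        dpre2 = (dz @ self.Wo.T) * h2 * (1.0 - h2)
        dpre1 = (dpre2 @ self.W2.T) * h1 * (1.0 - h1)
        # Only this column's weights (including U) change; h_aux is treated as a constant.
        self.Wo -= lr * h2.T @ dz
        self.bo -= lr * dz.sum(axis=0)
        self.W2 -= lr * h1.T @ dpre2
        self.U -= lr * h_aux.T @ dpre2
        self.b2 -= lr * dpre2.sum(axis=0)
        self.W1 -= lr * X.T @ dpre1
        self.b1 -= lr * dpre1.sum(axis=0)
        return loss


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(150, 100))           # random stand-ins for i-vectors
    y_emotion = rng.integers(0, 6, size=150)  # hypothetical 6 emotion classes
    # Hypothetical frozen auxiliary (identity) column: only its hidden layer is reused.
    W_aux, b_aux = rng.normal(0, 0.1, (100, 32)), np.zeros(32)
    h_aux = sigmoid(X @ W_aux + b_aux)
    col = ProgressiveColumn(rng, in_dim=100, hidden=32, lateral_dim=32, n_classes=6)
    losses = [col.train_step(X, h_aux, y_emotion) for _ in range(100)]
    print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Because `h_aux` enters only as an input, training updates `U` and the new column's own weights but never the auxiliary column, which is what lets a progressive network reuse earlier knowledge without overwriting it.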
Keywords/Search Tags: Multi-dimensional speaker information recognition, Multi-task Learning, i-vector, Deep Belief Networks, Progressive Neural Network