Research On The Recognition Of Multidimensional Speech Information

Posted on:2018-11-26

Degree:Master

Type:Thesis

Country:China

Candidate:S Li

GTID:2348330536479565

Subject:Signal and Information Processing

Abstract/Summary:

With the growing demand for artificial intelligence and the rapid development of machine learning techniques, speech interaction has become a major trend in practical application fields such as next-generation smart homes. An increasing number of researchers are studying speech signal recognition, including speech recognition, speaker recognition and emotion recognition. To date, research on traditional speech recognition has mainly focused on identifying a single target class; in other words, each aspect of speech information is studied separately. In reality, however, a speech signal collected from people is a mixture of three types of information: semantic information, speaker-related information (gender, age, emotional state, etc.) and background information. Moreover, humans recognize all of these kinds of information simultaneously during dialogue. Single-target recognition therefore cannot fully capture the meaning of the human voice and reduces the robustness of speech recognition, which hinders the development of voice interaction systems. If computers could understand the full information carried by a speech signal as humans do, researchers would open up new possibilities for evaluating speaker information and greatly improve the efficiency of human-machine dialogue. Our team has begun to explore multidimensional speech information recognition, which could resolve the bottlenecks of single-target recognition systems. Owing to the diversity of human speech, this thesis first studies the recognition of multidimensional speaker-related information as a pioneering attempt. Specifically, the speaker information considered here comprises gender, emotion and identity. The main research work and contributions of this thesis are as follows.

Building on existing research into gender-dependent emotion recognition and into gender and identity recognition in emotional speech, this thesis analyzes the commonalities and characteristics of traditional recognition systems according to the basic pipeline of automatic speaker-related recognition. A complete multidimensional speaker information recognition system has two key modules: feature extraction and classification.

(1) Different feature parameters represent different speech information, and the same feature vector can also be used to recognize different single speech targets. Prosodic, voice-quality and spectral features, which are widely used in all three speaker-related recognition tasks, contain abundant speaker information. It is therefore reasonable to use combined feature vectors as the characteristic parameters for multidimensional speaker information recognition. Two different methods of obtaining fused feature parameters are introduced, yielding a low-level feature and a high-level feature.

(2) This thesis first builds a novel baseline system using a support vector machine (SVM) to test whether multiple pieces of speaker information can be identified correctly at once; because mature literature and theory on this task are lacking, the system also serves as the reference. The performance of single-target systems is then compared with that of the baseline system. The accuracy of the baseline system is about 11.37% higher than that of the single-target identification systems, which demonstrates the feasibility and validity of the baseline solution as a new method of multidimensional speaker information recognition.

(3) In essence, multidimensional speaker information classification is a multi-label learning problem. MIML (multi-instance multi-label learning) is a framework that handles label ambiguity and realizes many-to-many mappings. A novel classification system based on an improved MIMLSVM algorithm is presented. MIMLSVM, which had not previously been used in speech processing, shows a natural ability to mine more potentially useful information. The improved algorithm exploits the relationships between labels and enhances recognition performance by adopting a double judgment based on gender. Experimental results show that the improved MIMLSVM system performs notably better than the baseline system on every task except gender recognition, with either the low-level or the high-level feature. In addition, it improves the accuracy of multiple recognition by 1.97% compared with the baseline system and is superior to the single-target classification systems in some respects. However, computational complexity grows with the number of labels, so accuracy must be balanced against running time.
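The abstract does not spell out its two fusion methods. One common reading of "low-level" and "high-level" features in speech emotion work is frame-level low-level descriptors versus utterance-level statistical functionals, and the minimal NumPy sketch below assumes that reading. The function names `fuse_low_level` and `fuse_high_level`, the stream sizes (2 prosodic, 3 voice-quality, 13 spectral dimensions) and the choice of functionals are hypothetical, and random numbers stand in for real extracted features.

```python
import numpy as np

def fuse_low_level(prosodic, quality, spectral):
    """Hypothetical frame-level fusion: place the per-frame descriptors of
    each stream side by side. Inputs are (n_frames, d_i); output is
    (n_frames, d_prosodic + d_quality + d_spectral)."""
    frame_counts = {prosodic.shape[0], quality.shape[0], spectral.shape[0]}
    if len(frame_counts) != 1:
        raise ValueError("all feature streams must have the same number of frames")
    return np.hstack([prosodic, quality, spectral])

def fuse_high_level(low_level):
    """Hypothetical utterance-level fusion: summarize every fused dimension
    with statistical functionals (mean, std, min, max), giving one
    fixed-length vector per utterance regardless of its duration."""
    functionals = [low_level.mean(axis=0), low_level.std(axis=0),
                   low_level.min(axis=0), low_level.max(axis=0)]
    return np.concatenate(functionals)

rng = np.random.default_rng(0)
n_frames = 200
prosodic = rng.normal(size=(n_frames, 2))   # stand-in for e.g. pitch, energy
quality = rng.normal(size=(n_frames, 3))    # stand-in for e.g. jitter, shimmer, HNR
spectral = rng.normal(size=(n_frames, 13))  # stand-in for e.g. 13 MFCCs

low = fuse_low_level(prosodic, quality, spectral)
high = fuse_high_level(low)
print(low.shape, high.shape)  # (200, 18) (72,)
```

The low-level result keeps the time axis, so it suits sequence or segment-based models, while the high-level vector has a fixed length and can be fed directly to an SVM such as the baseline system.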

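The abstract does not describe what its improved MIMLSVM changes. For orientation, MIMLSVM as usually described in the MIML literature reduces a multi-instance multi-label problem to single-instance multi-label learning: k-medoids clustering under the Hausdorff distance selects k representative bags, each bag is re-encoded as its vector of Hausdorff distances to those medoid bags, and one binary SVM per label is then trained on the resulting vectors. The minimal NumPy sketch below covers only the re-encoding step. Treating each utterance as a bag of segment-level feature vectors, the helper names and the toy data are assumptions for illustration.

```python
import numpy as np

def hausdorff(a, b):
    """Hausdorff distance between two bags of instances, shapes (m, d) and (n, d)."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)  # (m, n) pairwise
    return max(d.min(axis=1).max(), d.min(axis=0).max())

def k_medoids(dist, k, n_iter=20):
    """Simple alternating k-medoids on a precomputed (N, N) distance matrix.
    Returns the indices of the k medoid bags."""
    n = dist.shape[0]
    medoids = np.linspace(0, n - 1, k).astype(int)  # deterministic initialization
    for _ in range(n_iter):
        assign = dist[:, medoids].argmin(axis=1)    # nearest medoid for each bag
        new = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(assign == c)
            if members.size:
                # new medoid: the member with the smallest total distance to its cluster
                new[c] = members[dist[np.ix_(members, members)].sum(axis=1).argmin()]
        if np.array_equal(new, medoids):
            break
        medoids = new
    return medoids

def bags_to_vectors(bags, medoid_bags):
    """Re-encode each bag as its Hausdorff distances to the medoid bags."""
    return np.array([[hausdorff(b, m) for m in medoid_bags] for b in bags])

rng = np.random.default_rng(1)
# Hypothetical data: 10 utterances, each a bag of 5 segment vectors of dimension 4
bags = [rng.normal(loc=(i % 2) * 3.0, size=(5, 4)) for i in range(10)]
N = len(bags)
dist = np.array([[hausdorff(bags[i], bags[j]) for j in range(N)] for i in range(N)])
medoids = k_medoids(dist, k=2)
Z = bags_to_vectors(bags, [bags[m] for m in medoids])
print(Z.shape)  # (10, 2)
```

Each row of `Z` is an ordinary fixed-length vector, so a standard binary SVM per label (for example per gender, emotion or speaker class) could be trained on it. The thesis's own refinements, namely exploiting label relationships and the gender-based double judgment, are not modeled here.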
Keywords/Search Tags:

multidimensional speaker information recognition, emotion recognition, gender recognition, baseline system, improved MIMLSVM algorithm, multiple classification

Related items

1	Emotion-based Features In Gender Recognition Of Chinese Micro-blog
2	Study On Application Of Spectral Map In Speaker Gender And Age Recognition
3	Research On Three-dimensional Features Recognition Based On Deep Learning Speaker
4	Research On Machine Learning Based Speaker Recognition
5	Emotion Recognition Based On Multi-modal Information Fusion
6	Face Gender Recognition Technology Based On Digital Image Processing
7	Studies On Speaker Recognition Based On SVM And GMM
8	Research On Key Techniques Of Speech Emotion Recognition
9	Research On Speaker Gender Recognition And Age Estimation
10	Speaker Recognition Research Based On Improved Mel Feature Extraction Algorithm