
Research On Speech Signal Processing Based On Deep Learning

Posted on: 2020-02-06  Degree: Master  Type: Thesis
Country: China  Candidate: Y Z Mu  Full Text: PDF
GTID: 2518306305497434  Subject: Computer technology
Abstract/Summary:
Language is the most important communication tool for human beings: it is convenient, efficient, and accurate. Speech, as the acoustic representation of language, plays an important role in daily life. With the continuous development of computers and the Internet of Things, more and more machines are involved in people's lives, and how to carry out human-computer interaction efficiently has become a research hotspot in academia and industry. With the development of deep learning and artificial intelligence in particular, it has become a trend to use the powerful representation ability and generalization performance of deep neural networks for speech signal processing. Moreover, driven by AIoT, speech signal processing has shown great practicality in smart homes, driverless vehicles, and speech control. It is therefore of great significance to explore and research speech signals.

This thesis carries out a series of studies on speech recognition and speaker verification. These mainly cover preprocessing methods such as speech signal analysis and feature extraction, the construction of deep neural networks for speech recognition and speaker verification, and an analysis of methods for improving system performance. The thesis focuses on the following aspects:

1. Time-frequency analysis and feature extraction of speech signals. A speech signal is a non-stationary, time-varying signal that carries a large amount of information, including text-related acoustic features, speaker-related identity features, timbre, and pronunciation features. Analyzing these features is important for the subsequent implementation of a speech recognition or speaker verification system. The thesis analyzes the time-frequency features of speech signals, examines the range of application of existing feature extraction methods, and then designs feature extraction methods for specific needs.

2. Models and methods for speech recognition. The existing speech recognition models are analyzed first, and a sequence-to-sequence speech recognition model is then constructed for the speech recognition task. Training methods that improve recognition accuracy are then proposed for data with unbalanced samples. Compared with image data, data augmentation is more difficult for speech signals: without the accumulation of a large number of speech models, random truncation and combination of speech signals cannot be achieved. The thesis therefore proposes two solutions for unbalanced samples. First, in the data preprocessing stage, adding noise, shifting the speech signal in time, and perturbing its gain are used to increase the diversity of the speech data. Second, in the training stage, a model training method based on curriculum learning is proposed. These methods effectively improve the model's feature extraction and learning ability for categories with few samples.

3. Models and methods for speaker verification. By analyzing the acoustic characteristics of the speaker and the effect of the classifier, the thesis proposes and constructs a neural network for speaker recognition. Experimental analysis shows that the constructed model achieves discriminative training of speakers and improves the accuracy of speaker recognition.
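
The three preprocessing-stage augmentations named in point 2 (adding noise, shifting the signal in time, and perturbing its gain) can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the function names, the use of white Gaussian noise, the zero-padded shift, and all parameter ranges are assumptions, since the abstract does not specify them.

```python
import numpy as np


def add_noise(signal, snr_db, rng):
    """Add white Gaussian noise at a target signal-to-noise ratio in dB.

    The noise type is an assumption; the abstract only says "noise".
    """
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise


def time_shift(signal, shift):
    """Shift the waveform by `shift` samples, padding with zeros.

    A positive shift delays the signal; a negative shift advances it.
    """
    out = np.zeros_like(signal)
    if shift > 0:
        out[shift:] = signal[:-shift]
    elif shift < 0:
        out[:shift] = signal[-shift:]
    else:
        out[:] = signal
    return out


def gain_perturb(signal, gain_db):
    """Scale the waveform amplitude by `gain_db` decibels."""
    return signal * (10 ** (gain_db / 20))


def augment(signal, rng, max_shift=1600, snr_range=(10.0, 30.0),
            gain_range=(-6.0, 6.0)):
    """Apply all three perturbations with randomly drawn parameters.

    The default ranges are hypothetical values chosen for illustration.
    """
    shift = int(rng.integers(-max_shift, max_shift + 1))
    snr_db = rng.uniform(*snr_range)
    gain_db = rng.uniform(*gain_range)
    return gain_perturb(add_noise(time_shift(signal, shift), snr_db, rng),
                        gain_db)


# Usage: augment one second of a synthetic 440 Hz tone sampled at 16 kHz.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
tone = np.sin(2 * np.pi * 440.0 * t)
augmented = augment(tone, rng)
```

Each call to `augment` yields a different variant of the same utterance, so under-represented categories can be oversampled without repeating identical waveforms.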
Keywords/Search Tags:speech signal processing, speech recognition, speaker verification, curriculum learning