
Speech Separation Using Deep Learning And Statistical Signal Processing Techniques

Posted on: 2022-01-04
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Asim Masood
Full Text: PDF
GTID: 1488306323462904
Subject: Signal and Information Processing
Abstract/Summary:
In the real world, we constantly receive diverse, blended speech signals, and the information a listener perceives is degraded. Collecting reliable data from such an environment is challenging: several sounds are active at the same time, and interference such as spatial reverberation badly damages both the signals from voice and other sound sources and the conditions under which they are heard. As a consequence, separating a mixture of speech and noise into its individual speech sources remains a difficult problem. It has attracted many researchers, who are developing devices and applications to address it. For the past 40 years, mixtures have been decomposed into their individual sources using blind speech separation (BSS) techniques. BSS isolates a number of sound sources from the mixtures with little or no prior knowledge of the source signals or the mixing process. It also addresses the challenging task of reconstructing or restoring signals from a single mixture or a series of mixtures.

Deep learning and artificial intelligence are now at an all-time high and are having a significant effect on technological development and business growth. Many effective innovations and technologies have been created, many researchers are working on deep-learning-based source separation, and a wide range of signal processing techniques for deep learning methods have been introduced to complete various challenging tasks. In this thesis, BSS and deep learning, two well-established areas of mixed-signal separation, are studied thoroughly and used for speech separation to solve a number of problems. Our research is focused on multichannel speech separation, monaural speech separation, and de-convolution speech separation, as they are the three main types of mixing systems or sensor networks used in practical applications.

In the first research work, the Independent Component Analysis (ICA) algorithm is optimized using the geometric Kth nearest neighbour entropy estimator (GKNN). This algorithm outperforms all well-known ICA algorithms, especially when the recording is long and speech is present in only a small portion of it. The algorithm is robust and computationally efficient, and its speech separation results are very good. In the second research work, the Independent Vector Analysis (IVA) algorithm is optimized using GKNN and entropy estimation based on recursive copula splitting. These estimators measure entropy with a global search over the dataset, which leads to optimized convergence and better classification of convolutive speech mixtures, and they improve the overall performance of IVA. In the third research work, a Deep Recurrent Neural Network (DRNN) architecture is optimized using novel activation functions. These activation functions solve the failing ("dying") Rectified Linear Unit (ReLU) issue, so better separation performance is obtained and the computational cost is lowered significantly. In the fourth research work, single-channel source separation is optimized using two stages of separation: a DRNN separates the sources in the first stage, and Copula Component Analysis (CCA) further enhances the separation of the speech sources in the second stage. In the last research work, the mode failure problem in the Generative Adversarial Network (GAN) is fixed using the geometric Kth nearest neighbour mutual information estimator (GKNNMI). This architecture combines a conditional GAN with two traditional neural networks. The results show that this architecture boosts singing voice separation efficiency.
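
As a rough illustration of the kind of quantity a nearest-neighbour entropy estimator computes, the sketch below implements the classical Kozachenko-Leonenko k-nearest-neighbour estimator for one-dimensional data. This is not the geometric (GKNN) estimator used in the thesis, whose details the abstract does not give; it is a simpler, well-known relative. The function names, the choice k=3, and the assumption that all samples are distinct are illustrative only.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def digamma_int(n):
    """Digamma at a positive integer: psi(n) = -gamma + sum_{i=1}^{n-1} 1/i."""
    return -EULER_GAMMA + sum(1.0 / i for i in range(1, n))


def knn_entropy_1d(samples, k=3):
    """Kozachenko-Leonenko differential entropy estimate (in nats) for 1-D data.

    H ~= psi(N) - psi(k) + log(2) + (1/N) * sum_i log(eps_i),
    where eps_i is the distance from sample i to its k-th nearest neighbour
    and 2 is the length of the unit ball in one dimension.
    Assumes all samples are distinct (otherwise log(0) is raised).
    """
    xs = sorted(samples)
    n = len(xs)
    if n <= k:
        raise ValueError("need more than k samples")
    log_sum = 0.0
    for i, x in enumerate(xs):
        # In sorted 1-D data the k nearest neighbours lie within k positions.
        lo = max(0, i - k)
        hi = min(n, i + k + 1)
        dists = sorted(abs(x - xs[j]) for j in range(lo, hi) if j != i)
        log_sum += math.log(dists[k - 1])
    return digamma_int(n) - digamma_int(k) + math.log(2.0) + log_sum / n


if __name__ == "__main__":
    random.seed(0)
    gaussian = [random.gauss(0.0, 1.0) for _ in range(2000)]
    print("estimated:", knn_entropy_1d(gaussian))
    print("analytic: ", 0.5 * math.log(2.0 * math.pi * math.e))
```

In an entropy-based ICA scheme, an estimator of this kind would typically be evaluated on each candidate output channel, with the unmixing transform chosen to minimise the sum of the marginal entropies; how the thesis integrates GKNN into ICA and IVA is not specified in the abstract.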
Keywords/Search Tags: Speech Separation, Independent Component Analysis, Independent Vector Analysis, Deep Recurrent Neural Network, Generative Adversarial Network, Geometric Kth Nearest Neighbour Entropy Estimator, Recursive Copula Splitting, Activation Function