
Research On Robust Binaural Speech Separation Algorithm

Posted on: 2018-04-26
Degree: Master
Type: Thesis
Country: China
Candidate: Y Xu
GTID: 2348330542952056
Subject: Information and Communication Engineering
Abstract/Summary:
Speech separation is of great significance in speech signal processing fields such as speech communication and speech enhancement. As a front-end module, speech separation directly determines the performance of the entire speech system. Because it is based on the human binaural hearing mechanism, binaural speech separation is more robust than monaural speech separation. This thesis addresses the problem of robust binaural speech separation. Based on spatial cues and the characteristics of time-frequency analysis, a binaural speech separation algorithm for multi-source mixed speech is proposed. The proposed algorithm has two parts: a smoothing technique based on the DUET (Degenerate Unmixing Estimation Technique) algorithm, and a sub-band separation algorithm based on CASA (Computational Auditory Scene Analysis).

(1) Smoothing technique based on the DUET algorithm. The traditional DUET separation algorithm relies on the sparsity of the speech signal in the frequency domain to generate a binary mask for each separated source. However, this hard-coded mask discards some frequency points. This thesis proposes constructing a probabilistic mask instead and implements the soft coding in two ways. The first is sub-band smoothing based on the Gammatone filter bank, which uses the existing binary mask and the sub-band spectral function to calculate the proportion coefficient of each channel and then derives the soft-coded value for each time-frequency point. The second is smoothing based on the sigmoid function. The sigmoid function fits well a signal whose probability density function is inverted-bell-shaped, and it is used to convert the matching distances of the candidate azimuths into a soft-coded mask for the separated speech. The PESQ (Perceptual Evaluation of Speech Quality) value is used as the evaluation criterion. The simulation results show that both smoothing techniques achieve a robust improvement in a variety of environments.

(2) Sub-band separation algorithm based on computational auditory scene analysis. The mixed speech is split into sub-bands and framed to obtain time-frequency (T-F) units, so the problem of binaural speech separation reduces to attributing each T-F unit to a source. This thesis proposes two algorithms to obtain the mask matrix of the T-F units. The first is a generative model based on KDE (Kernel Density Estimation). In the training phase, kernel density estimation is used to build a library of probability density functions describing the feature distribution of each channel at different azimuths. In the test phase, each T-F unit of the mixed speech is attributed by comparing the probability densities of its feature vector at the different azimuths. The second is a discriminative model based on SVM (Support Vector Machine), which treats speech separation as a multi-class classification problem: an SVM multi-class classifier is trained in the feature space to attribute the feature vector of each T-F unit of the mixed speech. The binaural features used in both algorithms are ITD (Interaural Time Difference) and IID (Interaural Intensity Difference). In this part, HIT-FA (HIT rate minus False-Alarm rate), SDR (Source to Distortion Ratio), SAR (Sources to Artifacts Ratio) and SIR (Source to Interferences Ratio) are used as evaluation criteria. The experimental results show that both sub-band algorithms improve significantly on the existing algorithms in the laboratory.
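
The KDE-based attribution of T-F units described in part (2) might be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the Gaussian product kernel, the fixed bandwidths, the synthetic ITD/IID values, and all function names (`gaussian_kde_2d`, `train_density_library`, `attribute_units`, `binary_masks`) are hypothetical choices made for the example, and only a single sub-band channel is modeled.

```python
import math
import random


def gaussian_kde_2d(samples, bandwidth):
    """Build a density estimate over (ITD, IID) pairs from training samples.

    Assumption: a product Gaussian kernel with one fixed bandwidth per feature.
    """
    h_itd, h_iid = bandwidth
    norm = 1.0 / (len(samples) * 2.0 * math.pi * h_itd * h_iid)

    def density(point):
        x, y = point
        total = 0.0
        for sx, sy in samples:
            u = (x - sx) / h_itd
            v = (y - sy) / h_iid
            total += math.exp(-0.5 * (u * u + v * v))
        return norm * total

    return density


def train_density_library(training_features, bandwidth):
    """Training phase: one density function per azimuth (single channel).

    training_features maps azimuth -> list of (itd, iid) feature pairs.
    """
    return {
        azimuth: gaussian_kde_2d(features, bandwidth)
        for azimuth, features in training_features.items()
    }


def attribute_units(library, unit_features):
    """Test phase: give each T-F unit to the azimuth with the highest density."""
    return [
        max(library, key=lambda azimuth: library[azimuth](feature))
        for feature in unit_features
    ]


def binary_masks(labels, azimuths):
    """Turn per-unit azimuth labels into one 0/1 mask per source."""
    return {
        azimuth: [1 if label == azimuth else 0 for label in labels]
        for azimuth in azimuths
    }


if __name__ == "__main__":
    random.seed(0)
    # Synthetic training features (ITD in ms, IID in dB); values are made up.
    training = {
        -30: [(random.gauss(-0.3, 0.05), random.gauss(-5.0, 1.0)) for _ in range(200)],
        30: [(random.gauss(0.3, 0.05), random.gauss(5.0, 1.0)) for _ in range(200)],
    }
    library = train_density_library(training, bandwidth=(0.05, 1.0))

    # Features of three T-F units taken from a hypothetical mixture.
    units = [(-0.28, -4.5), (0.31, 5.2), (-0.35, -6.0)]
    labels = attribute_units(library, units)
    print(labels)
    print(binary_masks(labels, [-30, 30]))
```

In the SVM variant, the comparison of per-azimuth densities would be replaced by a multi-class classifier trained on the same ITD/IID features.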
Keywords/Search Tags: Smoothing, Kernel Density Estimation, Support Vector Machine, Speech Separation