
Method And Implementation Of Monophonic Double Speech Separation Based On Auditory Scene Analysis

Posted on: 2022-07-31  Degree: Master  Type: Thesis
Country: China  Candidate: H Zhang  Full Text: PDF
GTID: 2518306752997549  Subject: Software engineering
Abstract/Summary:
In real-world scenarios, speech signals are often corrupted by noise. This noise seriously degrades speech intelligibility and limits the range of applications of speech technology, so extracting the target speech from mixed speech is of great significance for applying speech technology in complex scenarios. At present, speech separation based on auditory scene analysis (ASA) is one of the mainstream methods in this field. It builds on the mechanism of human auditory perception, using computer techniques to simulate the perceptual process and separate the target voice from monophonic mixed speech. Based on ASA theory, this thesis studies monophonic speech separation. The main research contents are as follows:

(1) Accurate estimation of the pitch period is studied. Because the cepstrum peak detection method has difficulty estimating the pitch period in a noisy environment, a method that extracts the pitch period from the pitch period trajectory is adopted. The cepstrum data are used to draw a pitch period spectrogram; the speech signals corresponding to a continuous pitch period trajectory are marked as coming from the same sound source, and false cepstrum peaks that deviate from the true trajectory are eliminated. The pitch period trajectory is then detected on the pitch period spectrogram to obtain the pitch period. Compared with cepstrum peak detection, this method effectively improves the accuracy of pitch period estimation in noisy speech.

(2) Separation and reconstruction of voiced sounds are studied. To extract the harmonic structure of the voiced speech spectrum effectively, this thesis uses the pitch period as a cue to improve the comb filtering method. Because the harmonics lie at integer multiples of the pitch frequency, each integer-multiple frequency point is checked for the presence of a harmonic. If a harmonic is present, it is extracted; otherwise, the frequency component is discarded. The spectrum of each harmonic is obtained in this way, and the voiced sound is reconstructed from the extracted harmonic spectra using the inverse Fourier transform and splicing. Experiments show that this method effectively extracts the harmonic structure of voiced signals.

(3) Speech separation for a mixture of single-speaker speech and environmental noise is studied, and the influence of different types of environmental noise on the pitch period trajectory under different signal-to-noise ratio conditions is analyzed. Signal-noise separation experiments show that estimating the pitch period from the pitch period trajectory is robust.

(4) Separation of mixed speech when two people talk at the same time is studied. To address the interfering speech that remains in the separated speech (the crosstalk phenomenon), this thesis plots the harmonic amplitudes of the speech to analyze the cause of crosstalk, and studies the smoothing of abnormal amplitudes and an automatic phase reconstruction method based on the Griffin-Lim algorithm. Experiments show that this method clearly reduces crosstalk and effectively improves the sound quality of the separated speech.
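The two building blocks named in items (1) and (2) can be illustrated with a minimal sketch. It is not the thesis's implementation: the first function is plain per-frame cepstrum peak picking (the baseline that the trajectory-based method is said to improve on), and the second is a simple frequency-domain harmonic selection standing in for the improved comb filter. The function names `cepstral_pitch` and `extract_harmonics`, and the parameters `fmin`, `fmax`, `rel_threshold` and `half_width`, are assumptions introduced here for illustration only.

```python
import numpy as np


def cepstral_pitch(frame, fs, fmin=60.0, fmax=400.0):
    """Estimate the pitch frequency (Hz) of one frame from its real-cepstrum peak.

    fmin and fmax bound the search range; they are illustrative defaults.
    """
    windowed = frame * np.hanning(len(frame))
    log_mag = np.log(np.abs(np.fft.rfft(windowed)) + 1e-10)
    cepstrum = np.fft.irfft(log_mag)           # real cepstrum, indexed by quefrency (samples)
    q_lo = int(fs / fmax)                      # shortest pitch period considered
    q_hi = int(fs / fmin)                      # longest pitch period considered
    period = q_lo + int(np.argmax(cepstrum[q_lo:q_hi]))
    return fs / period


def extract_harmonics(frame, fs, f0, rel_threshold=0.1, half_width=1):
    """Keep spectral bins near integer multiples of f0 that contain a clear peak.

    A harmonic is treated as present when the magnitude around k * f0 exceeds
    rel_threshold times the largest magnitude in the frame (an assumed test);
    every other frequency component is discarded.
    """
    spectrum = np.fft.rfft(frame)
    mag = np.abs(spectrum)
    kept = np.zeros_like(spectrum)
    bin_hz = fs / len(frame)                   # frequency resolution of one FFT bin
    k = 1
    while k * f0 < fs / 2:
        centre = int(round(k * f0 / bin_hz))
        lo = max(centre - half_width, 0)
        hi = min(centre + half_width + 1, len(mag))
        if mag[lo:hi].max() > rel_threshold * mag.max():
            kept[lo:hi] = spectrum[lo:hi]      # harmonic present: extract it
        k += 1
    # Inverse Fourier transform of the retained harmonics gives the voiced frame.
    return np.fft.irfft(kept, n=len(frame))
```

In use, a signal would be cut into frames, `cepstral_pitch` applied to each frame, and `extract_harmonics` called with the resulting pitch before the frames are spliced back together. A trajectory-based estimator as described in item (1) would additionally reject per-frame peaks that break the continuity of the pitch track.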
Keywords/Search Tags:Computational auditory scene analysis, Speech separation, Pitch period trajectory, Mono channel, Separation of signal and noise, Double mixed speech, Crosstalk