
Research And Application Of Human Voice Separation Algorithm Based On Deep Neural Network

Posted on: 2022-09-14
Degree: Master
Type: Thesis
Country: China
Candidate: X J Chang
GTID: 2518306524993919
Subject: Master of Engineering
Abstract/Summary:
In recent years, research on blind source separation has grown steadily, and the technique has been applied across many areas of production and daily life. From the "cocktail party problem" posed by Colin Cherry in 1953 to today's neural-network-based speech separation models, blind source separation has developed in both breadth and depth. This thesis focuses on music and on extracting the vocal signal from it. Existing vocal separation models include fully connected neural networks, convolutional autoencoders, and recurrent neural networks.

Building on the convolutional-autoencoder model for vocal separation, this thesis proposes a vocal separation model based on a fully convolutional neural network, called WAVEUNET for short. WAVEUNET first converts the mixed signal from a time series into a time-frequency representation (spectrogram) with the short-time Fourier transform. A separation model then estimates the time-frequency mask of the vocal signal from this spectrogram, and the accompaniment mask is obtained by subtracting the vocal mask from that of the mixed audio. Each mask is multiplied by the mixture spectrogram to give the separated vocal and accompaniment spectra, and the inverse short-time Fourier transform then recovers the separated vocal and accompaniment waveforms. The separation model adopts the characteristic structure of the UNET network: it adds a fusion layer between encoder and decoder layers of the same depth, which compensates for the information lost during pooling in the original model.

Finally, the separation performance of WAVEUNET is verified experimentally. First, of the two training targets compared, the ideal binary mask and the ideal floating mask, the ideal floating mask yields better separation. Second, at the same layer depth, WAVEUNET separates better than the convolutional autoencoder. As the layer depth increases, the prediction ability of the autoencoder network decreases, whereas that of WAVEUNET increases. Compared with existing vocal separation algorithms, WAVEUNET achieves a competitive level of separation while offering a simple model, fast processing, and few weights.
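The masking pipeline described in the abstract (STFT, vocal mask, accompaniment mask, inverse STFT) can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the thesis's implementation. The frame length `N_FFT = 512` and 50% hop are assumed values. The trained WAVEUNET is replaced by oracle (ideal) masks computed from known sources. The accompaniment mask is taken to be the complement `1 - M` of the vocal mask, which is our reading of "subtracting the vocal mask". The "ideal floating mask" is approximated by the magnitude ratio |V| / (|V| + |A|), one common soft-mask definition among several. All helper names (`stft`, `istft`, `ideal_binary_mask`, `ideal_ratio_mask`, `separate`, `snr_db`) are hypothetical.

```python
import numpy as np

N_FFT = 512        # assumed frame length (not stated in the thesis)
HOP = N_FFT // 2   # assumed 50% overlap
EPS = 1e-8         # avoids division by zero in silent bins


def stft(x):
    """Short-time Fourier transform: periodic-Hann-windowed frames -> rfft."""
    win = np.hanning(N_FFT + 1)[:-1]          # periodic Hann; copies shifted by HOP sum to 1
    padded = np.pad(x, (N_FFT, N_FFT))        # pad so every input sample lies in two frames
    n_frames = 1 + (len(padded) - N_FFT) // HOP
    frames = np.stack([padded[i * HOP:i * HOP + N_FFT] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)        # shape: (n_frames, N_FFT // 2 + 1)


def istft(spec, length):
    """Inverse STFT by overlap-add; valid because the analysis windows sum to 1."""
    frames = np.fft.irfft(spec, n=N_FFT, axis=1)
    out = np.zeros(HOP * (len(frames) - 1) + N_FFT)
    for i, frame in enumerate(frames):
        out[i * HOP:i * HOP + N_FFT] += frame
    return out[N_FFT:N_FFT + length]          # drop the padding added in stft()


def ideal_binary_mask(vocal_spec, accomp_spec):
    """1 where the vocal magnitude exceeds the accompaniment magnitude, else 0."""
    return (np.abs(vocal_spec) > np.abs(accomp_spec)).astype(float)


def ideal_ratio_mask(vocal_spec, accomp_spec):
    """Soft mask in [0, 1]: the vocal share of the magnitude in each bin (assumed form)."""
    v, a = np.abs(vocal_spec), np.abs(accomp_spec)
    return v / (v + a + EPS)


def separate(mixture, vocal_mask):
    """Mask the mixture spectrogram with the vocal mask and its complement."""
    mix_spec = stft(mixture)
    accomp_mask = 1.0 - vocal_mask            # assumed meaning of "subtracting the vocal mask"
    vocals = istft(vocal_mask * mix_spec, len(mixture))
    accomp = istft(accomp_mask * mix_spec, len(mixture))
    return vocals, accomp


def snr_db(reference, estimate):
    """Signal-to-noise ratio of an estimate against its reference, in dB."""
    noise = reference - estimate
    return 10 * np.log10(np.sum(reference ** 2) / (np.sum(noise ** 2) + EPS))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.arange(16000) / 16000.0                      # 1 s at an assumed 16 kHz rate
    vocal = 0.5 * np.sin(2 * np.pi * 440 * t)           # synthetic stand-in "voice"
    accomp = 0.3 * rng.standard_normal(t.size)          # synthetic stand-in "accompaniment"
    mixture = vocal + accomp

    V, A = stft(vocal), stft(accomp)
    for name, mask in [("binary mask", ideal_binary_mask(V, A)),
                       ("ratio mask", ideal_ratio_mask(V, A))]:
        est_vocal, _ = separate(mixture, mask)
        print(f"{name}: vocal SNR = {snr_db(vocal, est_vocal):.1f} dB")
```

Because the two masks sum to one in every bin and the STFT is linear, the vocal and accompaniment estimates always add back up to the mixture; this is a practical consequence of deriving the accompaniment mask by subtraction. The printed SNRs on this toy signal only illustrate the mechanics and are not meant to reproduce the thesis's experimental results.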
Keywords/Search Tags: Vocal Separation, Convolutional Auto-encoder, UNET, Neural Network