Research And Application Of Human Voice Separation Algorithm Based On Deep Neural Network

Posted on:2022-09-14

Degree:Master

Type:Thesis

Country:China

Candidate:X J Chang

GTID:2518306524993919

Subject:Master of Engineering

Abstract/Summary:

In recent years, research on blind source separation has grown steadily, and the technique has been applied in many areas of production and daily life. From the "cocktail party problem" posed by Colin Cherry in 1953 to today's neural-network-based speech separation models, blind source separation has developed in both breadth and depth. This thesis focuses on music and addresses the extraction of the human voice from a mixed recording.

Existing human voice separation models include fully connected neural networks, convolutional auto-encoders, and recurrent neural networks. Building on the convolutional auto-encoder separation model, this thesis proposes a human voice separation model based on a fully convolutional neural network, referred to as WAVEUNET. WAVEUNET first converts the time series of the mixed signal into a time-frequency map using the Fourier transform. The separation model then estimates the time-frequency mask of the human voice signal from this map, and the time-frequency mask of the accompaniment signal is obtained by subtracting the human voice mask from the mixed audio. Each mask is multiplied by the time-frequency map to give the separated human voice spectrum and accompaniment spectrum, and the inverse Fourier transform then recovers the separated human voice and accompaniment signals. The separation model adopts the characteristics of the UNET network: a fusion layer is added between the encoder and decoder layers of the same depth to compensate for the data lost during pooling in the original model.

Finally, this thesis verifies the separation performance of WAVEUNET through experiments. First, of the two training targets compared, the ideal binary mask and the ideal floating mask, the ideal floating mask gives better separation performance. Second, at the same layer depth, WAVEUNET achieves the better separation effect. As the layer depth increases, the prediction ability of the auto-encoder network decreases, whereas the prediction ability of WAVEUNET increases. Compared with existing human voice separation models, WAVEUNET reaches a good level of separation while offering a simple model, fast speed, and few weights.
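The masking pipeline described in the abstract (Fourier transform, mask, multiplication, inverse transform) can be illustrated with a minimal NumPy sketch. This is not the thesis implementation: the helper names `stft`, `istft`, and `ideal_masks`, the sample rate, the window settings, and the two sine tones standing in for voice and accompaniment are all hypothetical. The sketch also assumes that the "ideal floating mask" is a ratio-type mask with values between 0 and 1, and that the accompaniment mask is one minus the voice mask. No neural network is involved; the masks are computed directly from the known sources, which is how such ideal masks would typically serve as training targets.

```python
import numpy as np

SR, N_FFT, HOP = 8000, 256, 64  # hypothetical settings, not from the thesis


def stft(x, n_fft=N_FFT, hop=HOP):
    """Short-time Fourier transform, returning an array of shape (frames, bins)."""
    window = np.hanning(n_fft + 1)[:-1]  # periodic Hann window
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def istft(spec, n_fft=N_FFT, hop=HOP):
    """Inverse STFT by windowed overlap-add, normalised by the summed squared window."""
    window = np.hanning(n_fft + 1)[:-1]
    frames = np.fft.irfft(spec, n=n_fft, axis=1)
    out = np.zeros(n_fft + hop * (spec.shape[0] - 1))
    norm = np.zeros_like(out)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + n_fft] += frame * window
        norm[i * hop:i * hop + n_fft] += window ** 2
    return out / np.maximum(norm, 1e-8)


def ideal_masks(voice_spec, accomp_spec, eps=1e-8):
    """Binary and ratio-type ("floating", by assumption) masks for the voice."""
    v, a = np.abs(voice_spec), np.abs(accomp_spec)
    ibm = (v > a).astype(float)   # 1 where the voice dominates the bin, else 0
    irm = v / (v + a + eps)       # soft value in [0, 1]
    return ibm, irm


# Toy sources: two sine tones stand in for the voice and the accompaniment.
t = np.arange(SR) / SR
voice = np.sin(2 * np.pi * 440 * t)
accomp = np.sin(2 * np.pi * 2000 * t)
mix = voice + accomp

mix_spec = stft(mix)
ibm, irm = ideal_masks(stft(voice), stft(accomp))

# Multiply the mixture spectrum by each mask, then invert to the time domain.
voice_est = istft(irm * mix_spec)
accomp_est = istft((1.0 - irm) * mix_spec)  # assumed: accompaniment mask = 1 - voice mask
voice_est_binary = istft(ibm * mix_spec)
```

Because the two masks sum to one in every time-frequency bin, the two estimates add back up to the mixture away from the signal edges. In a trained system the mask would be predicted by the separation model from the mixture alone rather than computed from the clean sources.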

Keywords/Search Tags:

Vocal Separation, Convolutional Auto-encoder, UNET, Neural Network

Related items

1	SAR Target Classification Based On Complex Full Convolutional Neural Network And Convolutional Auto Encoder
2	Research On Denoising Algorithm Of ECG Signal Based On Convolutional Auto-encoder Neural Network
3	Weakly Supervised Detection Of Machine Anomalous Sounds Based On Auto-Encoder
4	Deep Auto-encoder Framework For SAR Images Change Detection
5	Research On Image Fusion Method Based On Deep Neural Network
6	A Hybrid Depth Network Learning Model Based On Auto-encoders
7	Research On Music Source Separation Algorithm Based On Deep Convolutional Neural Network And Its Application
8	Face Alignment Algorithms Based On Ensemble Of Deep Networks
9	Structured Auto-encoder Based On Deep Clustering Algorithm Analysis
10	Optimization Of Convolutional Neural Networks Based On Unsupervised Learning And Multi-sampling