
Research On Music Source Separation Algorithm Based On Deep Convolutional Neural Network And Its Application

Posted on: 2022-05-13    Degree: Master    Type: Thesis
Country: China    Candidate: F Mai    Full Text: PDF
GTID: 2518306524990459    Subject: Master of Engineering
Abstract/Summary:
The purpose of music source separation is to decompose a music recording into its constituent sources, such as the singing voice and the accompaniment. For single-channel mixed music, the performance of music source separation systems has reached a bottleneck, and improving separation quality while promoting large-scale application of the technology remains a challenge. This thesis therefore studies music source separation models based on deep convolutional neural networks and implements a software system that lets users separate music into vocals and accompaniment. The contributions are as follows:

(1) Building on the single-stage encoder-decoder network model, a two-stage gated codec network architecture is proposed. The first stage is a masking separation structure based on an encoder-decoder convolutional neural network, and the second stage is a mapping enhancement structure built from a small single-stage model. On this basis, the two-stage joint approach adds a gate for each estimated source as a control switch, which is activated by a check value computed through a negative feedback loop. Experimental results show that, compared with the SHN model, the proposed model improves the signal-to-distortion ratio (SDR) of the vocal and accompaniment sources by 0.44 dB and 0.09 dB, respectively.

(2) For the encoder-decoder network model, an embeddable fusion and compensation cascade architecture is proposed. A fusion module is embedded in the encoder stage to strengthen feature extraction and the representation ability of the bottleneck layer, while a compensation module is embedded in the decoder stage to recover part of the information lost by the encoder. Experimental results show that, compared with the SHN model, the fusion-compensation cascade structure increases the SDR of the vocal and accompaniment sources by 0.40 dB and 0.07 dB, respectively.

(3) An attention mechanism and encoder-decoder networks are combined to construct a music source separation model, with the attention mechanism nested on the two-stage architecture. On the one hand, self-attention is used to coordinate the control gate of the second stage, making the joint training of the two stages more effective. On the other hand, self-attention is nested on the skip connections of the codec to reduce the redundancy of low-resolution features and selectively pass them to the decoder. Experimental results show that, compared with the SHN model, embedding the attention mechanism in the two-stage model increases the SDR of the vocal and accompaniment sources by 0.53 dB and 0.20 dB, respectively. A minimal PyTorch sketch of this skip-connection attention idea is given below.
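The sketch below illustrates the idea in contribution (3): a mask-predicting encoder-decoder whose skip connections pass through a self-attention gate before being concatenated into the decoder. The thesis does not publish code, so the class names (AttentionSkipUNet, SelfAttentionGate), layer widths, and kernel sizes are illustrative assumptions rather than the author's implementation.

# Illustrative sketch only: all names and hyper-parameters are assumptions,
# reconstructing the described idea of an encoder-decoder separator whose
# skip connections are re-weighted by self-attention before reaching the decoder.
import torch
import torch.nn as nn


def conv_block(in_ch, out_ch):
    """3x3 conv -> batch-norm -> ReLU, the basic codec unit assumed here."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )


class SelfAttentionGate(nn.Module):
    """Self-attention applied to a skip connection so that only informative
    low-resolution features are passed on to the decoder."""

    def __init__(self, channels):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // 2, kernel_size=1)
        self.key = nn.Conv2d(channels, channels // 2, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).flatten(2)                      # (B, C/2, HW)
        k = self.key(x).flatten(2)                        # (B, C/2, HW)
        v = self.value(x).flatten(2)                      # (B, C, HW)
        attn = torch.softmax(q.transpose(1, 2) @ k / (c // 2) ** 0.5, dim=-1)
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return x + out                                    # residual connection


class AttentionSkipUNet(nn.Module):
    """Single-stage encoder-decoder that predicts a soft mask for one source
    (e.g. vocals) from a magnitude spectrogram; attention gates sit on the skips."""

    def __init__(self, base=16):
        super().__init__()
        self.enc1 = conv_block(1, base)
        self.enc2 = conv_block(base, base * 2)
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = conv_block(base * 2, base * 4)
        self.att1 = SelfAttentionGate(base)
        self.att2 = SelfAttentionGate(base * 2)
        self.up2 = nn.ConvTranspose2d(base * 4, base * 2, kernel_size=2, stride=2)
        self.dec2 = conv_block(base * 4, base * 2)
        self.up1 = nn.ConvTranspose2d(base * 2, base, kernel_size=2, stride=2)
        self.dec1 = conv_block(base * 2, base)
        self.mask = nn.Conv2d(base, 1, kernel_size=1)

    def forward(self, spec):
        e1 = self.enc1(spec)
        e2 = self.enc2(self.pool(e1))
        b = self.bottleneck(self.pool(e2))
        d2 = self.dec2(torch.cat([self.up2(b), self.att2(e2)], dim=1))
        d1 = self.dec1(torch.cat([self.up1(d2), self.att1(e1)], dim=1))
        return torch.sigmoid(self.mask(d1)) * spec        # masked spectrogram estimate


# Usage: a (batch, 1, freq, time) magnitude spectrogram in, the same shape out.
if __name__ == "__main__":
    x = torch.randn(2, 1, 64, 32).abs()
    print(AttentionSkipUNet()(x).shape)                   # torch.Size([2, 1, 64, 32])

In this sketch the attention gate keeps a residual path around the attended skip, so the decoder still receives the original encoder features even when the attention weights are uninformative early in training.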
Keywords/Search Tags: music source separation, convolutional neural network, encoder-decoder, attention mechanism