
Research On Music Source Separation Algorithm Based On Deep Convolutional Neural Network And Its Application

Posted on: 2022-05-13    Degree: Master    Type: Thesis
Country: China    Candidate: F Mai    Full Text: PDF
GTID: 2518306524990459    Subject: Master of Engineering
Abstract/Summary:
The purpose of music source separation is to decompose a music recording into its constituent sources, such as the singing voice and the accompaniment. For single-channel mixed music, the performance of music source separation systems has reached a bottleneck, and improving separation quality while promoting large-scale application of the technology remains a challenge. This thesis therefore studies music source separation models based on deep convolutional neural networks and implements a software system that lets users separate music into vocals and accompaniment. The contributions are as follows:

(1) Building on the single-stage encoder-decoder network model, a two-stage gated codec network architecture is proposed. The first stage is a masking separation structure based on an encoder-decoder convolutional neural network, and the second stage is a mapping enhancement structure built from a small single-stage model. On this basis, the two-stage joint approach adds a gate for each estimated source as a control switch, which is activated by a check value computed through a negative feedback loop. Experimental results show that, compared with the SHN model, the proposed model improves the signal-to-distortion ratio (SDR) of the vocal and accompaniment sources by 0.44 dB and 0.09 dB, respectively.

(2) For the encoder-decoder network model, an embeddable fusion and compensation cascade architecture is proposed. A fusion module is embedded in the encoder stage to strengthen feature extraction and the representation ability of the bottleneck layer, while a compensation module is embedded in the decoder stage to recover part of the information lost by the encoder. Experimental results show that, compared with the SHN model, the fusion-compensation cascade structure increases the SDR of the vocal and accompaniment sources by 0.40 dB and 0.07 dB, respectively.

(3) An attention mechanism and encoder-decoder networks are combined to construct a music source separation model, with the attention mechanism nested on the two-stage architecture. On the one hand, self-attention is used to coordinate the control gate of the second stage, making the joint training of the two stages more effective. On the other hand, self-attention is nested on the skip connections of the codec to reduce the redundancy of low-resolution features and selectively pass them to the decoder. Experimental results show that, compared with the SHN model, embedding the attention mechanism in the two-stage model increases the SDR of the vocal and accompaniment sources by 0.53 dB and 0.20 dB, respectively. A minimal PyTorch sketch of this skip-connection attention idea is given below.
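The sketch below illustrates the idea in contribution (3): a mask-predicting encoder-decoder whose skip connections pass through a self-attention gate before being concatenated into the decoder. The thesis does not publish code, so the class names (AttentionSkipUNet, SelfAttentionGate), layer widths, and kernel sizes are illustrative assumptions rather than the author's implementation.

# Illustrative sketch only: all names and hyper-parameters are assumptions,
# reconstructing the described idea of an encoder-decoder separator whose
# skip connections are re-weighted by self-attention before reaching the decoder.
import torch
import torch.nn as nn


def conv_block(in_ch, out_ch):
    """3x3 conv -> batch-norm -> ReLU, the basic codec unit assumed here."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )


class SelfAttentionGate(nn.Module):
    """Self-attention applied to a skip connection so that only informative
    low-resolution features are passed on to the decoder."""

    def __init__(self, channels):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // 2, kernel_size=1)
        self.key = nn.Conv2d(channels, channels // 2, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).flatten(2)                      # (B, C/2, HW)
        k = self.key(x).flatten(2)                        # (B, C/2, HW)
        v = self.value(x).flatten(2)                      # (B, C, HW)
        attn = torch.softmax(q.transpose(1, 2) @ k / (c // 2) ** 0.5, dim=-1)
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return x + out                                    # residual connection


class AttentionSkipUNet(nn.Module):
    """Single-stage encoder-decoder that predicts a soft mask for one source
    (e.g. vocals) from a magnitude spectrogram; attention gates sit on the skips."""

    def __init__(self, base=16):
        super().__init__()
        self.enc1 = conv_block(1, base)
        self.enc2 = conv_block(base, base * 2)
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = conv_block(base * 2, base * 4)
        self.att1 = SelfAttentionGate(base)
        self.att2 = SelfAttentionGate(base * 2)
        self.up2 = nn.ConvTranspose2d(base * 4, base * 2, kernel_size=2, stride=2)
        self.dec2 = conv_block(base * 4, base * 2)
        self.up1 = nn.ConvTranspose2d(base * 2, base, kernel_size=2, stride=2)
        self.dec1 = conv_block(base * 2, base)
        self.mask = nn.Conv2d(base, 1, kernel_size=1)

    def forward(self, spec):
        e1 = self.enc1(spec)
        e2 = self.enc2(self.pool(e1))
        b = self.bottleneck(self.pool(e2))
        d2 = self.dec2(torch.cat([self.up2(b), self.att2(e2)], dim=1))
        d1 = self.dec1(torch.cat([self.up1(d2), self.att1(e1)], dim=1))
        return torch.sigmoid(self.mask(d1)) * spec        # masked spectrogram estimate


# Usage: a (batch, 1, freq, time) magnitude spectrogram in, the same shape out.
if __name__ == "__main__":
    x = torch.randn(2, 1, 64, 32).abs()
    print(AttentionSkipUNet()(x).shape)                   # torch.Size([2, 1, 64, 32])

In this sketch the attention gate keeps a residual path around the attended skip, so the decoder still receives the original encoder features even when the attention weights are uninformative early in training.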
Keywords/Search Tags: music source separation, convolutional neural network, encoder-decoder, attention mechanism