Music Separation Method Based On U-shaped Network And Audio Fingerprint

Posted on: 2022-08-12  Degree: Master  Type: Thesis
Country: China  Candidate: K L Liu
GTID: 2518306569497494  Subject: Computer technology
Abstract/Summary:
With the development of computer technology, the performance of speech processing has improved significantly. As one of the basic tasks of speech processing, speech separation has received widespread attention. This thesis explores a music separation method combined with audio fingerprint retrieval. The work comprises two parts: improving the separation model based on a U-shaped network architecture, and re-optimizing the separation through audio fingerprint retrieval of the background music, for which an audio fingerprint database is constructed.

First, an end-to-end deep learning model for separating music and vocals is proposed. The model follows the U-shaped network architecture and is divided into an encoder and a decoder, both of which adopt a time-series convolution residual module. The first part of the module is a set of one-dimensional time-domain convolution filters that capture different local information in the voice signal, corresponding to different periodic signal components; the second part uses a residual structure to support feature extraction in deep networks. Experimental results on the public MUSDB18 data set show that the proposed method has advantages in some aspects over both the traditional time-frequency spectrogram-based methods and the commonly used waveform-based methods. In addition, introducing binary masking as a post-processing module significantly improves the separation of specific music components.

Second, building on the U-shaped network separation, background music audio fingerprint retrieval is proposed to further optimize the separation result. The audio fingerprint retrieval module first builds an audio fingerprint database, using the landmark fingerprint construction algorithm to convert audio features into fingerprints for constructing and matching against the background music fingerprint database at the re-separation stage. The process is as follows: the background music separated by the U-shaped network is used as a query against a library of pure background music, and the retrieved background music is used to separate the original audio. In the re-separation stage, the retrieved audio clip is aligned with the original audio in time and volume and then subtracted from the waveform of the mixed audio to obtain a purer voice. Experimental results show that the proposed re-separation strategy based on audio fingerprint retrieval effectively improves voice separation. When models whose separation performance differs widely are passed through the retrieval and re-separation module, the gap between their results narrows significantly, which confirms that the re-separation strategy based on audio fingerprint retrieval is robust.

The main contributions of this work are twofold: first, a new music and vocal separation model is proposed, which achieves outstanding results in some aspects; second, audio retrieval technology is used in a novel way to further optimize the music separation task and achieve a better separation result.
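The re-separation step described in the abstract (aligning the retrieved background clip with the mixed audio in time and volume, then subtracting it from the waveform) can be illustrated with a minimal sketch. The abstract does not state how the alignment is computed, so the choices here are assumptions: the time offset is estimated by cross-correlation and the volume by a least-squares gain. The function name `align_and_subtract` is hypothetical and is not taken from the thesis.

```python
import numpy as np

def align_and_subtract(mixture, background):
    """Hypothetical sketch: remove a retrieved background clip from a mixture.

    Assumes both signals are 1-D float arrays at the same sample rate and
    that the clip appears in the mixture with an unknown delay and a
    positive gain. Neither assumption comes from the thesis itself.
    """
    # Time alignment: the peak of the full cross-correlation gives the lag
    # at which the clip best matches the mixture.
    corr = np.correlate(mixture, background, mode="full")
    lag = int(np.argmax(corr)) - (len(background) - 1)

    # Place the clip into a buffer of the mixture's length at that lag,
    # clipping any part that falls outside the mixture.
    shifted = np.zeros_like(mixture)
    src_start = max(0, -lag)
    dst_start = max(0, lag)
    n = min(len(background) - src_start, len(mixture) - dst_start)
    if n > 0:
        shifted[dst_start:dst_start + n] = background[src_start:src_start + n]

    # Volume alignment: least-squares gain that best explains the mixture.
    energy = float(np.dot(shifted, shifted))
    gain = float(np.dot(mixture, shifted)) / energy if energy > 0 else 0.0

    # Subtract the aligned, scaled clip to leave an estimate of the voice.
    vocals = mixture - gain * shifted
    return vocals, lag, gain
```

Calling `align_and_subtract(mixture, background)` returns the estimated voice together with the estimated lag (in samples) and gain. A practical system would likely need finer handling, such as sub-sample alignment or time-varying gain, which this sketch leaves out.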
Keywords/Search Tags: speech separation, U-shaped network, sequential convolution module, audio fingerprint, landmark algorithm