| When listening to music, humans have an extraordinary ability to isolate individual sounds, whether voices, violins, pianos, or percussion. They can easily understand the lyrics of a song, identify timbre in the presence of many instruments, discriminate individual instruments, and even track the pitch of a particular instrument. In the era of the mobile Internet, with the popularity of music and short videos, the demand for music source separation in karaoke, lyric recognition, and accompaniment extraction is increasingly urgent: sometimes the accompaniment must be separated from the music for use as a backing track, and sometimes the vocals must be isolated for lyric recognition. However, music separation remains a great challenge for machines. Separating vocals, harmonic sounds, and percussion quickly and accurately is therefore worth studying and has practical application value. This work mainly studies methods for separating vocals, strings, and percussion in two major types of music. To separate the harmonic part (including the vocals) from the percussion part, Harmonic Percussive Sound Separation (HPSS) is performed by means of cepstrum filtering and post-processing. To further separate mixed vocal and harmonic signals, blind source separation is combined with cepstrum filtering. Finally, a music separation method based on the U-Net neural network is studied, and the performance of the main methods is compared and verified. The main work is as follows: (1) To separate harmonic and percussive sounds in music, a method based on cepstrum filtering and post-processing is proposed. It exploits the fact that harmonic sounds have a long span in the time domain and a narrow distribution in the frequency domain, whereas percussive sounds have a short span in the time domain and a wide distribution in the frequency domain. The method first performs
a short-time Fourier transform on the single-channel music, then applies a two-dimensional Fourier transform to the resulting amplitude spectrum, designs a filter in the cepstrum domain to preliminarily separate the harmonic and percussive components, post-processes the harmonic content remaining in the percussive component, and finally transforms the result back to a time-domain signal. Simulation experiments, with the separated music evaluated by the Perceptual Evaluation of Audio Quality (PEAQ) algorithm, show that the method effectively separates percussion and harmonic sounds: the Objective Difference Grade (ODG) score improves by 2.0, the Source to Interferences Ratio (SIR) improves by about 0.5 to 4 dB, and the Sources to Artifacts Ratio (SAR) improves by more than 15 dB. (2) Because a single existing music separation algorithm has difficulty separating the accompaniment from the singing, a music separation method combining cepstrum filtering and blind source separation (CF-BSS) is proposed. First, the percussion is extracted by the cepstrum filter separation method based on Harmonic Percussive Sound Separation, and then independent component analysis (ICA) based on the natural gradient is used to separate the harmonic sounds and the vocals. Experiments are carried out on a public music data set, and the separated music is evaluated with an objective audio quality evaluation method. The ODG score of the percussion increased by 2.48, that of the vocals by 1.62, and that of the harmonic music by 1.56. The results show that, compared with using independent component analysis alone, the CF-BSS method significantly improves the audio quality of the separated percussion, string music, and vocal accompaniment. (3) As a data-driven approach, an end-to-end music separation method based on the U-Net network is studied. It uses an encoder to extract features from the
convolutional layers, and a decoder to increase the feature map size and channel number, progressively restoring the resolution. The L1 loss function is also improved. The algorithm makes full use of the time-domain and frequency-domain characteristics of the music signal, and the separation results are evaluated with metrics such as SDR, SIR, and SAR. The results show that the proposed algorithm outperforms the Spleeter algorithm on all metrics, producing better separation of music and vocals. |
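
The filtering step in item (1) can be illustrated with a minimal sketch. It assumes NumPy and works directly on a magnitude spectrogram, so the short-time Fourier transform, the post-processing, and the return to the time domain are left out. The function name `split_harmonic_percussive` and the wedge-shaped mask are illustrative assumptions, not the filter designed in this work; the sketch only shows why the two-dimensional Fourier transform of the amplitude spectrum separates sounds that are long in time from sounds that are wide in frequency.

```python
import numpy as np


def split_harmonic_percussive(mag):
    """Split a magnitude spectrogram (freq bins x frames) with a 2-D Fourier mask.

    Assumed filter, not the one from the text: a wedge mask that compares
    the modulation rate along time with the one along frequency.
    """
    n_freq, n_frames = mag.shape
    spec2d = np.fft.fft2(mag)  # 2-D Fourier transform of the amplitude spectrum

    # Normalized modulation frequencies along each axis (cycles per bin/frame).
    mod_f = np.abs(np.fft.fftfreq(n_freq))[:, None]
    mod_t = np.abs(np.fft.fftfreq(n_frames))[None, :]

    # Harmonic ridges vary slowly in time, so their energy sits at low mod_t.
    # Ties (including the DC term) are shared equally between both outputs.
    harm_mask = np.where(mod_t < mod_f, 1.0, np.where(mod_t == mod_f, 0.5, 0.0))

    harmonic = np.fft.ifft2(harm_mask * spec2d).real
    percussive = np.fft.ifft2((1.0 - harm_mask) * spec2d).real
    return harmonic, percussive


# Toy spectrogram: one sustained tone (row) plus one drum hit (column).
toy = np.zeros((16, 16))
toy[5, :] += 1.0  # harmonic: long in time, narrow in frequency
toy[:, 7] += 1.0  # percussive: short in time, wide in frequency
harmonic, percussive = split_harmonic_percussive(toy)
print(np.round(harmonic[5, :4], 3), np.round(percussive[:4, 7], 3))
```

On this square toy input the sustained row ends up in `harmonic` and the single-frame column in `percussive`. Real recordings are far less clean, which is where a carefully designed filter and the post-processing of residual harmonic content come in.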