
The Study Of Multi-Pitch Extraction Based On Deep Learning

Posted on: 2020-11-21
Degree: Master
Type: Thesis
Country: China
Candidate: Y Z Huang
Full Text: PDF
GTID: 2428330575956508
Subject: Information and Communication Engineering
Abstract/Summary:
Multi-pitch extraction is the task of extracting the fundamental frequencies present in polyphonic music, that is, music in which multiple sound sources are active at the same time. When the polyphonic signal is a mixture of a singing voice and accompanying instruments, extracting the melody of the voice, i.e., its fundamental-frequency contour, is called melody extraction. When the polyphonic signal comes from different strings or keys of a single instrument and must be converted automatically into a musical score, the task is called automatic music transcription (AMT). This thesis studies multi-pitch extraction through these two practical applications: melody extraction and automatic music transcription.

1. The main task of melody extraction is to extract the fundamental frequencies of the singing voice from polyphonic music that mixes vocals with background music. In view of the complex harmonic structure of polyphonic music, and following the idea of sub-harmonic summation, melody extraction is treated as a multi-class classification problem. Building on deep neural networks and the network structure of Sangeun Kum, a deep harmonic network (DHNN) is proposed. By restructuring the original spectrum so that only the frequency bins related to each candidate fundamental frequency and their harmonic correlation features are fed into the network, the DHNN is given prior knowledge of harmonic structure, which is equivalent to adding a harmonic prior on top of a deep neural network. Two variants of the deep harmonic network, based on LSTM and ResNet, are implemented. With 80% of the MIREX-1K dataset used for training and the remaining 20% for testing, state-of-the-art results are obtained on several metrics. The trained model is also tested on the MIREX04 dataset, where it achieves state-of-the-art performance on the key metric, overall accuracy.

2. The main task of AMT is to automatically transform music played by instruments into the corresponding notes. AMT is divided into two stages: multi-pitch extraction and note tracking. Because more than one note may sound at the same time, this thesis treats the multi-pitch extraction stage of AMT as a multi-label classification problem. There are at most 88 notes at each time step, and each note is handled by a separate binary classification network; owing to frequency-shift invariance, the harmonic networks of adjacent notes share parameters. As in melody extraction, a deep harmonic network (DHNN) is introduced to give the deep network harmonic priors for dealing with the complex harmonic structure of polyphonic music. Considering the different meanings of the two dimensions of the audio spectrum, the DHNN is implemented with a one-dimensional convolutional highway network rather than the more common two-dimensional convolution. An attention mechanism is introduced to select informative key frames, and a competitive network mechanism is introduced to reduce octave errors. Experiments show that the frame-level F-measure on the MAPS database is 0.8134, about 10% higher than the state-of-the-art result of Siddharth. Drawing on the melody contour algorithm, a heuristic contour-based note tracking algorithm is proposed, which achieves an F-measure of 0.6970.
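As a rough illustration of the harmonic-feature idea described for melody extraction, the sketch below samples the magnitude spectrum only at the integer harmonics of each candidate fundamental frequency, so that a classifier would see harmonic-structure features rather than the full spectrum. The function name, pitch grid, FFT size, and number of harmonics are assumptions for illustration and do not reproduce the thesis's actual feature pipeline or DHNN input.

```python
# Illustrative sketch only: sample the magnitude spectrum at integer harmonics
# of each candidate f0, mimicking the harmonic-prior input described above.
# All parameter choices here are assumptions, not the thesis implementation.
import numpy as np

def harmonic_features(frame, sr, candidates_hz, n_harmonics=6, n_fft=4096):
    """Return one harmonic-energy vector per candidate f0 for a single frame."""
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame)), n=n_fft))
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    feats = np.zeros((len(candidates_hz), n_harmonics))
    for i, f0 in enumerate(candidates_hz):
        for h in range(1, n_harmonics + 1):
            # nearest spectral bin to the h-th harmonic of this candidate
            bin_idx = np.argmin(np.abs(freqs - h * f0))
            feats[i, h - 1] = spectrum[bin_idx]
    return feats

# Example: candidate pitches on a semitone grid (MIDI 38..84, ~73 Hz to ~1047 Hz)
sr = 16000
candidates = 440.0 * 2.0 ** ((np.arange(38, 85) - 69) / 12.0)
frame = np.random.randn(2048)                         # stand-in audio frame
print(harmonic_features(frame, sr, candidates).shape)  # (47, 6)
```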
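The following minimal sketch illustrates the shared-parameter, per-note multi-label design described for the AMT stage: a one-dimensional convolution along the frequency axis is shared across notes (exploiting frequency-shift invariance), and 88 sigmoid outputs give independent note-on probabilities per frame. The layer sizes, the 352-bin input, and the single multi-label head are illustrative assumptions; the thesis's highway, attention, and competitive-network components are not reproduced here.

```python
# Illustrative sketch only: per-frame multi-label note detection with a
# frequency-axis Conv1d shared by all 88 notes. Architecture details are
# assumptions, not the thesis's DHNN.
import torch
import torch.nn as nn

class SharedConvMultiPitch(nn.Module):
    def __init__(self, n_bins=352, n_notes=88):
        super().__init__()
        # 1-D convolution over the frequency axis, shared by all notes
        self.conv = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=7, padding=3), nn.ReLU(),
            nn.Conv1d(32, 32, kernel_size=7, padding=3), nn.ReLU(),
        )
        # one sigmoid output per note -> multi-label, not mutually exclusive
        self.head = nn.Sequential(
            nn.Flatten(), nn.Linear(32 * n_bins, n_notes), nn.Sigmoid()
        )

    def forward(self, spec_frame):           # (batch, n_bins)
        x = spec_frame.unsqueeze(1)          # (batch, 1, n_bins)
        return self.head(self.conv(x))       # (batch, 88) note-on probabilities

model = SharedConvMultiPitch()
frames = torch.randn(4, 352)                 # 4 frames of a 352-bin spectrogram
print(model(frames).shape)                   # torch.Size([4, 88])
```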
Keywords/Search Tags: Melody Extraction, Automatic Music Transcription, DHNN, competition model, attention