
The End-to-end Approach For Piano Music Transcription

Posted on: 2021-05-24
Degree: Master
Type: Thesis
Country: China
Candidate: K Deng
Full Text: PDF
GTID: 2518306308474394
Subject: Information and Communication Engineering
Abstract/Summary:
Music transcription converts a music audio signal into the corresponding score or MIDI sequence. The piano is the most mainstream instrument, so this thesis focuses on polyphonic piano music transcription, the most complex case. As standard datasets (such as MAPS) have grown, end-to-end piano transcription has come to outperform methods based on signal modeling (such as building a salience function or signal decomposition) in the main international evaluations, so the work in this thesis is also based on end-to-end modeling. Existing end-to-end models take a single frame of the signal as the modeling unit; this thesis calls them short-term end-to-end models. The work consists of two parts: first, a more efficient framework is proposed on top of the existing short-term end-to-end model; second, a long-term end-to-end framework with the chord as the modeling unit is proposed. The main innovations are as follows.

1) The properties of music signals are studied, and a more efficient and reasonable short-term end-to-end modeling framework is proposed. Because short-term end-to-end modeling gives little consideration to the characteristics of music signals, the importance of the frequency-domain components of each note is analyzed first. A two-class classifier is trained for each note. Statistical analysis of the input features and the notes reveals three properties that play an important role in music transcription: harmonic characteristics, translation invariance, and note co-occurrence. Combining these three properties, a more reasonable structure is proposed for short-term end-to-end modeling based on convolutional neural networks (CNN) and recurrent neural networks (RNN). Because the harmonic structure of a note is discrete in the frequency domain and merely repeats in the time domain, and because it has local translation invariance, a shared dilated one-dimensional convolution kernel that slides along the frequency axis is better suited to feature extraction. Because RNNs are difficult to build deep, a residual network based on BiLSTM is proposed to model the time domain; it exceeds the best RNN model by 10% with the same number of parameters. By analyzing note correlation and octave errors, an octave loss function is proposed. Finally, the model in this thesis reaches an F1-measure of 0.80 on the MAPS dataset, 3 percentage points higher than the best existing short-term end-to-end model.

2) The long-term characteristics of music theory are studied, and a long-term end-to-end modeling framework with the chord as the modeling unit is proposed. Short-term end-to-end modeling, which takes the frame as the modeling unit, has difficulty modeling music theory effectively over long time spans. The proposed long-term end-to-end model is divided into two modules: chord boundary detection and chord recognition. In chord boundary detection, a chord boundary is defined wherever two adjacent frames are inconsistent, and every frame between chord boundaries is labeled in order to alleviate label imbalance. Based on the detected boundaries, the chord sequence is obtained by downsampling the single-frame multi-pitch detection results, and the chord sequence is then modeled to output the final chords. In the final decoding, the MIDI sequence is obtained by combining the chord recognition results with the boundary detection results. Compared with the short-term end-to-end model, the F1 score increases by 1 percentage point on the MAPS dataset, and the final transcription result is more compact and continuous.

In summary, this thesis studies both the short-term and the long-term framework for end-to-end music transcription and achieves varying degrees of improvement on the public MAPS dataset.
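The chord-boundary definition in part 2) can be illustrated with a small sketch. It is not the thesis's implementation: the thesis detects boundaries with a learned model, whereas this example only shows the labeling rule (a boundary wherever two adjacent frames differ) and the collapse of frame-level pitch activity into a chord sequence. The function names, the `Frame` type, and the toy piano roll are all hypothetical.

```python
from typing import List, Tuple

# One frame of a binary piano roll: 1 means the pitch is active (assumed encoding).
Frame = Tuple[int, ...]


def chord_boundaries(frames: List[Frame]) -> List[int]:
    """Return frame indices that open a new chord segment.

    Index 0 always opens a segment; any later index i is a boundary
    when frame i differs from frame i - 1.
    """
    if not frames:
        return []
    return [0] + [i for i in range(1, len(frames)) if frames[i] != frames[i - 1]]


def downsample_to_chords(frames: List[Frame]) -> List[Tuple[int, int, Frame]]:
    """Collapse each run of identical frames into (start, end_exclusive, chord)."""
    starts = chord_boundaries(frames)
    ends = starts[1:] + [len(frames)]
    return [(s, e, frames[s]) for s, e in zip(starts, ends)]


# Toy piano roll: 6 frames, 4 pitches (hypothetical data).
roll = [
    (1, 0, 1, 0),
    (1, 0, 1, 0),
    (1, 0, 1, 0),
    (0, 1, 0, 1),
    (0, 1, 0, 1),
    (0, 0, 0, 0),
]
segments = downsample_to_chords(roll)
```

Each segment keeps its start and end frame, so a decoder could map the recognized chord back onto the time axis when producing the MIDI sequence, which is the role the abstract assigns to combining chord recognition with boundary detection.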
Keywords/Search Tags: music transcription, end-to-end, octave error, multi-label classification, residual network