| Music is a widely appreciated art form. With the advancement of science and technology, digital music has become an essential medium for distributing and storing music. Automatic music transcription converts music waveform signals into symbolic note representations and is an important research topic in music information retrieval. However, existing algorithms require the input to be clean music without interference, which limits their practicality. To address this problem, this thesis takes the piano, the most representative instrument, as its research object and studies the separation and automatic transcription of piano music in complex environments. The main contributions are as follows. (1) Separation of piano music in complex environments. In practical applications, the piano music to be transcribed is often mixed with other environmental audio, such as noise, human voices, or other instruments. In such environments the piano signal is contaminated or masked, which seriously reduces transcription accuracy, so the piano music must be separated from the mixture before transcription. After comparison and analysis, the time-domain Conv-TasNet network is selected for separation. Conv-TasNet is trained directly on mixed time-domain waveforms and directly outputs the separated waveforms. Because no frequency-domain features need to be extracted, it avoids the phase-estimation errors and long computation time associated with the STFT. Conv-TasNet separates speech datasets well, but performs poorly on piano music datasets. (2) To account for the polyphonic nature of piano signals, this thesis proposes Multi Conv-Tpsnet, a piano source separation model. A multi-scale encoder is designed to extract richer piano audio features. In the
separator, depthwise separable convolutions replace ordinary convolutions to remove parameter redundancy and reduce the model size, and fully convolutional gated linear units (GLUs) control the information flow more effectively, improving separation ability. Experiments on mixtures at different signal-to-noise ratios (SNRs) show that Multi Conv-Tpsnet separates piano sources better than Conv-TasNet. (3) Automatic transcription of clean piano music with a Convs-Bi GRU network. Two frequency-domain feature extraction methods, the Short-Time Fourier Transform (STFT) and the Constant-Q Transform (CQT), are analyzed, and experiments show that the CQT is the more suitable feature for piano music transcription. Finally, channel pruning is used to compress the Convs-Bi GRU transcription model. |
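The abstract does not say how the multi-scale encoder of Multi Conv-Tpsnet is built. The sketch below assumes one common design: several 1-D filter banks with different window lengths share a single hop size, each is followed by a ReLU (as in the Conv-TasNet encoder), and their outputs are concatenated frame by frame. The function names, toy filters, and window lengths are hypothetical. In a real model the filters are learned and the computation runs on tensors.

```python
# Hypothetical multi-scale encoder sketch (not the thesis implementation):
# strided 1-D convolution + ReLU per scale, outputs concatenated per frame.

def encode(signal, filters, hop, n_frames):
    """One scale: strided 1-D convolution followed by ReLU."""
    win = len(filters[0])
    # Zero-pad the tail so every frame sees a full window of samples.
    needed = (n_frames - 1) * hop + win
    padded = list(signal) + [0.0] * max(0, needed - len(signal))
    frames = []
    for t in range(n_frames):
        segment = padded[t * hop : t * hop + win]
        # One non-negative activation per filter.
        frames.append([max(0.0, sum(w * s for w, s in zip(f, segment)))
                       for f in filters])
    return frames


def multi_scale_encode(signal, filter_banks, hop):
    """Concatenate per-frame features from filter banks of different lengths."""
    shortest = min(len(bank[0]) for bank in filter_banks)
    # The shortest window fixes the frame count; longer windows are padded.
    n_frames = (len(signal) - shortest) // hop + 1
    per_scale = [encode(signal, bank, hop, n_frames) for bank in filter_banks]
    return [[x for scale in per_scale for x in scale[t]] for t in range(n_frames)]


# Toy example: one 2-sample filter and one 4-sample filter, hop of 2 samples.
features = multi_scale_encode([1.0] * 8, [[[1.0, 1.0]], [[1.0, 1.0, 1.0, 1.0]]], hop=2)
print(features)  # [[2.0, 4.0], [2.0, 4.0], [2.0, 4.0], [2.0, 2.0]]
```

One plausible motivation for such a design is that short windows resolve note onsets in time while long windows resolve closely spaced harmonics in frequency, so combining both gives the separator access to each.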
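The claim that depthwise separable convolutions reduce parameter redundancy can be checked with simple arithmetic. A standard 1-D convolution needs $C_{in} C_{out} K$ weights, whereas a depthwise convolution (one length-$K$ kernel per input channel) followed by a pointwise $1\times1$ convolution needs $C_{in} K + C_{in} C_{out}$. The channel count of 512 and kernel size of 3 below are hypothetical values chosen for illustration, not figures from the thesis.

```python
# Parameter counts for a standard vs. a depthwise separable 1-D convolution.
# The layer sizes used in the example are hypothetical.

def conv1d_params(c_in, c_out, kernel, bias=True):
    """Standard 1-D convolution: every output channel sees every input channel."""
    return c_in * c_out * kernel + (c_out if bias else 0)


def separable_conv1d_params(c_in, c_out, kernel, bias=True):
    """Depthwise (one kernel per input channel) + pointwise (1x1) convolution."""
    depthwise = c_in * kernel + (c_in if bias else 0)
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise


standard = conv1d_params(512, 512, 3, bias=False)
separable = separable_conv1d_params(512, 512, 3, bias=False)
print(standard, separable, round(separable / standard, 3))  # 786432 263680 0.335
```

Without biases the ratio is exactly $1/C_{out} + 1/K$, so the saving grows with the kernel size. In PyTorch the depthwise step corresponds to `torch.nn.Conv1d` with `groups` set to the number of input channels.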
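The abstract credits gated linear units (GLUs) with controlling the information flow in the separator. A GLU splits its input channels into two halves $a$ and $b$ and outputs $a \odot \sigma(b)$, so the sigmoid half acts as a learned gate on the linear half. The minimal stdlib sketch below applies this to the channel vector at a single time step. In a fully convolutional GLU both halves would come from one convolution with twice the channels. The function names are illustrative.

```python
import math

# Minimal GLU sketch on one time step's channel vector (illustrative only).

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def glu(channels):
    """Gated linear unit: first half * sigmoid(second half)."""
    if len(channels) % 2:
        raise ValueError("GLU needs an even number of channels")
    half = len(channels) // 2
    values, gates = channels[:half], channels[half:]
    return [v * sigmoid(g) for v, g in zip(values, gates)]


print(glu([0.0, 2.0, 0.0, 0.0]))  # [0.0, 1.0]
```

A large positive gate passes the value almost unchanged and a large negative gate suppresses it. PyTorch offers the same operation as `torch.nn.functional.glu(input, dim)`.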
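The separation experiments use mixtures at different signal-to-noise ratios. A common way to build such a mixture, assumed here because the abstract does not describe the procedure, is to rescale the interference so that $10\log_{10}(P_{piano}/P_{interference})$ equals the target SNR and then add it to the piano signal. The sine "piano" signal, the uniform noise, and the SNR values below are hypothetical test data.

```python
import math
import random

# Sketch: mix a clean piano signal with interference at a target SNR (dB).

def power(x):
    """Mean power of a signal."""
    return sum(v * v for v in x) / len(x)


def mix_at_snr(piano, interference, snr_db):
    """Scale the interference to the requested SNR and add it to the piano."""
    if len(piano) != len(interference):
        raise ValueError("signals must have equal length")
    target_power = power(piano) / (10.0 ** (snr_db / 10.0))
    scale = math.sqrt(target_power / power(interference))  # needs non-silent interference
    scaled = [scale * v for v in interference]
    mixture = [p + n for p, n in zip(piano, scaled)]
    return mixture, scaled


def snr_db(signal, noise):
    return 10.0 * math.log10(power(signal) / power(noise))


# Hypothetical data: a 440 Hz tone at 16 kHz and seeded uniform noise.
piano = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(1600)]
rng = random.Random(0)
noise = [rng.uniform(-1.0, 1.0) for _ in range(1600)]
for target in (-5.0, 0.0, 5.0):
    mixture, scaled = mix_at_snr(piano, noise, target)
    # Each measured value matches the requested SNR up to rounding error.
    print(target, snr_db(piano, scaled))
```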
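The abstract reports that the CQT outperforms the STFT as a transcription feature without giving the reason. One commonly cited explanation is that CQT bins are spaced geometrically, $f_k = f_{min} \cdot 2^{k/B}$, which matches the equal-tempered piano keyboard, while STFT bins are spaced linearly at $f_s / N_{FFT}$ Hz. The sketch assumes $f_{min} = 27.5$ Hz (A0, the lowest piano key), 12 bins per octave, and 88 bins, plus a hypothetical STFT with a 16 kHz sample rate and 2048-point FFT.

```python
# Compare CQT centre frequencies with linear STFT bin spacing.
# All parameter values are illustrative assumptions.

def cqt_center_freqs(f_min=27.5, n_bins=88, bins_per_octave=12):
    """Geometrically spaced CQT centre frequencies: f_k = f_min * 2**(k / B)."""
    return [f_min * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]


def cqt_q_factor(bins_per_octave=12):
    """Constant ratio of centre frequency to bandwidth shared by every CQT bin."""
    return 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)


def stft_bin_spacing(sample_rate, n_fft):
    """Linear STFT bin spacing in Hz, identical for every bin."""
    return sample_rate / n_fft


freqs = cqt_center_freqs()
print(round(freqs[0], 2), round(freqs[-1], 2))  # 27.5 4186.01
print(round(cqt_q_factor(), 2))                 # 16.82
# Gap between the two lowest piano keys (A0, A#0) versus one STFT bin:
print(round(freqs[1] - freqs[0], 2), stft_bin_spacing(16000, 2048))  # 1.64 7.8125
```

With these settings the 88 CQT bins land on the 88 piano keys (A0 to C8), whereas the two lowest keys fall inside a single STFT bin. In practice the transform itself is usually computed with a library such as `librosa.cqt`.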
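The abstract states that channel pruning compresses the Convs-Bi GRU model but does not name the pruning criterion. The sketch below assumes the widely used L1-norm criterion: rank a layer's output channels by the sum of absolute weights and keep the largest ones. The flattened toy weights, the 50% keep ratio, and the helper names are hypothetical.

```python
# Sketch of L1-norm channel pruning for one convolutional layer.
# Each inner list holds one output channel's flattened weights (toy values).

def l1_norms(filters):
    """L1 norm of each output channel's weights."""
    return [sum(abs(w) for w in f) for f in filters]


def channels_to_keep(filters, keep_ratio):
    """Indices of the channels with the largest L1 norms, in original order."""
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    n_keep = max(1, round(len(filters) * keep_ratio))
    norms = l1_norms(filters)
    ranked = sorted(range(len(filters)), key=lambda i: norms[i], reverse=True)
    return sorted(ranked[:n_keep])


def prune(filters, keep_ratio):
    """Drop the weakest output channels and report which ones survived."""
    kept = channels_to_keep(filters, keep_ratio)
    return [filters[i] for i in kept], kept


weights = [[1.0, -1.0], [0.1, 0.1], [3.0, 0.0], [0.0, -0.5]]
pruned, kept = prune(weights, keep_ratio=0.5)
print(kept)  # [0, 2]
```

Removing an output channel also removes the matching input channel of the following layer, which is what shrinks the model. Pruned networks are typically fine-tuned afterwards to recover accuracy.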