Music is an art form with a long history and a medium that people frequently encounter in daily life. In recent years, the rapid development of deep learning has brought new research directions and challenges to traditional signal processing. This thesis focuses on music signal processing, with the goal of applying deep learning to two tasks: the stereo reconstruction of monophonic music based on visual information, and automatic music transcription.

The key to visually informed stereo reconstruction is effectively extracting spatial information from the video and fusing it into the audio signal, so that the monophonic signal acquires a stereophonic effect. A survey of related work shows that the audio-visual feature fusion algorithms proposed by existing research are relatively simple and introduce excessive noise into the audio signal, limiting the quality of the resulting stereo. To address this problem, this thesis designs an audio-visual feature fusion algorithm based on the self-attention mechanism, which retains important visual features while filtering out irrelevant information. The algorithm injects the important spatial information into the audio signal while introducing as little noise as possible, ensuring a high-quality stereo signal. In addition, inspired by related research on audio source separation, an iterative network structure is designed to further improve the quality of the generated stereo. Comparison with the experimental results of previous studies shows that the proposed algorithm achieves state-of-the-art performance.

For automatic music transcription, previous work generally downmixed stereo signals to mono, which fails to make full use of stereo information and limits the accuracy of transcription models. To transcribe stereo music directly, a stereo feature enhancement module is designed to fully extract the correlation and difference information between the two stereo channels, improving the performance and robustness of the transcription model. In addition, a temporal convolutional module is designed to model the temporal structure of music while maintaining the model's runtime efficiency and transcription quality. Experiments on the relevant datasets show that the proposed algorithms are effective. Finally, based on a summary of this work, suggestions for future research and innovation directions are put forward.
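To make the central fusion idea concrete, the following is a minimal, illustrative sketch of cross-modal attention in which audio frames (queries) attend over visual features (keys/values), so that each audio frame selectively absorbs the most relevant spatial cues while down-weighting irrelevant ones. This is not the thesis's actual architecture; the function name, feature dimensions, and the residual injection step are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def audio_visual_attention(audio_feats, visual_feats):
    """Illustrative cross-modal attention (not the thesis's exact model).

    audio_feats:  (T_audio, d) audio frame features, used as queries
    visual_feats: (T_visual, d) visual features, used as keys and values
    Returns audio features with attended visual context injected residually.
    """
    d = audio_feats.shape[-1]
    # scaled dot-product scores: how relevant each visual token is to each frame
    scores = audio_feats @ visual_feats.T / np.sqrt(d)      # (T_audio, T_visual)
    weights = softmax(scores, axis=-1)                       # rows sum to 1
    context = weights @ visual_feats                         # (T_audio, d)
    # residual injection: keep the audio signal, add only the selected cues
    return audio_feats + context

# toy example: 8 audio frames and 4 visual tokens, 16-dim features
rng = np.random.default_rng(0)
audio = rng.standard_normal((8, 16))
visual = rng.standard_normal((4, 16))
fused = audio_visual_attention(audio, visual)
print(fused.shape)  # (8, 16)
```

The softmax weighting is what realizes the "retain important features, filter irrelevant information" behavior described above: visual tokens with low relevance scores contribute almost nothing to the fused representation.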