Research Of Audio-visual Fusion Piano Transcription Technology And System Realization

Posted on:2021-04-26

Degree:Master

Type:Thesis

Country:China

Candidate:X Gong

Full Text:PDF

GTID:2505306104986319

Subject:Information and Communication Engineering

Abstract/Summary:

With rising material living standards, people are paying more attention to their cultural and spiritual lives, and more of them are taking up music education. Piano education, one of the most mature areas of music education, has attracted many students. Automatic Music Transcription (AMT) converts a piano performance into symbolic form: it detects the notes being played and outputs each note's pitch, start time, and end time, which helps performers record and improve their playing. This thesis studies and implements an automatic piano transcription system that takes the audio or video of a piano performance as input and detects the pitch, start time, and end time of each note from the image or the sound. The main contributions are: (1) To address the lack of audio-visual fusion datasets, we created the Play Dataset. This work proposes, for the first time, a practice mode and a performance mode of piano playing. The dataset was constructed with player characteristics, difficulty, lighting conditions, and the features of video and audio transcription taken into account. (2) We improved the existing audio and video transcription systems. For audio transcription, we propose an energy balance algorithm that strengthens weak onset features; the F1 score on the first 30 s of MAPS ENSTDkCl is 88.38%. For video transcription, we propose dual-camera recording, which solves the low recognition accuracy for keys perpendicular to the camera and achieves a 93.5% F1 score on the Play Dataset. (3) We designed and implemented two audio-visual fusion transcription systems: a logical fusion system built on single-mode audio and video transcription, and a CNN-based network fusion system. Each has advantages in different application scenarios: logical fusion suits rapid system construction, while network fusion suits systems that require higher accuracy and robustness. On the Play Dataset, the logical fusion system achieved a 94.5% F1 score and the network fusion system achieved 96.8%. Systematic experiments show that the audio-visual fusion systems implemented in this thesis are more accurate and robust than existing piano transcription systems, and that the CNN-based network fusion system gives the best transcription results and can support piano teaching.
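
The F1 values quoted in the abstract combine precision and recall over detected notes. The abstract does not state the matching criterion used in the thesis, so the following is only a minimal sketch under assumptions: notes are matched by identical pitch and a start time within a 50 ms tolerance (a convention common in transcription evaluation), and the matching is greedy rather than optimal. The name `note_f1`, the `Note` tuple layout, and the `onset_tol` parameter are hypothetical and introduced for illustration.

```python
from typing import List, Tuple

# Hypothetical note layout: (MIDI pitch, start time in s, end time in s).
Note = Tuple[int, float, float]


def note_f1(
    reference: List[Note],
    estimated: List[Note],
    onset_tol: float = 0.05,  # assumed tolerance, not taken from the thesis
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for note-level transcription.

    An estimated note counts as correct when an unmatched reference note
    has the same pitch and a start time within `onset_tol` seconds.
    End times are ignored in this sketch.
    """
    matched_refs = set()
    true_pos = 0
    for pitch, start, _end in estimated:
        for i, (ref_pitch, ref_start, _ref_end) in enumerate(reference):
            if i in matched_refs:
                continue  # each reference note may be matched only once
            if pitch == ref_pitch and abs(start - ref_start) <= onset_tol:
                matched_refs.add(i)
                true_pos += 1
                break

    precision = true_pos / len(estimated) if estimated else 0.0
    recall = true_pos / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)
```

With three reference notes and three estimated notes of which two match, precision and recall are both 2/3, so F1 is also 2/3. An evaluation that also requires end times to match would generally give lower scores than this start-time-only criterion.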

Keywords/Search Tags:

Audio-visual Fusion System, Automatic Music Transcription, Multi-pitch Estimation, Convolutional Neural Network

Related items

1	Research And Implementation Of A Vision-based Piano Transcription System
2	Research And Implementation Of A CNN-based Piano Music Transcription Algorithm
3	Research On Automatic Transcription Algorithm Of Piano Music Based On CNN-HMM
4	Research On Stereoscopic Reconstruction And Transcription Of Monophonic Music
5	Research And Implementation Of Audio-visual Fusion Piano Transcription Technology
6	Music Genre Recognition Research Based On Improved Deep Convolutional Neural Network
7	Research And Implementation Of A CNN-based Polyphonic Piano Transcription Algorithm
8	Research On Polyphonic Multi-Instrument Recognition Method Based On Dilated Convolutional Recurrent Neural Network
9	Research On Music Recommendation System Based On Deep Learning And Collaborative Filtering
10	Research On Movie Recommendation Algorithm Based On Convolutional Neural Network And Recurrent Neural Network