Research And Application Of Main Melody Extraction Algorithm

Posted on: 2022-07-06
Degree: Master
Type: Thesis
Country: China
Candidate: W Q Jin
Full Text: PDF
GTID: 2518306497471524
Subject: Control Science and Engineering
Abstract/Summary:
Main melody extraction is an active and difficult task in the field of music information retrieval. As an element of music, melody plays an important role in understanding musical semantics and in distinguishing different musical works. The object of main melody extraction is polyphonic music, and its main task is to generate a fundamental frequency sequence corresponding to the pitch of the singing voice in the music signal. This sequence is the melody line. Most people can pick out the singing voice or the lead instrument without being distracted by the accompaniment, and can even memorize the melody and hum it. For computers, however, recognizing the main melody in multi-source music is difficult. This thesis studies the main melody extraction task and proposes two main melody extraction algorithms: one based on CNN-CRF and harmonic enhancement, the other based on top-level feedback and joint detection. The main work and innovations of this thesis are as follows:

(1) Main melody extraction algorithm based on CNN-CRF and harmonic enhancement

In existing methods, the harmonic characteristics of the music signal easily cause octave errors, and the interaction of different sound sources in polyphonic music causes discontinuities in the main melody pitch sequence, which reduces the raw pitch accuracy of the melody. This thesis proposes a model that enhances the salient features of pitch. Firstly, building on the SF-NMF model, it uses enhanced harmonic information to strengthen the saliency characteristics of pitch. Secondly, the harmonic-CQT transform, which carries harmonic spatial information, replaces the CQT transform to compensate for the odd-multiple harmonic information missing from the CQT. Thirdly, a CNN is used to learn and enhance the salient features, a CRF is used to learn the local pitch features globally, and the Viterbi algorithm is used to select the best melody line as output. Experiments show that the algorithm in this thesis achieves a higher pitch accuracy than other algorithms, and they also verify that rich structured harmonic information can strengthen the saliency representation and make up for pitch misjudgments by SF-NMF. The innovations of this part are: firstly, the source separation model is used as pre-training to reduce the difficulty of extracting the main melody; secondly, the model strengthens the distinctive features of pitch by learning enhanced harmonic information; thirdly, a CRF is used to learn pitch smoothness constraints and saliency in order to track the best melody line.

(2) Main melody extraction algorithm based on top-level feedback and joint detection

To address the high false alarm rate for the singing voice in the CNN-CRF model, which lowers overall accuracy, this thesis proposes a model that enhances singing voice detection. Firstly, by establishing a main melody extraction network and a singing voice detection network, the ability to recognize the singing voice in the main melody extraction task is improved under a joint detection framework. Secondly, this thesis adds a two-way feedback module at the top of the network: one direction passes the non-melody information in the melody features to the auxiliary network to enhance singing voice detection; the other uses the results of singing voice detection to enhance or weaken the salience of the melody pitch. Experiments demonstrate the effectiveness of joint detection in improving the detection of the sung melody, and the effect of top-level feedback on improving pitch recognition. The innovations of this part are: firstly, this thesis considers the difference in abstraction level between voice detection and pitch classification, and adopts a joint detection scheme to solve the dual-objective problem of voice detection and pitch classification; secondly, a two-way feedback module is added at the top of the network to strengthen the connection between the main and auxiliary networks and the distinction between the different features they learn, so that passing non-singing information between the main and auxiliary networks strengthens their respective tasks.

(3) Music fountain action simulation based on the main melody extraction algorithm

Most existing music fountain systems require water patterns to be designed separately for each piece of music, which involves a great deal of manual work and is time-consuming and labor-intensive. In response to this problem, this thesis designs the dynamic water-pattern actions of a musical fountain based on main melody extraction, and achieves diversified water-pattern effects with a small number of basic water patterns. On the one hand, a set of basic water patterns is designed for different music styles; here, this thesis uses feature fusion to achieve high music classification performance with models trained on very large datasets. On the other hand, the main melody features are used to design action strategies that dynamically change the water pattern, and the motion simulation of the music fountain is implemented with OpenGL and Qt.

In summary, this thesis proposes two improved models for the problems in the main melody extraction task. The experimental results show that the proposed methods perform better. Finally, the dynamic water movement of a musical fountain is designed based on main melody extraction.
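
As an illustration of the melody-line selection step described in part (1), the following is a minimal sketch of Viterbi decoding over a per-frame pitch salience matrix. It is not the thesis's CNN-CRF model: the function name `viterbi_melody`, the `jump_penalty` parameter, and the linear jump cost are assumptions made for this example, standing in for the smoothness constraints that the thesis learns with a CRF.

```python
import math


def viterbi_melody(salience, jump_penalty=1.0):
    """Pick one pitch bin per frame, trading salience against pitch jumps.

    salience: list of frames, each a list of non-negative scores per pitch bin.
    jump_penalty: hypothetical hand-set cost per bin of movement between
        frames (the thesis learns its smoothness constraint with a CRF).
    """
    n_bins = len(salience[0])
    eps = 1e-9  # avoids log(0) for empty bins
    # Log-score of the best path ending in each bin at the first frame.
    score = [math.log(s + eps) for s in salience[0]]
    backptr = []
    for frame in salience[1:]:
        new_score, pointers = [], []
        for b in range(n_bins):
            # Best predecessor: previous score minus the cost of jumping to b.
            prev = max(range(n_bins),
                       key=lambda p: score[p] - jump_penalty * abs(p - b))
            new_score.append(score[prev] - jump_penalty * abs(prev - b)
                             + math.log(frame[b] + eps))
            pointers.append(prev)
        score = new_score
        backptr.append(pointers)
    # Trace back from the best final bin to recover the full path.
    path = [max(range(n_bins), key=lambda b: score[b])]
    for pointers in reversed(backptr):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Toy salience: the middle frame has a spurious peak in bin 3.
toy = [
    [0.10, 0.80, 0.05, 0.05],
    [0.10, 0.30, 0.05, 0.55],
    [0.10, 0.70, 0.10, 0.10],
]
smooth = viterbi_melody(toy)                    # penalizes the jump to bin 3
greedy = viterbi_melody(toy, jump_penalty=0.0)  # reduces to per-frame argmax
```

With the jump penalty set to zero, the decoder falls back to choosing the most salient bin in each frame independently, which is where the pitch discontinuities mentioned above come from; a non-zero penalty keeps the path on the continuous melody line.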
Keywords/Search Tags:Main Melody Extraction, Deep Learning, Pitch Saliency Enhancement, Joint Detection, Musical Fountain