
Research And Implementation Of Audio Alignment Based On Deep Learning

Posted on: 2020-04-19
Degree: Master
Type: Thesis
Country: China
Candidate: Y D Liu
GTID: 2428330575456531
Subject: Electronic and communication engineering
Abstract/Summary:
With the development of information and communication technologies, the prevalence of digital music has led to a flourishing of music-related research topics. Audio alignment has become an important subfield of digital music signal processing, and it is also vital for accessing and retrieving music content. In recent years, deep learning has also been applied to music alignment. This dissertation studies the alignment between audio and MIDI notes in a music data stream. Taking piano music as the alignment target, we use audio signal processing techniques and deep learning methods to extract feature sequences from the audio signal, and we implement two alignment models based on the pitch sequence. Building on this, we study how combining the pitch sequence with the note onset sequence affects alignment performance, and we compare the alignment results obtained with three different neural network models for extracting the note onset sequence. The main contributions of this thesis are as follows.

First, we extract feature parameters from the audio signal and feature sequences from the MIDI file. Based on an analysis of the audio signal's time-frequency characteristics, we pre-process, frame, window, and filter the audio signal. The integrated 360-dimensional feature vector is composed of STFT features computed with a long window and a short window. By analyzing the MIDI file format, we extract the pitch sequence and the note onset sequence.

Second, we perform alignment based on the pitch sequence. A convolutional neural network (CNN) and a bidirectional long short-term memory network (BLSTM) are each used to extract the pitch sequence from the audio features, and the fast dynamic time warping algorithm (FastDTW) is used to compute the alignment path between the audio pitch sequence and the MIDI pitch sequence. The alignment results are evaluated at tolerance thresholds of 10 ms, 30 ms, 50 ms, and 100 ms, and the CNN-based model aligns better than the BLSTM-based one.

Finally, we design and implement a joint alignment model that uses both the pitch sequence and the onset sequence. Three models are used to extract the audio note onset sequence, including a BLSTM with an attention mechanism (BLSTM-Attention) and an attention-based model that combines a CNN with a BLSTM (CNN-BLSTM-Attention). The results show that adding the note onset sequence greatly increases alignment accuracy. Among the three onset-extraction models, CNN-BLSTM-Attention performs best and BLSTM-Attention second best.
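The long/short-window feature idea can be sketched with NumPy as follows. This is a minimal illustration under assumptions of our own, not the thesis's configuration: a 16 kHz mono signal, a shared 256-sample hop, 512- and 2048-sample Hann windows, and keeping the lowest 180 bins from each window so that the two halves concatenate to 360 dimensions. The abstract does not say how the actual 360 dimensions are split between the two windows, so the window lengths, the hop, and the `bins_each` parameter are all hypothetical.

```python
import numpy as np


def log_stft(signal: np.ndarray, win_len: int, hop: int) -> np.ndarray:
    """Centered, Hann-windowed log-magnitude STFT of a 1-D signal.

    Returns an array of shape (num_frames, win_len // 2 + 1).
    """
    pad = win_len // 2
    # Zero-pad both ends so frame i is centered on sample i * hop.
    padded = np.pad(signal, (pad, pad), mode="constant")
    window = np.hanning(win_len)
    num_frames = 1 + (len(padded) - win_len) // hop
    frames = np.stack(
        [padded[i * hop:i * hop + win_len] * window for i in range(num_frames)]
    )
    # log1p compresses the dynamic range and keeps silent bins at 0.
    return np.log1p(np.abs(np.fft.rfft(frames, axis=1)))


def long_short_features(signal: np.ndarray, hop: int = 256, short_win: int = 512,
                        long_win: int = 2048, bins_each: int = 180) -> np.ndarray:
    """Concatenate the lowest `bins_each` bins of a short- and a long-window STFT.

    All defaults are hypothetical; with bins_each=180 each frame has 360 values.
    """
    short = log_stft(signal, short_win, hop)[:, :bins_each]
    long_ = log_stft(signal, long_win, hop)[:, :bins_each]
    # With even window lengths both STFTs have 1 + len(signal) // hop frames.
    n = min(len(short), len(long_))
    return np.concatenate([short[:n], long_[:n]], axis=1)
```

Sharing the hop and centering every frame keeps the two STFTs frame-synchronous, so each row describes the same instant at two time-frequency resolutions: the long window separates closely spaced low pitches, while the short window follows rapid changes such as note attacks.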
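After the MIDI file has been parsed, the pitch and onset sequences can be represented frame by frame. The sketch below assumes the notes are already available as `(onset_s, offset_s, midi_pitch)` tuples (MIDI parsing itself is not shown), an 88-key range starting at MIDI note 21, and a hypothetical 10 ms frame hop. `notes_to_rolls` is an illustrative helper, not code from the thesis.

```python
from typing import Sequence, Tuple

# Hypothetical settings: 88 piano keys starting at MIDI note 21 (A0), 10 ms hop.
LOWEST_MIDI = 21
NUM_KEYS = 88
HOP_SECONDS = 0.01

Note = Tuple[float, float, int]  # (onset_s, offset_s, midi_pitch)


def notes_to_rolls(notes: Sequence[Note], hop: float = HOP_SECONDS):
    """Return (pitch_roll, onset_roll) as lists of frames of NUM_KEYS 0/1 values.

    pitch_roll marks every frame in which a note sounds; onset_roll marks only
    the frame in which each note starts. `notes` must be non-empty.
    """
    spans = []
    for onset, offset, midi in notes:
        # round() avoids float artifacts such as 0.3 / 0.01 == 29.999...
        first = int(round(onset / hop))
        last = max(first + 1, int(round(offset / hop)))  # at least one frame
        spans.append((first, last, midi - LOWEST_MIDI))
    num_frames = max(last for _, last, _ in spans)
    pitch_roll = [[0] * NUM_KEYS for _ in range(num_frames)]
    onset_roll = [[0] * NUM_KEYS for _ in range(num_frames)]
    for first, last, key in spans:
        if not 0 <= key < NUM_KEYS:
            continue  # skip notes outside the piano range
        for t in range(first, last):
            pitch_roll[t][key] = 1
        onset_roll[first][key] = 1
    return pitch_roll, onset_roll
```

Keeping onsets in a separate roll matters for repeated notes: two consecutive notes on the same key look like one long note in `pitch_roll`, but `onset_roll` still marks where the second one begins.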
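The alignment step can be illustrated with plain dynamic time warping. This is the exact O(NM) DTW recursion, not FastDTW: FastDTW approximates the same kind of path in roughly linear time by aligning coarsened copies of the sequences and refining the result within a small window around the projected path. The `joint_frame_cost` helper is a hypothetical way to combine pitch and onset information (a Hamming distance on binary pitch frames plus a weighted Hamming distance on onset frames); the abstract does not specify how the thesis combines the two sequences.

```python
from typing import Callable, List, Sequence, Tuple


def dtw_path(x: Sequence, y: Sequence,
             cost: Callable[[object, object], float]) -> Tuple[float, List[Tuple[int, int]]]:
    """Exact DTW on non-empty sequences: return (total cost, path of (i, j) pairs)."""
    n, m = len(x), len(y)
    inf = float("inf")
    # acc[i][j] = minimal accumulated cost of aligning x[:i] with y[:j].
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
            acc[i][j] = cost(x[i - 1], y[j - 1]) + step
    # Backtrack from the end, always moving to the cheapest predecessor.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        _, i, j = min((acc[i - 1][j - 1], i - 1, j - 1),
                      (acc[i - 1][j], i - 1, j),
                      (acc[i][j - 1], i, j - 1))
        path.append((i - 1, j - 1))
    path.reverse()
    return acc[n][m], path


def joint_frame_cost(a, b, onset_weight: float = 1.0) -> float:
    """Hypothetical joint cost for frames given as (pitch_vector, onset_vector)."""
    (pitch_a, onset_a), (pitch_b, onset_b) = a, b
    pitch_dist = sum(u != v for u, v in zip(pitch_a, pitch_b))  # Hamming distance
    onset_dist = sum(u != v for u, v in zip(onset_a, onset_b))
    return pitch_dist + onset_weight * onset_dist
```

For example, aligning the pitch sequences `[60, 62, 64, 65]` and `[60, 60, 62, 64, 64, 65]` with a 0/1 mismatch cost gives a total cost of 0 and the path `[(0, 0), (0, 1), (1, 2), (2, 3), (2, 4), (3, 5)]`: each repeated frame in the second sequence is matched to the same frame of the first.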
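Threshold-based evaluation can be sketched as the fraction of notes whose aligned onset time lies within each tolerance of the reference onset. Both the metric definition and the mapping from a warping path to onset times are assumptions for illustration, since the abstract only names the 10/30/50/100 ms thresholds. The path is assumed to be a list of `(audio_frame, midi_frame)` pairs in increasing order, and the 10 ms hop is hypothetical.

```python
from typing import Dict, Iterable, List, Sequence, Tuple


def midi_onsets_to_audio_times(path: Sequence[Tuple[int, int]],
                               midi_onset_frames: Iterable[int],
                               hop: float = 0.01) -> List[float]:
    """Map MIDI onset frames to audio times (seconds) via a warping path.

    Assumes `path` holds (audio_frame, midi_frame) pairs in increasing order;
    each MIDI frame is mapped to the earliest audio frame paired with it.
    """
    first_audio: Dict[int, int] = {}
    for audio_frame, midi_frame in path:
        first_audio.setdefault(midi_frame, audio_frame)
    return [first_audio[f] * hop for f in midi_onset_frames]


def accuracy_within_thresholds(predicted: Sequence[float], reference: Sequence[float],
                               thresholds_ms: Sequence[int] = (10, 30, 50, 100)
                               ) -> Dict[int, float]:
    """Fraction of notes whose absolute onset error is <= each threshold (ms)."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("predicted and reference must be equally long and non-empty")
    errors_ms = [abs(p - r) * 1000.0 for p, r in zip(predicted, reference)]
    return {t: sum(e <= t for e in errors_ms) / len(errors_ms) for t in thresholds_ms}
```

With predicted onsets `[0.0, 0.52, 1.045, 1.58]` s against references `[0.0, 0.5, 1.0, 1.5]` s, the errors are 0, 20, 45, and 80 ms, so `accuracy_within_thresholds` returns `{10: 0.25, 30: 0.5, 50: 0.75, 100: 1.0}`.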
Keywords/Search Tags: audio alignment, bidirectional long short-term memory network, attention mechanism, convolutional neural network, dynamic time warping