Research And Implementation Of Audio Alignment Based On Deep Learning

Posted on:2020-04-19

Degree:Master

Type:Thesis

Country:China

Candidate:Y D Liu

Full Text:PDF

GTID:2428330575456531

Subject:Electronic and communication engineering

Abstract/Summary:

With the development of information and communication technologies, the prevalence of digital music has contributed to a flourishing of music-related research topics. Audio alignment has become a significant subfield of digital music signal processing, and it is also vital for information access and retrieval of music content. In recent years, deep learning has also been applied to music alignment. In this thesis, we study the alignment between audio and MIDI notes in a music data stream. Taking piano music as the alignment target, we use audio signal processing and deep learning methods to extract feature sequences from the audio signal, and implement two alignment models based on the pitch sequence. On this basis, we study how combining the pitch sequence with the note onset sequence affects alignment performance, and we analyze the alignment results of three different neural network models for extracting the note onset sequence. The main contributions of this thesis are as follows.

First, we extract the feature parameters of the audio signal and the MIDI feature sequences. After analyzing the time-frequency characteristics of the audio signal, we pre-process, frame, window and filter it. The integrated features (360 dimensions) are composed of the STFT features of the long and short windows of the audio signal. By analyzing the MIDI file format, we extract the pitch sequence and the note onset sequence.

Second, we study alignment based on the pitch sequence. A convolutional neural network (CNN) and a bidirectional long short-term memory network (BLSTM) are each used to extract the pitch sequence from the audio feature parameters, and the fast dynamic time warping algorithm (FastDTW) is used to compute the alignment path between the audio pitch sequence and the MIDI pitch sequence. The alignment results are evaluated at thresholds of 10 ms, 30 ms, 50 ms and 100 ms, and indicate that the CNN model aligns better.

Finally, we study and implement a joint alignment model of the pitch sequence and the onset sequence. Three models for extracting audio note onset sequences are compared, including a bidirectional long short-term memory network with an attention mechanism (BLSTM-Attention) and an attention-based convolutional neural network combined with a bidirectional long short-term memory network (CNN-BLSTM-Attention). The alignment results show that alignment accuracy increases greatly after adding the note onset sequence. Among the three models for extracting the note onset sequence, the CNN-BLSTM-Attention model gives the best alignment performance, and the BLSTM-Attention model the second best.
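
The pitch-sequence alignment and threshold-based evaluation described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: it uses classic dynamic time warping rather than FastDTW (which approximates the same path at lower cost), it assumes each frame is represented by a single MIDI pitch number with absolute difference as the local distance, and the function names `dtw_path` and `accuracy_within` are hypothetical.

```python
import numpy as np

def dtw_path(a, b):
    """Classic O(N*M) DTW between two 1-D sequences.

    Returns the warping path as a list of (index_in_a, index_in_b) pairs.
    Assumption: local distance is the absolute pitch difference.
    """
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1])

    # Backtrack from the end of both sequences to their start.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        candidates = []
        if i > 1 and j > 1:
            candidates.append((cost[i - 1, j - 1], i - 1, j - 1))
        if i > 1:
            candidates.append((cost[i - 1, j], i - 1, j))
        if j > 1:
            candidates.append((cost[i, j - 1], i, j - 1))
        _, i, j = min(candidates)
        path.append((i - 1, j - 1))
    path.reverse()
    return path

def accuracy_within(est_times, ref_times, thresholds_ms=(10, 30, 50, 100)):
    """Fraction of notes whose aligned time is within each threshold (in ms).

    Both inputs are note times in seconds, one entry per note, in the same order.
    """
    err_ms = np.abs(np.asarray(est_times) - np.asarray(ref_times)) * 1000.0
    return {t: float(np.mean(err_ms <= t)) for t in thresholds_ms}

# Illustrative data only: a MIDI pitch sequence and a slower audio rendition.
midi_pitch = [60, 60, 62, 62, 64]
audio_pitch = [60, 60, 60, 62, 62, 62, 64, 64]
path = dtw_path(midi_pitch, audio_pitch)
```

Each pair in `path` maps a MIDI frame to an audio frame; multiplying the frame indices by the hop duration gives the note times that `accuracy_within` compares against the reference annotations.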

Keywords/Search Tags:

audio alignment, bidirectional long short-term memory network, attention mechanism, convolutional neural network, dynamic time warping

Related items

1	Text Classification Research Based On Deep Neural Network And Attention Mechanism
2	Research On Abnormal Behavior Identification Based On Long Short-term Memory Neural Network
3	Research On Network Intrusion Detection Method Based On Bi-LSTM
4	Research On The Violent Detection Of Audio And Video Based On Attention Mechanism
5	Research On CNN-BiLSTM Stock Price Forecast Model And Quantitative Trading Strategy Based On Attention Mechanism
6	Research On The Stance Detection In Social Network Text Based On Deep Learning
7	Design Of Online Shopping Commodity Evaluation System Based On Deep Semantics
8	Research On Image Captioning Generation Based On Faster R-CNN And Visual Attention
9	The Cross-site Script Detection Based On Deep Learning
10	Speaker Emotional State Recognition Based On Speech And Text Fusion