
Research On Optimization Of Deep Learning Model For Acoustic Signal Processing

Posted on: 2019-08-07  Degree: Master  Type: Thesis
Country: China  Candidate: J Lei  Full Text: PDF
GTID: 2428330611493631  Subject: Computer Science and Technology
Abstract/Summary:
As a main information carrier in human activity, acoustic signals have long attracted attention and research. In the Internet of Things era, making machines serve human society better has become a hot topic, and human-computer interaction through acoustic signals has become an active research area. With the rapid development of computing and artificial intelligence, deep learning has become the mainstream approach to acoustic signal processing. The acoustic signals a machine receives come mainly from human voices and the surrounding environment, and current acoustic research focuses on tasks such as automatic speech recognition, phoneme recognition, and acoustic scene classification. This thesis studies acoustic scene classification and phoneme recognition, and addresses several problems in acoustic-signal-based human-computer interaction.

For acoustic scene classification, this thesis proposes a hybrid neural network model with highly aggregated time-frequency acoustic features. We observe that existing models have the following problems when processing time-domain and frequency-domain audio features: 1) single-model structures learn only the time-domain or only the frequency-domain characteristics of the audio; 2) hybrid structures lose or corrupt the original temporal ordering of the audio; 3) hybrid structures do not exploit time-domain and frequency-domain information jointly, so the hybrid model cannot reach its best performance. Based on these observations, this thesis designs an LCNN network structure that avoids losing the original temporal information of the audio, and proposes a time-enhanced multi-channel feature fusion mechanism (MCFF) that lets the hybrid model use time-frequency features more effectively. Combining these two mechanisms yields a new hybrid model, Multi-LCNN, which improves acoustic scene classification accuracy.

For speech phoneme recognition, this thesis proposes a multi-objective sequence-convolution neural network model (SeqCNN). According to how phonemes appear in the audio signal, phoneme recognition models fall into three categories: 1) frame-based models, where each frame contains too little phoneme information and frames near phoneme boundaries are too similar; 2) phoneme-based models, which rely on additional phoneme start and end time annotations; 3) sequence-based models, which learn phonemes with weak semantic information poorly and cannot describe phoneme start and end times. To solve these problems, we design a sequence-convolution network structure that processes sequence data and uses convolutional layers to learn strong phoneme representations. We also propose a weight-sharing multi-objective classifier and its loss function. The resulting SeqCNN model addresses these phoneme recognition problems comprehensively and improves recognition accuracy.
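The abstract does not spell out how MCFF combines the two feature views, only that it fuses time-domain and frequency-domain channels without disturbing the original frame order. A minimal NumPy sketch of that general idea (hypothetical function names, not the thesis code; assumes framing plus an FFT-based frequency view):

```python
import numpy as np

def frame_signal(x, frame_len=256, hop=128):
    """Slice a 1-D waveform into overlapping frames, preserving time order."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]                                  # shape: (n_frames, frame_len)

def time_freq_fusion(x, frame_len=256, hop=128):
    """Toy multi-channel fusion: concatenate a time-domain view and a
    frequency-domain view of each frame along the feature axis, so the
    frame (time) axis - and hence the original ordering - is untouched."""
    frames = frame_signal(x, frame_len, hop)           # time-domain channel
    window = np.hanning(frame_len)
    spec = np.abs(np.fft.rfft(frames * window, axis=1))
    log_spec = np.log1p(spec)                          # frequency-domain channel
    return np.concatenate([frames, log_spec], axis=1)  # (n_frames, frame_len + n_bins)

# Example: one second of a 440 Hz tone at 16 kHz
sr = 16000
t = np.arange(sr) / sr
fused = time_freq_fusion(np.sin(2 * np.pi * 440 * t))
print(fused.shape)   # (124, 385): 124 frames, 256 time + 129 frequency features
```

In a real Multi-LCNN, each channel would feed its own sub-network before fusion; the point here is only that fusing along the feature axis, per frame, keeps the temporal sequence intact for the downstream model.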
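The exact form of SeqCNN's weight-sharing multi-objective loss is not given in the abstract. As a rough NumPy illustration of the idea (hypothetical names and loss weighting; a stand-in for, not the thesis implementation): one weight matrix scores every frame for a frame-level objective and, after pooling over time, the whole sequence for a sequence-level objective, with the two cross-entropies mixed into a single loss.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def multi_objective_loss(features, frame_labels, seq_label, W, alpha=0.5):
    """Weight-sharing multi-objective classifier (toy version): the SAME
    weight matrix W scores every frame (frame-level cross-entropy) and,
    after mean-pooling over time, the whole sequence (sequence-level
    cross-entropy). The two objectives are mixed by alpha."""
    frame_logits = features @ W                        # (T, n_classes), shared W
    frame_probs = softmax(frame_logits)
    frame_loss = -np.mean(np.log(frame_probs[np.arange(len(frame_labels)),
                                             frame_labels] + 1e-12))
    seq_probs = softmax(features.mean(axis=0) @ W)     # same W on pooled input
    seq_loss = -np.log(seq_probs[seq_label] + 1e-12)
    return alpha * frame_loss + (1 - alpha) * seq_loss

T, d, n_classes = 20, 8, 5
features = rng.normal(size=(T, d))      # stand-in for SeqCNN frame features
frame_labels = rng.integers(0, n_classes, size=T)
loss = multi_objective_loss(features, frame_labels, seq_label=2,
                            W=rng.normal(size=(d, n_classes)))
print(loss > 0)   # True: both cross-entropy terms are positive here
```

Sharing W across both objectives forces the same phoneme representation to explain both per-frame timing and sequence-level content, which is one plausible reading of how a single classifier can both describe phoneme start/end behavior and recognize the phoneme sequence.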
Keywords/Search Tags: Deep Learning, Acoustic Signal Processing, Acoustic Scene Classification, Phoneme Recognition