Research On Audio Sequence Segmentation Method In Complex Scenes

Posted on:2019-05-27

Degree:Master

Type:Thesis

Country:China

Candidate:T T Zhu

Full Text:PDF

GTID:2428330566496742

Subject:Computer Science and Technology

Abstract/Summary:

The segmentation of audio sequences in complex scenes is the basis and prerequisite for deeper audio processing, and it strongly affects every subsequent processing stage. In many practical applications, such as speech recognition, speaker recognition, and automatic speech annotation systems, the first task is to detect the input speech signal accurately and locate the start and end points of each speech segment. At present, most research on audio segmentation focuses on clean speech signals, and such methods cannot accurately segment audio that contains background noise.

To support this research, an audio sequence dataset for complex scenes was constructed. Primary and secondary school teaching audio recorded in complex scenes was collected, preprocessed, and annotated according to a standardized scheme, yielding a speech corpus of 62.32 hours in total. This corpus lays the foundation for studying audio sequence segmentation in complex scenes.

For segmenting audio sequences in complex scenes, two models were built: a single model based on deep learning, and a hybrid model based on deep learning and the Bayesian Information Criterion (BIC). The single model uses ResNet as its architecture. Because the acoustic features are presented as spectrograms, and given ResNet's strong performance in image processing, we introduce ResNet into the speech segmentation task. Experiments were performed on existing complex-scene datasets and on clean public datasets, and comparison experiments against three deep learning models and two machine learning models verified the superiority of ResNet on this task. The hybrid model combines the strengths of BiLSTM, ResNet, and BIC to compensate for the shortcomings of the single model and produce more precise segmentation. We compare the advantages, disadvantages, and application scenarios of the single model and the hybrid model.

Based on the prediction results of the two models, an automatic audio sequence segmentation system for complex scenes is provided.
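The abstract states only that the acoustic features are presented as a spectrogram, which is what lets an image model such as ResNet consume them. As a rough illustration of how a 1-D signal becomes such a 2-D "image", the sketch below frames the signal, applies a Hann window, and takes DFT magnitudes. The frame length, hop, window, and the use of a plain magnitude spectrum are assumptions for illustration (the thesis may use log-mel or other features), and the function names are hypothetical.

```python
import cmath
import math


def frame_signal(samples, frame_len, hop):
    """Split a 1-D signal into overlapping frames, dropping any incomplete tail."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def magnitude_spectrogram(samples, frame_len=8, hop=4):
    """Return one row per frame holding |DFT| for bins 0..frame_len // 2.

    Hypothetical helper; frame_len and hop are illustrative defaults.
    """
    # Periodic Hann window to reduce spectral leakage at frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    spec = []
    for frame in frame_signal(samples, frame_len, hop):
        x = [s * w for s, w in zip(frame, window)]
        # Naive O(N^2) DFT for clarity; real code would call an FFT routine.
        row = [abs(sum(x[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                       for n in range(frame_len)))
               for k in range(frame_len // 2 + 1)]
        spec.append(row)
    return spec
```

Stacking the rows gives a time-by-frequency matrix; in practice one would usually take a logarithm of the magnitudes before feeding the matrix to a convolutional network.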
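The abstract names the Bayesian Information Criterion as one component of the hybrid model but does not give its formulation. A common choice in audio segmentation is the ΔBIC test: a window of N feature frames is split at frame i into N1 and N2 frames, each part is modelled by a Gaussian, and ΔBIC(i) = (N/2) log|Σ| − (N1/2) log|Σ1| − (N2/2) log|Σ2| − λP, with a change point declared where the maximum ΔBIC is positive. The sketch below assumes diagonal covariances (so log|Σ| is a sum of log variances and P = d log N for d-dimensional features), a penalty weight λ = 1, and a minimum segment length of 5 frames. These choices and the function names are hypothetical, not taken from the thesis.

```python
import math


def log_det_diag(frames):
    """Log-determinant of a diagonal ML covariance (sum of per-dimension log variances)."""
    n = len(frames)
    total = 0.0
    for k in range(len(frames[0])):
        col = [f[k] for f in frames]
        mean = sum(col) / n
        var = sum((v - mean) ** 2 for v in col) / n
        total += math.log(max(var, 1e-10))  # floor avoids log(0) on constant dimensions
    return total


def delta_bic(frames, i, lam=1.0):
    """ΔBIC for splitting `frames` at index i (assumed diagonal-Gaussian models)."""
    n, d = len(frames), len(frames[0])
    gain = (n * log_det_diag(frames)
            - i * log_det_diag(frames[:i])
            - (n - i) * log_det_diag(frames[i:])) / 2
    # The two-model hypothesis has 2d extra free parameters (d means + d variances).
    penalty = lam * 0.5 * (2 * d) * math.log(n)
    return gain - penalty


def best_change_point(frames, min_len=5, lam=1.0):
    """Return (index, ΔBIC) of the best split, or None if no split has ΔBIC > 0."""
    best = None
    for i in range(min_len, len(frames) - min_len + 1):
        score = delta_bic(frames, i, lam)
        if best is None or score > best[1]:
            best = (i, score)
    if best is None or best[1] <= 0:
        return None
    return best
```

In a hybrid setup, a test like this would typically be run on windows around boundaries proposed by a neural model to refine or reject them; how the thesis actually combines BiLSTM, ResNet, and BIC is not specified in the abstract.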
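The first task named in the abstract is to locate the start and end points of speech segments. Assuming a model that outputs one speech (1) or non-speech (0) label per frame, the hypothetical helper below converts those labels into (start, end) times in seconds and discards runs shorter than `min_frames`. The 10 ms hop and the minimum-length rule are illustrative assumptions, not parameters reported in the thesis.

```python
def labels_to_segments(labels, hop_s=0.01, min_frames=1):
    """Convert per-frame speech/non-speech labels into (start_s, end_s) segments.

    Hypothetical helper: hop_s is the assumed frame hop in seconds, and runs of
    speech shorter than min_frames are dropped as spurious detections.
    """
    segments = []
    start = None
    for i, lab in enumerate(labels):
        if lab and start is None:
            start = i  # a speech run begins
        elif not lab and start is not None:
            if i - start >= min_frames:
                segments.append((start * hop_s, i * hop_s))
            start = None  # the speech run has ended
    # Close a speech run that lasts until the final frame.
    if start is not None and len(labels) - start >= min_frames:
        segments.append((start * hop_s, len(labels) * hop_s))
    return segments
```

End times are exclusive, so a segment covers frames from `start` up to but not including `end`. Raising `min_frames` trades recall of very short utterances for robustness against isolated misclassified frames, which matters more when background noise is present.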

Keywords/Search Tags:

Audio segmentation, complex scenes, convolutional neural network, long short-term memory, deep residual network, Bayesian information criterion

Related items

1	Research On Speaker Identification Based On Deep Learning
2	Research And Application Of The Short-term Memory Network For Adjusting Gate Length
3	Chinese Sign Language Recognition Based On Convolutional Network And Long Short Term Memory Network
4	Research On Sensor Activity Recognition Based On Deep Convolution Neural Networks
5	Research On Network Intrusion Detection Method Based On Bi-LSTM
6	Research On Chinese Relation Extraction For Complex Text Structure
7	Research On Abnormal Behavior Identification Based On Long Short-term Memory Neural Network
8	Application Of Deep Learning-based Recurrent Neural Network In Short-term Load Forecasting
9	A Comparative Study Of Deep Neural Network For Mobile Phone Sensor Activity Recognition
10	Design Of A Blind Equalizer Based On Long Short-term Memory Neural Network