Research On Audio Sequence Segmentation Method In Complex Scenes

Posted on: 2019-05-27
Degree: Master
Type: Thesis
Country: China
Candidate: T T Zhu
Full Text: PDF
GTID: 2428330566496742
Subject: Computer Science and Technology
Abstract/Summary:
The segmentation of audio sequences in complex scenes is the basis of, and a prerequisite for, deeper audio processing, and it strongly influences every downstream step. In many practical applications, such as speech recognition, speaker recognition, and automatic speech tagging systems, the first task is to accurately detect the input speech signal and find the start and end points of each speech segment. At present, most research on audio segmentation focuses on clean speech signals, and the resulting methods cannot accurately segment audio that contains background noise.

To support this research, an audio sequence data set for complex scenes was constructed. Primary and secondary school teaching audio collected in complex scenes was preprocessed and annotated to a common standard, yielding a speech corpus of 62.32 hours in total and laying the foundation for the study of audio sequence segmentation in complex scenes.

Two audio segmentation models were then constructed: a single model based on deep learning, and a hybrid model based on deep learning and the Bayesian Information Criterion (BIC). The single model uses ResNet as its architecture. Because the acoustic features are presented as spectrograms, and ResNet performs very well in image processing, we introduce ResNet into the speech segmentation task. Experiments were performed on the existing complex-scene data sets and on clean public data sets. The superiority of ResNet on this task was verified in comparison experiments against three deep learning models and two machine learning models. The hybrid model combines the advantages of BiLSTM, ResNet, and BIC to compensate for the shortcomings of the single model and to produce more precise segmentation. We compared the advantages, disadvantages, and application scenarios of the single model and the hybrid model. Based on the prediction results of these two models, an automatic audio sequence segmentation system for complex scenes is provided.
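The abstract does not say how BIC enters the hybrid model. For orientation only, the classic BIC change-point test can be sketched as below, assuming each side of a candidate boundary is modeled by a single full-covariance Gaussian over acoustic feature frames. The function names (`delta_bic`, `best_change_point`), the `margin` and `lam` parameters, and the synthetic features are all hypothetical illustrations and are not taken from the thesis.

```python
import numpy as np


def _logdet_cov(frames, eps=1e-6):
    """Log-determinant of the (regularized) ML covariance of frames (n x d)."""
    d = frames.shape[1]
    cov = np.cov(frames, rowvar=False, bias=True).reshape(d, d)
    cov = cov + eps * np.eye(d)  # small ridge keeps the matrix non-singular
    _, logdet = np.linalg.slogdet(cov)
    return logdet


def delta_bic(frames, i, lam=1.0):
    """Delta-BIC for splitting frames at index i; positive favors a boundary."""
    n, d = frames.shape
    # Penalty for the extra mean vector and covariance matrix of the second model.
    penalty = 0.5 * lam * (d + 0.5 * d * (d + 1)) * np.log(n)
    return (0.5 * n * _logdet_cov(frames)
            - 0.5 * i * _logdet_cov(frames[:i])
            - 0.5 * (n - i) * _logdet_cov(frames[i:])
            - penalty)


def best_change_point(frames, margin=20, lam=1.0):
    """Return (index, score) of the best boundary, or (None, score) if none is positive."""
    n = frames.shape[0]
    # `margin` (an assumed value) keeps enough frames on each side for a stable covariance.
    scores = [delta_bic(frames, i, lam) for i in range(margin, n - margin)]
    k = int(np.argmax(scores))
    if scores[k] > 0:
        return margin + k, scores[k]
    return None, scores[k]


if __name__ == "__main__":
    # Synthetic stand-in for acoustic features: two segments with different means.
    rng = np.random.default_rng(0)
    frames = np.vstack([rng.normal(0.0, 1.0, (200, 4)),
                        rng.normal(5.0, 1.0, (200, 4))])
    print(best_change_point(frames))
```

In a hybrid setup, a test of this kind could be applied in a short window around each boundary proposed by a neural model to refine its position. That pairing is an assumption about how such pieces might fit together, not a description of the thesis's system.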
Keywords/Search Tags: Audio segmentation, complex scenes, convolutional neural network, long short-term memory, deep residual network, Bayesian information criterion