
Identification Of Spoken Language From FM Broadcast Using Deep Learning

Posted on: 2020-03-30
Degree: Master
Type: Thesis
Country: China
Candidate: D Zhu
Full Text: PDF
GTID: 2428330575989331
Subject: Signal and Information Processing
Abstract/Summary:
With rapid socio-economic development and accelerating globalization, the growing mobility of people around the world has increased the opportunities for people with different linguistic backgrounds to communicate with each other. Automatic language recognition, as the first step in speech recognition, is therefore very important. The rapid development of artificial intelligence worldwide has also driven technological upgrades. As a bridging technology for human information exchange, voice technology has attracted more and more researchers working to achieve good voice interaction. Speech recognition can also be used to monitor the security of radio communications in border areas. Fast and precise language recognition is of great importance for all subsequent work related to speech recognition. This thesis focuses on recognizing the spoken language of broadcast speech and discusses language recognition methods in detail. The main research can be summarized as follows:

1) To meet the data requirements of the language recognition field, data sets of Lao, Putonghua, Burmese, Thai and Vietnamese speech, about 25 hours, were collected, and the reliability of the data was confirmed by comparison with other data sets.

2) Using voice processing methods, a broadcast signal identification data set is established, and signal/non-signal identification of FM broadcast signals is analyzed with deep learning.

3) A reliable baseline system for language recognition is established using the I-Vector method, which provides a reliable basis for improvements in the subsequent experiments.

4) Based on deep neural networks, two end-to-end language recognition methods that take acoustic features as input are designed for short, variable-length speech signals. The first is language recognition based on the Gated Recurrent Unit (GRU). Network structures with different parameters, and the performance of different acoustic features on three data sets, are analyzed; suitable network parameters and structures are determined; and the features best suited to the deep learning network are identified. The second model combines self-attention with a deep convolutional neural network (DCNN) for language recognition on variable-length speech.

The thesis compares the traditional acoustic feature model with the end-to-end models. The results show that the end-to-end methods achieve better recognition results than the I-Vector baseline.
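To illustrate how attention can turn a variable-length utterance into a fixed-size representation, the following is a minimal sketch of attention pooling over acoustic feature frames. It is a simplified stand-in, not the thesis's self-attention/DCNN architecture: the function name `attention_pool`, the single `query` vector (which would be learned in a real model), and the toy two-dimensional frames are all assumptions made for illustration.

```python
import math

def attention_pool(frames, query):
    """Collapse a variable-length list of feature frames into one vector.

    frames: list of equal-length feature vectors, one per time step.
    query:  hypothetical scoring vector (learned in a real model).
    """
    if not frames:
        raise ValueError("need at least one frame")
    # One scalar relevance score per frame: dot product with the query.
    scores = [sum(q * x for q, x in zip(query, frame)) for frame in frames]
    # Numerically stable softmax over the time axis.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted average of the frames; output length is the feature dimension.
    dim = len(frames[0])
    pooled = [sum(w * frame[d] for w, frame in zip(weights, frames))
              for d in range(dim)]
    return pooled, weights

# Toy utterances of different lengths (2 frames vs. 5 frames).
short_utt = [[1.0, 0.0], [0.0, 1.0]]
long_utt = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5], [0.0, 0.0]]
query = [1.0, -1.0]  # assumed values, chosen only for the demonstration

short_vec, short_w = attention_pool(short_utt, query)
long_vec, long_w = attention_pool(long_utt, query)
```

Both utterances yield a vector whose length equals the feature dimension, regardless of the number of frames, so a fixed-size classifier could sit on top of either one.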
Keywords/Search Tags: signal detection, I-Vector, deep convolutional neural networks, self-attention, language identification