
Low-resource Speech Representation Learning And Its Applications

Posted on: 2019-11-13   Degree: Doctor   Type: Dissertation
Country: China   Candidate: H J Chen   Full Text: PDF
GTID: 1368330623453344   Subject: Computer Science and Technology
Abstract/Summary:
Speech representation learning for low-resource languages is one of the basic tasks in computer speech and language processing. It aims to learn representations, including models and features, that are expressive of the linguistic units in the speech data. The task plays an important role in both science and engineering. On the one hand, it can provide and validate computational models for the cognitive science community, such as statistical learning in language acquisition. On the other hand, it can provide basic technologies for many applications, such as automatic speech recognition of low-resource languages.

Under the low-resource condition, this dissertation proposes several speech representation learning methods using two strategies: mining unsupervised linguistic structure and transferring cross-lingual phoneme information. With the proposed speech representations, the dissertation studies their application to query-by-example spoken term detection (QbE-STD) and to topic segmentation of spoken documents. The main contributions are as follows:

(1) Dirichlet process Gaussian mixture model (DPGMM) based phoneme-like unit clustering and posteriorgram extraction are proposed. To minimise human intervention in the feature learning process, a Bayesian non-parametric model is employed to mine the basic phoneme-like structure in speech. Since inference in Bayesian non-parametric models suffers from slow convergence, we employ DPGMM, a shallow Bayesian non-parametric model, together with its split/merge-based Metropolis-Hastings sampler to cluster speech frames. The clusters are regarded as phoneme-like units, and the posteriorgrams derived from them serve as speech representations. In the ZeroSpeech 2015 challenge, our method obtained the best-performing feature representation in the phoneme discriminability test.

(2) An approach to learning unsupervised bottleneck (BN) features based on DPGMM is proposed. Posteriorgrams are so high-dimensional that they are unsuitable for computationally intensive downstream applications, whereas deep neural networks (DNNs) have a strong capability for feature learning. Unsupervised BN features that are expressive of the speech content are therefore obtained by combining DPGMM and DNN. Without manual transcription, our feature representations perform comparably to supervised cross-lingual BN features in QbE-STD.

(3) An approach to learning multilingual BN features from untranscribed speech is proposed. A framework based on the Dirichlet process mixture model (DPMM) and a multi-task learning (MTL) DNN is proposed to extract BN features that are shared by multiple low-resource languages. Our BN features show a good capability for characterising the basic linguistic structure in speech and performed best in the phoneme discrimination test of the ZeroSpeech Challenge 2017.

(4) An MTL-DNN based framework that combines unsupervised phoneme-like unit information with resource-rich cross-lingual phonetic information is proposed to learn speech features. In this framework, the learned BN features show a better capability for revealing the basic linguistic structure and content in the speech data. We perform a detailed analysis of the proposed features via the QbE-STD task, the phoneme discrimination test, and visualisation.

(5) An acoustic self-validated normalised cut approach to topic segmentation of spoken documents is proposed. This approach does not require manual transcriptions and can determine the number of topics in a spoken document automatically. This work also shows the promise of the proposed speech representations in real applications.
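
As an illustration of the posteriorgram representation described in contribution (1), the following is a minimal sketch of how per-frame posteriors over phoneme-like clusters could be computed once a Gaussian mixture has been fitted. It is not the dissertation's implementation: the DPGMM fitting step with split/merge Metropolis-Hastings sampling is not shown, diagonal covariances are assumed for simplicity, and the function names and toy parameters (`posteriorgram`, `weights`, `means`, `variances`, `frames`) are hypothetical.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at frame x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def posteriorgram(frames, weights, means, variances):
    """Return one posterior vector over mixture components for each frame."""
    result = []
    for x in frames:
        # Log of (mixture weight * component likelihood) for every cluster.
        log_joint = [
            math.log(w) + log_gauss_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)
        ]
        # Normalise with the log-sum-exp trick for numerical stability.
        peak = max(log_joint)
        unnorm = [math.exp(lj - peak) for lj in log_joint]
        total = sum(unnorm)
        result.append([u / total for u in unnorm])
    return result


# Toy, made-up mixture with three "phoneme-like" clusters in 2-D feature space.
weights = [0.5, 0.3, 0.2]
means = [[0.0, 0.0], [3.0, 3.0], [-3.0, 2.0]]
variances = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]

# Three toy frames, each lying close to a different cluster mean.
frames = [[0.1, -0.2], [2.9, 3.2], [-2.8, 1.9]]
pg = posteriorgram(frames, weights, means, variances)
```

Each row of `pg` is a probability vector whose length equals the number of clusters. With the many clusters a real model would discover, these vectors become high-dimensional, which is the motivation given in contribution (2) for learning compact bottleneck features instead.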
Keywords/Search Tags: Speech representation learning, Feature learning, Dirichlet process Gaussian mixture model, Multi-task learning, Deep learning, Unsupervised learning, Query-by-example spoken term detection, Topic/Story segmentation