
Research on Sequence Mapping Problems Based on Encoder-Decoder Models

Posted on: 2021-02-15
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J F Hou
Full Text: PDF
GTID: 1368330602494255
Subject: Information and Communication Engineering
Abstract/Summary:
In sequence modeling and processing, it is important to learn how to generate one sequence given another. We call this type of problem a sequence-to-sequence mapping problem (or sequence mapping problem, for short). Both machine translation (MT) and automatic speech recognition (ASR) belong to this class. Traditional methods divide the whole sequence mapping problem into several subproblems involving hand-crafted features, alignments between sequences, external linguistic knowledge, and so on. These subtasks are modeled individually and then combined to produce target sequences. With the rise of deep learning, the subtask models were replaced with neural networks, so that components such as the translation model, acoustic model, and language model could benefit from the powerful modeling capabilities of deep learning. However, systems built with the traditional approach are complicated to develop and deploy because they require a large amount of expert knowledge. Moreover, the errors of each submodule accumulate in the final combined system, since the submodules are difficult to optimize jointly. To address these issues, end-to-end methods have recently been proposed and have become very popular. As an end-to-end approach, the encoder-decoder model converts an input sequence into an output sequence directly, without any intermediate intervention. Encoder-decoder models, also known as sequence-to-sequence models, have therefore been widely applied to sequence mapping tasks such as machine translation and speech recognition, where they can yield performance comparable to or even better than traditional methods. Although encoder-decoder models are attractive, issues such as training efficiency and real-time recognition still need to be explored, and sequence-to-sequence architectures better suited to specific tasks are worth investigating. In this thesis, we therefore propose several new encoder-decoder models for the sequence mapping problems of machine translation and speech recognition.

First, recurrent neural networks (RNNs) are commonly adopted as the basic components of encoder-decoder models, which introduces a temporal dependency restriction: training is time-consuming because the items in a sequence cannot be processed in parallel. In response, we present a sequence-to-sequence model that replaces the RNNs with feedforward sequential memory networks (FSMNs) in both the encoder and the decoder, enabling the new architecture to encode the entire source sentence simultaneously. We also modify the attention module so that the decoder generates all outputs simultaneously during training. Thanks to the temporal independence of the FSMN-based encoder and decoder, we achieve results comparable to RNN-based models on a machine translation task while training about twice as fast.

Second, the attention mechanisms of conventional encoder-decoder models iterate over the entire input sequence to compute the attention weight vector for each output step, which makes them unsuitable for streaming tasks such as online speech recognition, where output symbols must be produced while the input sequence is only partially observed. In response, we propose two attention mechanisms, Gaussian prediction based attention and segment boundary detection directed attention, to enable online recognition with encoder-decoder models. In the first mechanism, the alignment between the output and input sequences is modeled as a Gaussian window of variable size that moves forward in time; the location and size of the window are determined by its mean and variance. At each attention step, the model predicts the window's forward increment along time from the previous window center, together with the variance, so that online speech recognition is realized. A minimal sketch of this windowed attention step follows.
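As a rough illustration only (not the exact parameterization used in the thesis), one attention step over the frames observed so far might look as follows. The names `gaussian_window_attention`, `delta`, and `log_var` are ours, and the small networks that would predict the increment and variance from the decoder state are omitted.

```python
import numpy as np

def gaussian_window_attention(enc_states, prev_mean, delta, log_var):
    """One step of Gaussian-window attention over the frames seen so far.

    enc_states: (T, d) encoder outputs observed so far.
    prev_mean:  previous window center (float).
    delta:      predicted non-negative forward increment (float).
    log_var:    predicted log-variance controlling the window size.
    """
    mean = prev_mean + delta                 # window center only moves forward
    var = np.exp(log_var)                    # variance sets the window size
    t = np.arange(enc_states.shape[0])       # frame indices observed so far
    scores = np.exp(-0.5 * (t - mean) ** 2 / var)
    weights = scores / scores.sum()          # normalized Gaussian weights
    context = weights @ enc_states           # context vector for the decoder
    return context, mean
```

Because the window center can only advance, the decoder never needs frames beyond the current window, which is what makes streaming decoding possible.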
In the second mechanism, to exploit the segmental structure of speech, segment boundary detection directed attention splits the input speech into successive segments at detected boundaries, so that different output symbols adaptively receive different chunk sizes and aggregate information within each segment by soft attention. Experimental results show that this mechanism achieves online performance comparable to state-of-the-art models. Segment boundary detection is formulated as a sequential decision-making problem and solved with a reinforcement learning (RL) algorithm, which validates the effectiveness of reinforcement learning for speech recognition.

Finally, commonly used encoder-decoder models fully exploit neither the monotonic input-output relation in ASR nor the short-term stationarity of speech. Moreover, the diffuse weighted sum over the input in soft attention, together with the inability to access all possible alignment paths, makes the classification of aligned inputs statistically less interpretable. In response, we propose a sequence-to-sequence ASR model equipped with sequential state modeling, which resembles the concept of state transitions in HMMs. Our method explicitly models the output state transition probability and the output emission probability, so that a more flexible and interpretable monotonic alignment distribution can be derived. Experiments on the TIMIT dataset show promising recognition performance, and the generated alignments and emission probabilities demonstrate the stepwise input-output mapping property of the proposed model. A sketch of the underlying alignment recursion is given below.
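To make the transition/emission idea concrete, here is a minimal sketch of a forward recursion over monotonic alignments under an HMM-like factorization. This is our illustrative simplification (a single scalar advance probability per frame, rather than the thesis's exact parameterization), and the names `monotonic_forward`, `emit`, and `trans` are hypothetical.

```python
import numpy as np

def monotonic_forward(emit, trans):
    """Total probability of an output sequence under a monotonic,
    HMM-like alignment model (illustrative sketch).

    emit:  (T, U) array; emit[t, u] is the emission probability of
           output state u at input frame t.
    trans: (T,) array; trans[t] is the probability of advancing to
           the next output state at frame t (1 - trans[t] = stay).
    """
    T, U = emit.shape
    alpha = np.zeros((T, U))                 # alpha[t, u]: prob. of being
    alpha[0, 0] = emit[0, 0]                 # in state u after frame t
    for t in range(1, T):
        for u in range(U):
            stay = alpha[t - 1, u] * (1.0 - trans[t])
            move = alpha[t - 1, u - 1] * trans[t] if u > 0 else 0.0
            alpha[t, u] = (stay + move) * emit[t, u]
    return alpha[T - 1, U - 1]               # all frames consumed, last state
```

Because the recursion sums over every monotonic alignment path rather than a single diffuse weighting, the resulting alignment distribution is explicit and can be inspected directly, which is the interpretability property the abstract refers to.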
Keywords/Search Tags:Encoder-Decoder Model, Neural Machine Translation, Non-Recurrent Structure, Online Speech Recognition, Reinforcement Learning, Sequential State Modeling