
Capsule Routing Self-attention Network For Neural Machine Translation

Posted on: 2021-06-12
Degree: Master
Type: Thesis
Country: China
Candidate: J C Cao
Full Text: PDF
GTID: 2518306503972179
Subject: Computer technology

Abstract/Summary:
Following the impressive results obtained in machine translation, the attention mechanism and its variants have quickly become standard components of neural networks for tasks such as document classification, speech recognition, and many other natural language processing (NLP) applications, achieving promising performance compared with previous work. However, most early work implemented the attention mechanism only on recurrent neural network (RNN) architectures, e.g. Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) networks, which lack support for parallel computation and therefore make it impractical to build deep networks. To address this problem, Vaswani et al. proposed a novel self-attention network (SAN) architecture powered by multi-head self-attention, which uses different heads to capture partial sentence information by projecting the input sequence into multiple distinct subspaces in parallel. Although only simple linear transformations are employed in the projection step, the Transformer network still achieves impressive performance.

Most existing work on improving the multi-head attention mechanism mainly tries to extract a more informative partial representation from each independent head. Li et al. proposed aggregating the output representations of multi-head attention. Dou et al. dynamically aggregated information among the output representations of different encoder layers. All of this work concentrates on the parts either "before" or "after" the multi-head SAN step, although the multi-head SAN itself, as a core component of the Transformer, deserves more attention. To further empower the Transformer, we therefore propose constructing a more general and context-aware SAN, so that the model can learn deeper contextualized information about the input sequence, which ultimately helps improve the model's final performance.

In this paper, we propose the novel capsule-Transformer, in which we implement a generalized SAN called the Capsule Routing Self-Attention Network. It extends the linear transformation into a more general capsule routing algorithm by treating SAN as a special case of the capsule network. One of the biggest changes introduced by the capsule mechanism is altering the processing unit from a scalar (a single neuron) to a capsule (a group of neurons, i.e. a vector). Inspired by this idea of capsule processing, we first organize groups of attention weights computed by self-attention into capsules containing preliminary linguistic features, and then apply a routing algorithm to these capsules to obtain an output that contains deeper contextualized information about the sequence. By reorganizing the SAN in a capsule fashion, we extend the model to a more general form than the original SAN.
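To make the capsule-routing idea concrete, the following is a minimal sketch, not the thesis' exact implementation: it treats the per-head attention outputs as input capsules and aggregates them with dynamic routing in the style of Sabour et al. (2017), whereas the thesis organizes groups of attention weights themselves into capsules. The function name route_head_capsules and the parameters num_out_capsules and routing_iters are illustrative assumptions.

import torch
import torch.nn.functional as F


def squash(v, dim=-1, eps=1e-8):
    """Capsule-network squashing non-linearity: shrinks short vectors toward 0,
    keeps long vectors just below unit length."""
    norm_sq = (v ** 2).sum(dim=dim, keepdim=True)
    return (norm_sq / (1.0 + norm_sq)) * v / torch.sqrt(norm_sq + eps)


def route_head_capsules(head_outputs, num_out_capsules=8, routing_iters=3):
    """
    head_outputs: [batch, num_heads, d_head] -- one input "capsule" per attention head.
    Returns:      [batch, num_out_capsules, d_head] aggregated output capsules.
    """
    batch, num_heads, d_head = head_outputs.shape
    # Prediction vectors u_hat; here a simple broadcast copy is used, while a
    # learned per-(input, output) transformation would normally replace it.
    u_hat = head_outputs.unsqueeze(2).expand(batch, num_heads, num_out_capsules, d_head)

    # Routing logits b_ij start at zero and are refined iteratively.
    b = torch.zeros(batch, num_heads, num_out_capsules, device=head_outputs.device)
    for _ in range(routing_iters):
        c = F.softmax(b, dim=-1)                      # coupling coefficients over output capsules
        s = (c.unsqueeze(-1) * u_hat).sum(dim=1)      # weighted sum over input capsules
        v = squash(s)                                 # [batch, num_out_capsules, d_head]
        # Agreement between predictions and current outputs updates the logits.
        b = b + (u_hat * v.unsqueeze(1)).sum(dim=-1)
    return v


# Toy usage: 2 sentences, 8 attention heads, 64-dimensional head outputs.
out = route_head_capsules(torch.randn(2, 8, 64))
print(out.shape)  # torch.Size([2, 8, 64])

The intent of the sketch is only to show how routing-by-agreement can replace the fixed linear aggregation over heads; the capsule-Transformer described above applies the same principle at the level of the attention weights inside the SAN.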
Keywords/Search Tags:Neural Network, Machine Translation, Attention Mechanism, Capsule Network