Question Answering is a crucial task in Natural Language Processing in which machines mimic the human ability to answer questions accurately based on given data. Owing to the large amounts of training data and the large number of trainable parameters, the Transformer has become a widely used model for Question Answering. However, this also poses a great challenge to training efficiency. In addition, the Transformer encoder has a limited input length, which affects the accuracy of Question Answering. To address these challenges, this thesis presents a novel Transformer for Question Answering.

First, an efficient training method for the Transformer is introduced to reduce its storage and training-time overhead. A weight classification algorithm is given to analyze the characteristics of the weights in each layer of the Transformer and classify them into two categories. An activation deduction algorithm is also proposed to examine each layer's activations and classify them by their characteristics. Based on these classifications, a multi-layer quantization method is proposed that quantizes the less important parameters heavily and the important parameters lightly. The WMT2014 DE-EN and WMT2014 FR-EN datasets are used for evaluation, and the results show that the proposed method achieves 4x compression compared with the base Transformer. With 400 epochs of training, the proposed multi-layer quantization saves 20% and 16% of the time overhead compared to Pruning + Quantization and Fixed-Bit Quantization techniques, respectively. The efficient training method achieves BLEU scores of 27.9 and 39.9 on the DE-EN and FR-EN datasets, which are 30% higher than direct TensorFlow quantization, the Fully QT Transformer model, and other current low-precision Transformer models.

Based on the efficiently compressed model, a new Transformer encoder based on episodic memory is presented to reduce context fragmentation. The episodic memory is used to process the input and feed more information into the Transformer encoder. The Question Answering workflow based on the new Transformer is also described. Evaluations are conducted on standard Machine Reading Comprehension benchmarks. The proposed episodic memory-based Transformer achieves a test accuracy of 0.97 and does not run out of memory even when only 8 GB of memory is available, whereas the prototypes with full self-attention or a sliding window cannot run within 8 GB of memory. On the SQuAD dataset, the proposed model obtains 57.6 EM and 65.3 F1 scores, a 19% improvement over the BERTserini model and a 1.5% improvement over the Path Retriever model. Further experiments on the HotpotQA dataset achieve answer-supporting index scores of 57.19 EM and 86.02 F1, which means that the proposed episodic memory-based Transformer improves accuracy by 35.2% in EM and 19.36% in F1 over the baseline model.
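To make the multi-layer quantization idea concrete, the sketch below illustrates one way the two-category scheme could look in practice. It is only an illustration under stated assumptions: the abstract does not specify the weight classification criterion or the bit widths, so the importance measure (mean absolute weight per layer), the split fraction, and the 8-bit/4-bit choices here are hypothetical and not the thesis's actual algorithm.

```python
import numpy as np


def quantize_uniform(w: np.ndarray, bits: int) -> np.ndarray:
    """Uniform symmetric quantization of a weight tensor to the given bit width."""
    levels = 2 ** (bits - 1) - 1
    scale = float(np.max(np.abs(w))) / levels or 1.0  # avoid divide-by-zero for all-zero tensors
    return np.round(w / scale) * scale


def multi_layer_quantize(layers: dict[str, np.ndarray],
                         important_frac: float = 0.5) -> dict[str, np.ndarray]:
    """Classify layers into two groups by mean |weight| (an assumed importance criterion),
    then quantize the 'important' group lightly (8-bit) and the rest heavily (4-bit)."""
    scores = {name: float(np.mean(np.abs(w))) for name, w in layers.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    important = set(ranked[: max(1, int(len(ranked) * important_frac))])
    return {
        name: quantize_uniform(w, bits=8 if name in important else 4)
        for name, w in layers.items()
    }


if __name__ == "__main__":
    # Hypothetical encoder weight tensors used only to demonstrate the mixed-precision split.
    rng = np.random.default_rng(0)
    demo_layers = {
        f"encoder.layer{i}.attn.W_q": rng.normal(scale=0.02 * (i + 1), size=(64, 64))
        for i in range(4)
    }
    for name, w in multi_layer_quantize(demo_layers).items():
        print(name, "unique quantization levels:", len(np.unique(w)))
```

Layers assigned to the heavily quantized group end up with far fewer distinct values (and hence smaller storage), while the lightly quantized group retains more precision, which is the trade-off the abstract describes.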