| Retrieval-based Q&A systems are commonly used in industry; they answer a question by searching a database for the best-matching entry. When a user raises a new question, a retrieval-based Q&A system searches its database for the answer that best matches the question and returns it to the user. The performance of this type of system is often affected by how the user's question is expressed. Because human language is diverse, different users may express the same question in many different ways. Question analysis and paraphrasing are therefore key steps. Simulating users' questions as they occur in real scenarios is crucial for bridging the gap between the user's question and the system's reasoning, and for improving the accuracy and recall of the system. Provided that the paraphrased questions remain semantically consistent with the given question, generating diverse, information-rich expressions can help improve the accuracy and recall of the retrieval system, which is the key research direction of this work. Based on the variational auto-encoder (VAE), this paper studies the following two issues. For diverse paraphrase generation, we add multi-head attention to the variational auto-encoder, which explores combinations and transformations of the hidden states in a high-dimensional space and thereby lays the groundwork for diversified generation. For the semantic drift caused by the loss of keywords, we propose a multi-task framework that jointly trains the paraphrased question generation task and the keyword extraction task, providing high-level semantic constraints for decoding. The main research work is as follows. Firstly, for diverse question paraphrase generation, this paper introduces a multi-head attention mechanism into the variational auto-encoder, which explores and combines, in a high-dimensional space, the hidden contexts drawn from the sample's distribution to obtain more information. The variational auto-encoder encodes the posterior distribution based on the input and then 
samples from the generated distribution to obtain a semantic model of the input samples, which introduces uncertainty into each round of decoding. This yields diverse paraphrased questions of equal quality and overcomes the repeated generation, heavy parameterization, and high computational cost of previous neural networks. Experiments show that the proposed MH-VAE improves over strong baselines in terms of generation diversity. Secondly, in response to the semantic drift caused by losing keywords during question paraphrasing, we design an end-to-end paraphrase generation model named MH-VAE-KE, which jointly trains the keyword extraction task and the paraphrase generation task. Adding the keyword extraction task to the encoder of the VAE imposes an implicit semantic restriction on the encoding space, thereby providing high-level semantic constraints for the decoding process. Experiments show that the proposed multi-task model MH-VAE-KE achieves strong results in both the quality and the diversity of paraphrase generation. Thirdly, we design a retrieval-based Q&A system on the WikiQA dataset to evaluate the effectiveness of the proposed paraphrase generation model MH-VAE-KE. We set up an experimental group and a control group, and then use the mean reciprocal rank (MRR) and the mean average precision (MAP) of the retrieved results to evaluate the effectiveness of MH-VAE-KE on the two groups. The improvements in MRR and MAP show that MH-VAE-KE can effectively paraphrase the original question and that the paraphrased questions help bridge the gap between user questions and system inference. |
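
The two retrieval metrics named above, MRR and MAP, can be illustrated with a minimal sketch. It is not the evaluation code used in this work: the function names and the toy relevance lists are hypothetical, and it assumes that each question comes with a ranked candidate list labelled 1 (relevant) or 0 (not relevant), with every relevant candidate present in that list.

```python
from typing import Sequence


def reciprocal_rank(relevance: Sequence[int]) -> float:
    """Return 1/rank of the first relevant candidate, or 0.0 if there is none."""
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            return 1.0 / rank
    return 0.0


def average_precision(relevance: Sequence[int]) -> float:
    """Average precision@k over the ranks k that hold a relevant candidate.

    Assumption: all relevant candidates appear in the ranked list, so the
    number of hits equals the total number of relevant candidates.
    """
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_reciprocal_rank(runs: Sequence[Sequence[int]]) -> float:
    """Mean of the reciprocal rank over all questions."""
    return sum(reciprocal_rank(r) for r in runs) / len(runs) if runs else 0.0


def mean_average_precision(runs: Sequence[Sequence[int]]) -> float:
    """Mean of the average precision over all questions."""
    return sum(average_precision(r) for r in runs) / len(runs) if runs else 0.0


# Hypothetical ranked relevance labels for two questions.
toy_runs = [
    [0, 1, 1, 0],  # first relevant candidate at rank 2
    [1, 0, 0, 0],  # first relevant candidate at rank 1
]
mrr = mean_reciprocal_rank(toy_runs)    # (1/2 + 1) / 2 = 0.75
map_ = mean_average_precision(toy_runs)  # (7/12 + 1) / 2 = 19/24
print(f"MRR={mrr:.4f}  MAP={map_:.4f}")
```

Under this setup, comparing an experimental group with a control group amounts to computing both metrics on the ranked lists that each group produces for the same questions.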