| With the rapid development of computer science, the network has become an important way for people to obtain information and knowledge in daily life. As online data grows rapidly, it is difficult for users to quickly find the information they need through search engines. An automatic question answering system lets people ask questions in natural language and returns the answers directly, which is convenient and efficient. In a question answering system, paraphrase generation rewrites a complex natural-language question posed by the user into a series of questions with the same semantics but different forms. Some of these rewritten questions conform to standard structural rules, which sidesteps the irregular phrasing of user questions, so paraphrasing can greatly simplify question understanding and reduce question difficulty. It is therefore important for improving the performance of automatic question answering systems. At present, no high-precision Chinese paraphrase question corpus is available. We use the "similar problem" data in Baidu as the source, but it contains many incorrect question sentences, so the paraphrase corpus must be reconstructed for this research. This paper has two parts: a method for constructing a Chinese paraphrase corpus and methods for generating Chinese question paraphrases. The generation methods are further divided into question paraphrase generation based on template matching and question paraphrase generation based on sequence to sequence. Firstly, we propose a method for constructing a Chinese paraphrase corpus that uses keyword extraction and similarity computation. With the keyword extraction method, which is based on word gravity, two questions are considered paraphrases if they have the same keywords. With the similarity method, a CNN model based on similar and different information calculates the similarity between two questions, and they are considered paraphrases if the score is higher than a threshold. Experimental results show that both methods are effective in improving the accuracy of the corpus, and that similarity computation performs better than keyword extraction. Secondly, we propose a template matching paraphrase generation method based on function words and dependencies. This method uses word segmentation, POS tagging, entity recognition, and functional labels to extract question templates while retaining the specific components of each question, so the final templates contain not only sentence structure information but also semantic and contextual information. At the same time, we simplify the sentence structure by adding dependency analysis, which improves the compatibility of the question templates. After an original question is rewritten, a candidate sentence extraction module is used to evaluate the result. Experimental results show that the paraphrase generation method based on template matching is more effective than the other methods compared. Lastly, we propose a question paraphrase generation method based on the Sequence to Sequence model, which treats question paraphrase generation as a machine translation task. In contrast to the traditional Sequence to Sequence model, we use a Bi-LSTM model and a Residual-LSTM model to learn the content of a question in depth, and we use an attention mechanism to obtain context information at each step of the decoding process, which improves the correlation between the input sequence and the output sequence. Experimental results show that the residual LSTM with an attention mechanism is effective for the paraphrase generation task. |
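
The two corpus-filtering rules described in the abstract (same keywords, or similarity above a threshold) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the questions are taken to be already segmented into tokens, the keyword extractor is a simple stop-word filter standing in for the word-gravity method, and Jaccard token overlap stands in for the CNN similarity model. The function names, the stop-word list, and the threshold value are hypothetical.

```python
def keywords(tokens, stop_words):
    """Stand-in keyword extractor: keep every token that is not a stop word.
    (The paper uses a word-gravity method; this is only a placeholder.)"""
    return {t for t in tokens if t not in stop_words}


def same_keywords(q1, q2, stop_words):
    """Rule 1: two questions are paraphrases if their keyword sets are equal."""
    return keywords(q1, stop_words) == keywords(q2, stop_words)


def jaccard(q1, q2):
    """Stand-in similarity score: token-set overlap in [0, 1].
    (The paper uses a CNN model; this is only a placeholder.)"""
    a, b = set(q1), set(q2)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def is_paraphrase_by_similarity(q1, q2, threshold=0.4):
    """Rule 2: paraphrases if the score is strictly higher than the threshold."""
    return jaccard(q1, q2) > threshold


# Hypothetical pre-segmented questions and stop-word list.
STOP = {"how", "to", "do", "i"}
q1 = ["how", "to", "learn", "python"]
q2 = ["how", "do", "i", "learn", "python"]
q3 = ["how", "to", "cook", "rice"]

print(same_keywords(q1, q2, STOP), is_paraphrase_by_similarity(q1, q2))
print(same_keywords(q1, q3, STOP), is_paraphrase_by_similarity(q1, q3))
```

Candidate pairs that fail the chosen rule would be dropped from the corpus. The two rules are kept as separate functions because the abstract evaluates them as two alternative filters.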
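
The template-matching idea can be sketched in a few lines: replace the question-specific components with slots to obtain a template, look up a paraphrase template, and fill the retained components back in. This is a simplified illustration under stated assumptions, not the method of the paper: input tokens are assumed to be already segmented and tagged, the tag `ENT` and the slot format `[ENT0]` are hypothetical, the template table is hand-written, and function-word labels and dependency analysis are omitted.

```python
def extract_template(tagged_tokens):
    """Replace each entity token with a numbered slot.
    tagged_tokens: list of (word, tag) pairs; the tag 'ENT' (hypothetical)
    marks a question-specific component that must be retained."""
    template, slots = [], []
    for word, tag in tagged_tokens:
        if tag == "ENT":
            template.append(f"[ENT{len(slots)}]")
            slots.append(word)
        else:
            template.append(word)
    return tuple(template), slots


def fill_template(template, slots):
    """Put the retained components back into a template."""
    return [slots[int(tok[4:-1])] if tok.startswith("[ENT") else tok
            for tok in template]


# Hypothetical hand-written table: source template -> paraphrase templates.
PARAPHRASE_TEMPLATES = {
    ("where", "is", "[ENT0]", "located"): [
        ("what", "is", "the", "location", "of", "[ENT0]"),
    ],
}


def paraphrase(tagged_tokens):
    """Return every paraphrase whose source template matches the question."""
    template, slots = extract_template(tagged_tokens)
    targets = PARAPHRASE_TEMPLATES.get(template, [])
    return [fill_template(t, slots) for t in targets]


question = [("where", "O"), ("is", "O"), ("Beijing", "ENT"), ("located", "O")]
print(paraphrase(question))
```

In the approach described in the abstract, the generated candidates would then be passed to the candidate sentence extraction module for evaluation; that step is not shown here.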
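
To make the attention and residual ideas concrete, the following sketch computes a context vector for one decoding step and applies a residual connection around a layer. It is an illustration under assumptions: the abstract does not state which attention scoring function is used, so plain dot-product scoring is assumed here, vectors are ordinary Python lists, and the function names and the toy layer are hypothetical rather than taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_context(decoder_state, encoder_states):
    """One decoding step of dot-product attention (assumed scoring function).
    Returns the attention weights and the weighted sum of encoder states."""
    scores = [sum(d * h_i for d, h_i in zip(decoder_state, h))
              for h in encoder_states]
    weights = softmax(scores)
    dim = len(decoder_state)
    context = [sum(w * h[i] for w, h in zip(weights, encoder_states))
               for i in range(dim)]
    return weights, context


def residual(layer_fn, x):
    """Residual connection: add the layer input to the layer output."""
    y = layer_fn(x)
    return [y_i + x_i for y_i, x_i in zip(y, x)]


# Toy example: two encoder states, one decoder state.
encoder_states = [[1.0, 0.0], [0.0, 1.0]]
decoder_state = [1.0, 0.0]
weights, context = attention_context(decoder_state, encoder_states)
print(weights, context)

# Toy layer that doubles its input, wrapped in a residual connection.
print(residual(lambda v: [2.0 * a for a in v], [1.0, 2.0]))
```

The decoder state is closer to the first encoder state, so that state receives the larger weight. In a real model the residual wrapper would surround an LSTM layer so that stacked layers can pass their input through unchanged.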