| The key to building a high-quality QA system is constructing high-quality Q&A pairs. In the medical field, Q&A pairs obtained through web crawlers cannot guarantee the accuracy of the knowledge, while Q&A pairs constructed manually cannot guarantee construction efficiency. An electronic medical record (EMR) is a semi-structured document that contains a large amount of knowledge. By analyzing the structure of the medical record, some Q&A pairs can be constructed directly. For the unstructured text, which contains a large number of sentences, the focus of the thesis is how to extract Q&A pairs from it. The thesis therefore puts forward an EMR-based method for constructing Q&A pairs in the medical field and applies this method to EMRs of heart disease. The Q&A pair extraction algorithm proposed in the thesis consists of two parts: an answer sentence extraction algorithm and a question generation algorithm. In the answer sentence extraction algorithm based on feature selection, the thesis treats the selection of answer sentences as a short text classification task and extracts answer sentences from the set of statements in heart disease EMRs. To improve the descriptive power of short texts, the thesis proposes multi-level feature selection and expansion that combines information gain, an improved similarity calculation formula, and the Apriori data mining algorithm. The algorithm extracts features from the statement set and from the EMR respectively. In the question generation algorithm based on deep learning, the thesis uses dependency parsing and a named entity recognition algorithm based on a BiLSTM-CRF neural network to generate disease-type Q&A pairs. In the question generation algorithm based on template matching, the thesis constructs question templates manually, uses a classification algorithm based on TextCNN that integrates the structural information of the heart disease EMR in the embedding layer, and classifies each answer sentence into the corresponding template. In addition, the thesis evaluates the Q&A pairs from the perspective of relevance, as a reference for assessing their quality. Finally, the thesis applies the above algorithms to heart disease EMRs and extracts Q&A pairs with professional knowledge and a good degree of matching, which is conducive to the future construction of high-quality QA systems. |
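
The abstract names information gain as one ingredient of the feature selection step for answer sentence classification. As a rough illustration only, the sketch below scores each term by the standard information gain of its presence or absence with respect to a binary label (answer sentence or not). The abstract does not give the thesis's multi-level selection, improved similarity formula, or Apriori-based expansion, so none of them are reproduced here. The function names and the toy tokenized sentences are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(sentences, labels, term):
    """Reduction in label entropy from knowing whether `term` occurs in a sentence."""
    with_term = [y for s, y in zip(sentences, labels) if term in s]
    without_term = [y for s, y in zip(sentences, labels) if term not in s]
    n = len(labels)
    conditional = (
        (len(with_term) / n) * entropy(with_term)
        + (len(without_term) / n) * entropy(without_term)
    )
    return entropy(labels) - conditional


def top_terms(sentences, labels, k):
    """Return the k terms with the highest information gain (ties broken alphabetically)."""
    vocab = sorted({t for s in sentences for t in s})
    scored = [(information_gain(sentences, labels, t), t) for t in vocab]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [t for _, t in scored[:k]]


# Hypothetical toy data: tokenized EMR statements, label 1 = answer sentence.
toy_sentences = [
    ["patient", "diagnosed", "coronary", "disease"],
    ["diagnosed", "heart", "failure"],
    ["patient", "admitted", "today"],
    ["patient", "discharged", "today"],
]
toy_labels = [1, 1, 0, 0]

if __name__ == "__main__":
    for term in top_terms(toy_sentences, toy_labels, 3):
        print(term, round(information_gain(toy_sentences, toy_labels, term), 4))
```

In this toy set, terms that perfectly separate the two classes receive the maximum gain of one bit, while a term such as "patient" that appears in both classes scores much lower. A real pipeline would compute the scores over segmented EMR text and then feed the selected features to the classifier.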