With rapid economic development and rising living standards, people are paying increasing attention to medicine and health. Retrieval-based medical QA is a subfield of question answering (QA). Such a system searches a knowledge base and returns professional medical answers to the questions users ask, so it has great research value and broad application prospects. A typical retrieval QA system has two key modules: a recall module and a ranking module. In the ranking module, a deep semantic matching model re-ranks the recalled candidates. The performance of this model therefore directly determines the performance of the whole retrieval QA system, which makes research on deep semantic matching algorithms for retrieval QA highly significant. At present, breakthroughs in deep semantic matching come mainly from improvements to pre-trained models such as BERT. However, pre-trained models still have shortcomings, such as large numbers of parameters and slow inference. To improve both the accuracy and the efficiency of deep semantic matching, this paper uses knowledge distillation to build a lightweight deep semantic matching model for retrieval that performs well and infers faster. The main contents of this paper are as follows.

(1) The MBDE deep semantic matching model is constructed. MBDE consists of an encoding layer, a time-series layer and an information extraction layer. The encoding layer uses a multi-layer Transformer to encode textual context. The time-series layer uses a BiLSTM to model positional information. The information extraction layer uses max pooling to extract the key information of the whole text. MBDE achieves an accuracy of 90.93% and an F1 score of 0.9038, both better than BERT. MBDE is also efficient: its inference time is only 28.9% of BERT's, and it has only 40.1% as many parameters as BERT.

(2) MBDE-small, a semantic matching knowledge distillation model based on MBDE, is constructed. MBDE already performs well, infers quickly and has few parameters. However, faster online responses and real-time requirements call for further reductions in latency and model size. Knowledge distillation, a model compression technique, is therefore used to transfer the medical semantic matching knowledge in MBDE's embedding layer and output layer to the lighter MBDE-small. In experiments, MBDE-small achieves an accuracy of 88.95% and an F1 score of 0.8836, ahead of the baseline model. Compared with MBDE, its inference time is 331 ms shorter and it has 2.5M fewer parameters.

(3) A medical QA corpus and a retrieval-based medical QA system are constructed. The corpus is built from open-source medical QA data through a series of preprocessing steps and manual extraction. On top of this corpus, the preprocessing, intent recognition, recall and ranking modules are implemented to form the retrieval medical QA system. The intent recognition module reaches an accuracy of 0.951, and the recall module reaches an MRR of 0.48.
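MBDE's information extraction layer turns the per-token outputs of the BiLSTM into a single text vector by max pooling. The following minimal pure-Python sketch covers that step only and is not the thesis's implementation. It assumes each token's BiLSTM output is the concatenation of its forward and backward hidden states, already aligned by token position. The helper names `concat_bidirectional` and `max_pool_over_time` are hypothetical.

```python
from typing import List

Vector = List[float]

def concat_bidirectional(forward: List[Vector], backward: List[Vector]) -> List[Vector]:
    """Join each token's forward and backward hidden states, as a BiLSTM layer outputs them.
    Assumes the backward states are already aligned to token positions."""
    if len(forward) != len(backward):
        raise ValueError("forward and backward sequences must have the same length")
    return [f + b for f, b in zip(forward, backward)]

def max_pool_over_time(states: List[Vector]) -> Vector:
    """Element-wise max over the token axis, giving one value per hidden dimension."""
    if not states:
        raise ValueError("need at least one token state")
    return [max(column) for column in zip(*states)]

# Three tokens, hidden size 2 per direction (toy numbers, not model outputs).
forward = [[0.1, 0.5], [0.4, -0.2], [0.3, 0.0]]
backward = [[0.2, 0.0], [0.7, 0.1], [-0.1, 0.9]]
text_vector = max_pool_over_time(concat_bidirectional(forward, backward))
print(text_vector)  # [0.4, 0.5, 0.7, 0.9]
```

Max pooling by itself discards token order. Any word-order information must therefore already be present in the token states it receives, and in MBDE the abstract assigns that role to the BiLSTM.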
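The accuracy and F1 figures reported for MBDE and MBDE-small are standard classification metrics. The sketch below shows one way to compute them for a binary matching task. It assumes label 1 means "the question and candidate match" and that F1 is the binary F1 of that positive class; the thesis may instead use a macro-averaged F1. The function name is hypothetical.

```python
from typing import Sequence, Tuple

def accuracy_and_f1(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> Tuple[float, float]:
    """Accuracy over all pairs and binary F1 for the positive (matching) class."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    correct = sum(t == p for t, p in pairs)
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return correct / len(pairs), f1

acc, f1 = accuracy_and_f1([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
print(acc, f1)  # accuracy 0.6, F1 about 0.667
```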
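The abstract states only that knowledge in MBDE's embedding layer and output layer is transferred to MBDE-small. It does not give the loss functions. A common way to do this, borrowed here as an assumption from TinyBERT-style distillation, combines two terms:

- a mean-squared-error loss between the student's embeddings (mapped through a projection matrix to the teacher's width) and the teacher's embeddings;
- a soft cross-entropy between the temperature-scaled output distributions.

In this pure-Python sketch, the weight `alpha`, the temperature, the fixed `projection` matrix and all function names are hypothetical. In real training, the projection would be learned together with the student.

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]

def log_softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max before exponentiating to avoid overflow
    log_z = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_z for x in scaled]

def soft_cross_entropy(student_logits: Sequence[float], teacher_logits: Sequence[float],
                       temperature: float = 2.0) -> float:
    """Output-layer loss: cross-entropy of the student's softened distribution
    against the teacher's softened distribution (soft targets)."""
    teacher_probs = [math.exp(v) for v in log_softmax(teacher_logits, temperature)]
    student_log_probs = log_softmax(student_logits, temperature)
    return -sum(p * lq for p, lq in zip(teacher_probs, student_log_probs))

def embedding_mse(student_emb: Matrix, teacher_emb: Matrix, projection: Matrix) -> float:
    """Embedding-layer loss: MSE between projected student token embeddings and
    teacher token embeddings. projection has one row per student dimension and
    one column per teacher dimension (hypothetical, fixed here for illustration)."""
    total, count = 0.0, 0
    for s_vec, t_vec in zip(student_emb, teacher_emb):
        # projected[j] = sum_i s[i] * W[i][j], mapping student width to teacher width
        projected = [sum(s * row[j] for s, row in zip(s_vec, projection)) for j in range(len(t_vec))]
        total += sum((p - t) ** 2 for p, t in zip(projected, t_vec))
        count += len(t_vec)
    if count == 0:
        raise ValueError("need at least one embedding value")
    return total / count

def distillation_loss(student_emb: Matrix, teacher_emb: Matrix, projection: Matrix,
                      student_logits: Sequence[float], teacher_logits: Sequence[float],
                      alpha: float = 0.5, temperature: float = 2.0) -> float:
    """Hypothetical weighted sum of the embedding-layer and output-layer losses."""
    return (alpha * embedding_mse(student_emb, teacher_emb, projection)
            + (1 - alpha) * soft_cross_entropy(student_logits, teacher_logits, temperature))
```

When the teacher and student logits are identical, the output term reduces to the entropy of the teacher's softened distribution, which is the smallest value that term can take.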
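The recall module is evaluated with mean reciprocal rank (MRR). For each test question, take the reciprocal of the rank at which the first correct answer appears among the recalled candidates (0 if none is recalled). Then average over all questions. The minimal sketch below assumes each question comes with a set of acceptable answer IDs; the function name and data layout are hypothetical.

```python
from typing import Hashable, Sequence, Set

def mean_reciprocal_rank(ranked_candidates: Sequence[Sequence[Hashable]],
                         relevant: Sequence[Set[Hashable]]) -> float:
    """Average over questions of 1 / rank of the first relevant candidate (0 if none is recalled)."""
    if not ranked_candidates or len(ranked_candidates) != len(relevant):
        raise ValueError("need at least one question and one relevance set per ranked list")
    total = 0.0
    for candidates, gold in zip(ranked_candidates, relevant):
        for rank, candidate in enumerate(candidates, start=1):
            if candidate in gold:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(ranked_candidates)

recalled = [["a", "b", "c"], ["d", "e", "f"], ["g", "h", "i"]]
gold = [{"a"}, {"f"}, {"x"}]
print(mean_reciprocal_rank(recalled, gold))  # (1 + 1/3 + 0) / 3, about 0.444
```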